Statlint — example report
4 statistical impossibilities found
Risk score: 100 / 100
Impossibilities: 4
Suspicions: 2
Checks run: 16

Impossibilities — proven

inconsistent GRIMMER Anxiety (Table 2)
SD 0.82 is impossible for mean 2.50, N = 4 (integer data).
For integer data the sum of squares must be an integer with the same parity as the sum of the observations. No such value reproduces the reported SD, after rounding, for the reported mean and N.
Reported
mean: 2.50
sd: 0.82
n: 4
Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
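Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.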
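The parity argument can be sketched in a few lines. This is a minimal illustration, not Statlint's implementation: the function name `grimmer_candidates` is hypothetical, and it assumes integer data, a sample SD (n - 1 denominator), values rounded to two decimals, and a mean that pins down a single integer sum.

```python
import math

def grimmer_candidates(mean, sd, n, decimals=2):
    """Integer sums of squares compatible with a reported SD (hypothetical helper).

    Only the integer and parity tests are sketched; the mean is assumed
    to identify one integer sum, which holds for small n.
    """
    half = 0.5 * 10 ** -decimals
    total = round(mean * n)                 # integer sum implied by the mean
    # Sum of squares at the lower and upper rounding bounds of the SD:
    # SS = variance * (n - 1) + total^2 / n
    lo = (sd - half) ** 2 * (n - 1) + total ** 2 / n
    hi = (sd + half) ** 2 * (n - 1) + total ** 2 / n
    candidates = range(math.ceil(lo), math.floor(hi) + 1)
    # x*x has the same parity as x, so sum(x^2) and sum(x) share parity.
    return [ss for ss in candidates if ss % 2 == total % 2]

print(grimmer_candidates(2.50, 0.82, 4))    # Anxiety row
print(grimmer_candidates(5.10, 1.30, 50))   # Sleep quality row
```

For the Anxiety row the only integer sum of squares in range is 27, which is odd while the sum (10) is even, so the list comes back empty. For Sleep quality the candidate 1383 shares the parity of the sum 255 and survives.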
inconsistent GRIM Mood (Table 2)
Mean 3.43 is impossible for N = 20.
With N = 20 integer observations the achievable means nearest to 3.43 are 3.40 and 3.45; neither rounds to the reported value. This usually indicates a typo, a misreported N, or fabricated data.
Reported
mean: 3.43
n: 20
items: 1
Computed
nearest_achievable: ['3.40', '3.45']
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
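The GRIM check itself is simple arithmetic and can be sketched as follows. The function name `grim_nearest` and its return shape are assumptions made for illustration; the sketch compares values formatted to two decimals and does not handle the half-way rounding cases that a fuller GRIM implementation would accept in either direction.

```python
import math

def grim_nearest(mean, n, items=1, decimals=2):
    """Return (consistent, nearest achievable means) for integer data."""
    cells = n * items                       # number of integer responses
    target = mean * cells                   # sum implied by the reported mean
    lower, upper = math.floor(target), math.ceil(target)
    candidates = sorted({lower / cells, upper / cells})
    formatted = [f"{c:.{decimals}f}" for c in candidates]
    consistent = f"{mean:.{decimals}f}" in formatted
    return consistent, formatted

print(grim_nearest(3.43, 20))   # Mood row
print(grim_nearest(2.50, 4))    # Anxiety row
```

With N = 20 the integer sums bracketing 68.6 are 68 and 69, which give the two achievable means listed under "Computed".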
inconsistent GRIMMER Stress (Table 2)
SD 3.50 is impossible for mean 4.00, N = 20 (integer data).
For integer data the sum of squares must be an integer with the same parity as the sum of the observations. No such value reproduces the reported SD, after rounding, for the reported mean and N.
Reported
mean: 4.00
sd: 3.50
n: 20
Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
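Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.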
inconsistent SPRITE Stress (Table 2)
SD 3.50 is impossible for mean 4.00 on a 1-7 scale (max possible ≈ 3.078).
For values confined to [min, max] with the given mean, the population variance cannot exceed (mean-min)·(max-mean), and the sample variance at most n/(n-1) times that; the reported SD exceeds the largest value this allows.
Reported
mean: 4.00
sd: 3.50
n: 20
scale: [1, 7]
Computed
max_possible_sd: 3.077935056255462
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
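The bound can be checked by hand. The sketch below is an inference from the numbers in this report, not a confirmed description of how Statlint computes `max_possible_sd`: it assumes the mean may lie anywhere in its two-decimal rounding interval and that the SD uses the n - 1 denominator. Under those assumptions it reproduces the "Computed" values shown for the SPRITE rows.

```python
import math

def max_possible_sd(mean, n, lo, hi, decimals=2):
    """Largest sample SD for n values in [lo, hi] with the reported mean."""
    half = 0.5 * 10 ** -decimals
    centre = (lo + hi) / 2
    # (m - lo) * (hi - m) peaks at the scale midpoint, so pick the point of
    # the mean's rounding interval that is closest to the midpoint.
    m = min(max(centre, mean - half), mean + half)
    pop_var = (m - lo) * (hi - m)           # bound on the population variance
    return math.sqrt(pop_var * n / (n - 1)) # convert to a sample SD

print(max_possible_sd(4.00, 20, 1, 7))      # Stress row
print(max_possible_sd(2.50, 4, 1, 7))       # Anxiety row
```

For the Stress row the bound is sqrt(9 · 20/19), about 3.078, so a reported SD of 3.50 cannot occur on a 1-7 scale.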

Suspicions — heuristic, unproven

suspicious descriptive Primary outcome (t-test)
Reported t=0.80 does not match the value implied by the group means/SDs/Ns (Student t = -3.422, Welch t = -3.422).
The reported statistic differs from an independent-samples t-test / one-way ANOVA recomputed from the descriptives. This can be innocent — an adjusted model (ANCOVA, paired test, covariates) would differ legitimately — so treat it as a prompt to check the analysis, not as proof of error.
Reported
groups: [{'name': 'control', 'mean': 10.0, 'sd': 2.0, 'n': 30}, {'name': 'treatment', 'mean': 12.0, 'sd': 2.5, 'n': 30}]
reported_stat: t=0.80
Computed
student_t: -3.421595691073206
student_p: 0.0011469620922456205
welch_t: -3.421595691073206
welch_p: 0.001177199356816115
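Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.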
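The recomputed statistics follow from the standard two-sample formulas. A minimal sketch, using only the standard library and a hypothetical function name; p-values are left out because they need a t-distribution routine from a statistics library.

```python
import math

def t_from_descriptives(m1, sd1, n1, m2, sd2, n2):
    """Return (student_t, welch_t, welch_df) for two independent groups."""
    diff = m1 - m2
    # Student: pooled variance, df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    student_t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Welch: separate variances, Welch-Satterthwaite degrees of freedom
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    welch_t = diff / math.sqrt(v1 + v2)
    welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return student_t, welch_t, welch_df

# control vs treatment, as listed under "Reported"
print(t_from_descriptives(10.0, 2.0, 30, 12.0, 2.5, 30))
```

Because both groups have N = 30, the Student and Welch statistics coincide; only the degrees of freedom (58 versus roughly 55.3) and therefore the p-values differ.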
suspicious Carlisle-baseline
Baseline arms are implausibly similar (Carlisle p = 6e-10 across 5 variables).
Under proper randomisation the baseline comparison p-values should be uniform; here they are concentrated near 1, which Carlisle (2017) associated with fabricated or non-random data. This is a heuristic — stratified randomisation and correlated variables can mimic it.
Reported
n_variables: 5
fisher_stat: 0.075
p_too_similar: 5.997882349441545e-10
p_too_different: 0.9999999994002118
Carlisle, J. B. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia, 72(8), 944-952.
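The "too similar" probability is the left tail of the combined statistic. The sketch below assumes the statistic is Fisher's -2 Σ ln p, compared with a chi-square distribution on 2k degrees of freedom (k = number of variables); the function names and the example p-values are hypothetical. For even degrees of freedom the left tail has a series form that stays accurate for very small probabilities.

```python
import math

def chi2_cdf_even_df(x, df):
    """Left-tail chi-square probability for even df.

    Uses P(X <= x) = exp(-h) * sum_{i >= k} h^i / i!, with h = x/2, k = df/2.
    """
    k, h = df // 2, x / 2
    term = h ** k / math.factorial(k)
    total, i = term, k
    while term > 1e-16 * total:             # stop once terms are negligible
        i += 1
        term *= h / i
        total += term
    return math.exp(-h) * total

def fisher_too_similar(p_values):
    """Return (Fisher statistic, left-tail probability)."""
    stat = -2 * sum(math.log(p) for p in p_values)
    return stat, chi2_cdf_even_df(stat, 2 * len(p_values))

print(chi2_cdf_even_df(0.075, 10))          # statistic and k = 5 from this finding
print(fisher_too_similar([0.9925] * 5))     # hypothetical baseline p-values
```

Feeding in the rounded statistic 0.075 with 10 degrees of freedom gives a probability close to 6e-10, in line with `p_too_similar`; the small difference comes from the rounding of `fisher_stat`.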
10 consistent / not-applicable checks
consistent GRIM Anxiety (Table 2)
Mean 2.50 is GRIM-consistent for N = 4.
Reported
mean: 2.50
n: 4
items: 1
Computed
achievable_value: 2.5
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
consistent SPRITE Anxiety (Table 2)
SD 0.82 is within the range possible for a 1-7 scale (max ≈ 3.003).
Reported
mean: 2.50
sd: 0.82
n: 4
scale: [1, 7]
Computed
max_possible_sd: 3.0033259341381293
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
consistent SPRITE Mood (Table 2)
SD 1.20 is within the range possible for a 1-7 scale (max ≈ 3.023).
Reported
mean: 3.43
sd: 1.20
n: 20
scale: [1, 7]
Computed
max_possible_sd: 3.0228559169660802
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
consistent GRIM Stress (Table 2)
Mean 4.00 is GRIM-consistent for N = 20.
Reported
mean: 4.00
n: 20
items: 1
Computed
achievable_value: 4
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
consistent GRIM Sleep quality (Table 2)
Mean 5.10 is GRIM-consistent for N = 50.
Reported
mean: 5.10
n: 50
items: 1
Computed
achievable_value: 5.1
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
consistent GRIMMER Sleep quality (Table 2)
SD 1.30 is GRIMMER-consistent with mean 5.10, N = 50.
Reported
mean: 5.10
sd: 1.30
n: 50
Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
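Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.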
consistent SPRITE Sleep quality (Table 2)
SD 1.30 is within the range possible for a 1-7 scale (max ≈ 2.821).
Reported
mean: 5.10
sd: 1.30
n: 50
scale: [1, 7]
Computed
max_possible_sd: 2.821378842238059
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
consistent Benford
Leading-digit distribution is consistent with Benford's law (n = 32).
Reported
n: 32
chi2: 7.653
p: 0.46806821499676565
mad: 0.0433
Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
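A Benford comparison of this kind can be sketched as below. The function name is hypothetical, and it is an assumption that the report's `mad` field is the mean absolute deviation between observed and expected leading-digit proportions; the input values are illustrative, not the 32 numbers analysed here.

```python
import math
from collections import Counter

def benford_check(values):
    """Return (chi2, mad) comparing leading digits with Benford's law."""
    # Scientific notation puts the leading significant digit first.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    chi2 = 0.0
    abs_dev = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)    # Benford proportion for digit d
        observed = counts[d] / n
        chi2 += n * (observed - expected) ** 2 / expected
        abs_dev += abs(observed - expected)
    return chi2, abs_dev / 9                # chi2 has 8 degrees of freedom

# Illustrative data only: the first 32 powers of two.
print(benford_check([2 ** k for k in range(1, 33)]))
```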
consistent terminal-digits
Final-digit distribution does not deviate from uniform strongly enough to be flagged (n = 32, p ≈ 0.012).
Reported
n: 32
chi2: 21.125
p: 0.012106819396338144
Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
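The terminal-digit statistic is a chi-square comparison against ten equally likely digits. A minimal sketch with a hypothetical function name; it takes the numbers as printed strings so that trailing zeros are preserved, and the sample input is a handful of values from Table 2, not the full set of 32.

```python
from collections import Counter

def terminal_digit_chi2(reported):
    """Return (n, chi2) for final digits against a uniform distribution."""
    last = [s.strip()[-1] for s in reported if s.strip()[-1:].isdigit()]
    n = len(last)
    if n == 0:
        raise ValueError("no numeric strings supplied")
    counts = Counter(last)
    expected = n / 10                       # each digit equally likely
    chi2 = sum((counts[str(d)] - expected) ** 2 / expected for d in range(10))
    return n, chi2                          # compare with chi-square, 9 df

print(terminal_digit_chi2(["2.50", "0.82", "3.43", "1.20",
                           "4.00", "3.50", "5.10", "1.30"]))
```

With expected counts this small (3.2 per digit at n = 32) the chi-square approximation is rough, so the resulting p-value is best read as a coarse screen.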
not applicable GRIMMER Mood (Table 2)
GRIMMER not applied: the reported mean is itself GRIM-impossible (see the GRIM finding).
Reported
mean: 3.43
sd: 1.20
n: 20
Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
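Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.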
How to read this report. An impossibility is a mathematical proof that the reported numbers cannot co-exist (given the stated assumptions, such as the data being integer). A suspicion is a heuristic signal that proves nothing by itself. Every finding is a question to ask the authors — most often the cause is a typo or a misreported sample size, not misconduct.