SD 0.82 is impossible for mean 2.50, N = 4 (integer data).
For integer data the sum of squares must itself be an integer with the same parity as the sum of the values, because x² and x are always both even or both odd. For the reported mean and N, no sum of squares meeting that condition gives an SD that rounds to the reported value.
Reported
mean: 2.50
sd: 0.82
n: 4
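Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.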
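The parity argument can be checked by brute force. The sketch below is a simplified illustration, not Anaya's full GRIMMER procedure: it assumes single-item integer data, a sample SD (n - 1 denominator), a mean that is already GRIM-consistent, and symmetric rounding to two decimals. The name `grimmer_consistent` is hypothetical.

```python
import math

def grimmer_consistent(mean, sd, n, decimals=2):
    """Simplified parity check: True if some integer sum of squares (SS)
    with the same parity as the sum gives a sample SD that rounds to `sd`."""
    total = round(mean * n)                 # integer sum implied by the mean
    half = 0.5 * 10 ** -decimals            # half-width of the rounding interval
    lo, hi = max(sd - half, 0.0), sd + half
    # Sample variance = (SS - total^2 / n) / (n - 1); invert it to bound SS.
    base = total ** 2 / n
    ss_lo = math.ceil(lo ** 2 * (n - 1) + base - 1e-9)
    ss_hi = math.floor(hi ** 2 * (n - 1) + base + 1e-9)
    # x^2 and x share parity, so SS and the sum must agree modulo 2.
    return any(ss % 2 == total % 2 for ss in range(ss_lo, ss_hi + 1))

print(grimmer_consistent(2.50, 0.82, 4))    # False
print(grimmer_consistent(4.00, 3.50, 20))   # False
print(grimmer_consistent(5.10, 1.30, 50))   # True
```

For mean 2.50 and N = 4 the sum is 10 (even), and the only sum of squares whose SD rounds to 0.82 is 27 (odd), so the check fails. Passing this check is necessary but not sufficient for the SD to be attainable.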
inconsistent · GRIM · Mood (Table 2)
Mean 3.43 is impossible for N = 20.
With N = 20 integer observations the achievable means nearest to 3.43 are 3.40 and 3.45; neither rounds to the reported value. The usual causes are a typo or a misreported N; fabricated data is also possible.
Reported
mean: 3.43
n: 20
items: 1
Computed
nearest_achievable: ['3.40', '3.45']
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
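The GRIM logic is short enough to sketch directly. This is a minimal illustration, assuming integer responses, a mean reported to two decimals, and lenient treatment of exact rounding ties; `grim_check` and its parameters are hypothetical names, not the tool's API.

```python
import math

def grim_check(mean, n, items=1, decimals=2):
    """Return (consistent, nearest): whether some integer sum gives a mean
    that rounds to `mean`, and the achievable means on either side of it."""
    grains = n * items                      # the mean moves in steps of 1/grains
    half = 0.5 * 10 ** -decimals            # half-width of the rounding interval
    eps = 1e-9                              # slack for floating-point error
    below = math.floor(mean * grains + eps) / grains
    above = math.ceil(mean * grains - eps) / grains
    # Consistent if the closer achievable mean lies inside the rounding interval.
    consistent = min(mean - below, above - mean) <= half + eps
    nearest = [f"{below:.{decimals}f}", f"{above:.{decimals}f}"]
    return consistent, nearest

print(grim_check(3.43, 20))   # (False, ['3.40', '3.45'])
print(grim_check(2.50, 4))    # (True, ['2.50', '2.50'])
```

With N = 20 the mean can only change in steps of 0.05, which is why 3.43 falls between two achievable values.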
inconsistent · GRIMMER · Stress (Table 2)
SD 3.50 is impossible for mean 4.00, N = 20 (integer data).
For integer data the sum of squares must itself be an integer with the same parity as the sum of the values, because x² and x are always both even or both odd. For the reported mean and N, no sum of squares meeting that condition gives an SD that rounds to the reported value.
Reported
mean: 4.00
sd: 3.50
n: 20
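Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.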
inconsistent · SPRITE · Stress (Table 2)
SD 3.50 is impossible for mean 4.00 on a 1-7 scale (max possible ≈ 3.078).
For values confined to [min, max] with the given mean, the population variance cannot exceed (mean-min)·(max-mean), so the sample variance cannot exceed n/(n-1) times that bound; the reported SD exceeds the largest value this allows.
Reported
mean: 4.00
sd: 3.50
n: 20
scale: [1, 7]
Computed
max_possible_sd: 3.077935056255462
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
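The upper bound on the SD can be reproduced in a few lines. The sketch below is not the tool's code: it assumes a sample SD (n - 1 denominator) and lets the mean sit anywhere inside its two-decimal rounding interval, conventions chosen because they reproduce the `max_possible_sd` values printed in this report. The function name is hypothetical.

```python
import math

def max_possible_sd(mean, n, lo, hi, decimals=2):
    """Upper bound on the sample SD of n values in [lo, hi] whose mean
    rounds to `mean` (the bound need not be attainable with integers)."""
    half = 0.5 * 10 ** -decimals
    mid = (lo + hi) / 2
    # (m - lo) * (hi - m) peaks at the scale midpoint, so take the mean
    # inside the rounding interval that lies closest to the midpoint.
    m = min(max(mid, mean - half), mean + half)
    pop_var = (m - lo) * (hi - m)            # Bhatia-Davis bound (population variance)
    return math.sqrt(pop_var * n / (n - 1))  # rescale to the n - 1 denominator

print(round(max_possible_sd(4.00, 20, 1, 7), 3))   # 3.078
print(round(max_possible_sd(2.50, 4, 1, 7), 3))    # 3.003
```

A reported SD above this bound is impossible on the stated scale; an SD below it is only shown to be not ruled out by this particular check.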
Suspicions — heuristic, unproven
suspicious · descriptive · Primary outcome (t-test)
Reported t = 0.80 does not match the value implied by the group means, SDs, and Ns (Student t = -3.422, Welch t = -3.422).
The reported statistic differs from an independent-samples t-test / one-way ANOVA recomputed from the descriptives. This can be innocent — an adjusted model (ANCOVA, paired test, covariates) would differ legitimately — so treat it as a prompt to check the analysis, not as proof of error.
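Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.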
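Recomputing the statistic from descriptives needs only the textbook formulas. The sketch below shows both variants; the function names and the example numbers are hypothetical, since this report does not list the underlying group means, SDs, and Ns.

```python
import math

def student_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance (Student) t statistic from summary statistics."""
    pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical descriptives, for illustration only.
group_a = (3.0, 1.0, 30)   # mean, SD, n
group_b = (4.0, 1.2, 30)
print(student_t(*group_a, *group_b))
print(welch_t(*group_a, *group_b))
```

The two statistics coincide whenever the groups have equal sizes or equal SDs, which may be why the report shows the same value for both.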
suspicious · Carlisle-baseline
Baseline arms are implausibly similar (Carlisle p = 6e-10 across 5 variables).
Under proper randomisation the baseline comparison p-values should be uniform; here they are concentrated near 1, which Carlisle (2017) associated with fabricated or non-random data. This is a heuristic — stratified randomisation and correlated variables can mimic it.
Reported
n_variables: 5
fisher_stat: 0.075
p_too_similar: 5.997882349441545e-10
p_too_different: 0.9999999994002118
Carlisle, J. B. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia, 72(8), 944-952.
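The reported values are consistent with Fisher's method for combining p-values, where -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom for k independent tests. The sketch below assumes that is what `fisher_stat` means (the field name suggests it, but the report does not say so); the function names are hypothetical, and the series is only intended for moderate statistics.

```python
import math

def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 * sum(ln p)."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_tails_even_df(x, k):
    """(lower, upper) tail probabilities of a chi-square with 2k df at x,
    using the identity P(chi2_2k > x) = P(Poisson(x/2) <= k - 1)."""
    y = x / 2.0
    term = math.exp(-y)                 # Poisson(y) probability of 0
    upper = 0.0
    for j in range(k):                  # Poisson probabilities 0 .. k-1
        upper += term
        term *= y / (j + 1)
    lower = 0.0
    for j in range(k, k + 1000):        # Poisson probabilities k, k+1, ...
        lower += term                   # summed directly to keep tiny tails accurate
        term *= y / (j + 1)
    return lower, upper

p_too_similar, p_too_different = chi2_tails_even_df(0.075, 5)
print(p_too_similar, p_too_different)
```

With the rounded statistic 0.075 and 5 variables, the lower tail comes out at about 6e-10, in line with the reported `p_too_similar`. A small lower tail means the p-values cluster near 1, that is, the arms look more alike than chance would suggest.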
10 consistent / not-applicable checks
consistent · GRIM · Anxiety (Table 2)
Mean 2.50 is GRIM-consistent for N = 4.
Reported
mean: 2.50
n: 4
items: 1
Computed
achievable_value: 2.5
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
consistent · SPRITE · Anxiety (Table 2)
SD 0.82 is within the range possible for a 1-7 scale (max ≈ 3.003).
Reported
mean: 2.50
sd: 0.82
n: 4
scale: [1, 7]
Computed
max_possible_sd: 3.0033259341381293
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
consistent · SPRITE · Mood (Table 2)
SD 1.20 is within the range possible for a 1-7 scale (max ≈ 3.023).
Reported
mean: 3.43
sd: 1.20
n: 20
scale: [1, 7]
Computed
max_possible_sd: 3.0228559169660802
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
consistent · GRIM · Stress (Table 2)
Mean 4.00 is GRIM-consistent for N = 20.
Reported
mean: 4.00
n: 20
items: 1
Computed
achievable_value: 4
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
consistent · GRIM · Sleep quality (Table 2)
Mean 5.10 is GRIM-consistent for N = 50.
Reported
mean: 5.10
n: 50
items: 1
Computed
achievable_value: 5.1
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
consistent · GRIMMER · Sleep quality (Table 2)
SD 1.30 is GRIMMER-consistent with mean 5.10, N = 50.
Reported
mean: 5.10
sd: 1.30
n: 50
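Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.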
consistent · SPRITE · Sleep quality (Table 2)
SD 1.30 is within the range possible for a 1-7 scale (max ≈ 2.821).
Reported
mean: 5.10
sd: 1.30
n: 50
scale: [1, 7]
Computed
max_possible_sd: 2.821378842238059
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
consistent · Benford
Leading-digit distribution is consistent with Benford's law (n = 32).
Reported
n: 32
chi2: 7.653
p: 0.46806821499676565
mad: 0.0433
Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
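A Benford check compares observed leading-digit counts with the expected proportions log10(1 + 1/d). The sketch below is a minimal version under stated assumptions: a chi-square goodness-of-fit statistic on 8 degrees of freedom and MAD taken as the mean absolute deviation between observed and expected proportions. The function names are hypothetical, and the digit list is made up for illustration.

```python
import math
from collections import Counter

# Expected Benford proportion for each leading digit 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_check(leading_digits):
    """Return (chi2, mad) for a list of leading digits in 1..9."""
    n = len(leading_digits)
    counts = Counter(leading_digits)
    chi2 = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    mad = sum(abs(counts[d] / n - p) for d, p in BENFORD.items()) / 9
    return chi2, mad

def chi2_sf_df8(x):
    """Upper-tail p-value of a chi-square with 8 df (closed form, even df)."""
    y = x / 2
    return math.exp(-y) * (1 + y + y ** 2 / 2 + y ** 3 / 6)

digits = [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 1]   # made-up example
print(benford_check(digits))
print(round(chi2_sf_df8(7.653), 3))   # 0.468
```

Feeding the reported chi2 of 7.653 into the 8-df tail gives p ≈ 0.468, matching the reported p-value.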
consistent · terminal-digits
Final-digit distribution is not flagged as non-uniform (n = 32), though p ≈ 0.012 is borderline.
Reported
n: 32
chi2: 21.125
p: 0.012106819396338144
Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
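The terminal-digit check is a chi-square goodness-of-fit test against a uniform distribution over the digits 0 to 9. The sketch below assumes 9 degrees of freedom, which is the value that reproduces the reported p; the function names are hypothetical.

```python
import math
from collections import Counter

def terminal_digit_chi2(last_digits):
    """Chi-square statistic against a uniform distribution on 0..9."""
    n = len(last_digits)
    counts = Counter(last_digits)
    expected = n / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))

def chi2_sf_df9(x):
    """Upper-tail p-value of a chi-square with 9 df (closed form, odd df)."""
    y = x / 2
    series = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, 5))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * series

print(terminal_digit_chi2(list(range(10)) * 3))   # 0.0
print(round(chi2_sf_df9(21.125), 4))              # 0.0121
```

With n = 32 the expected count is only 3.2 per digit, below the common rule of thumb of 5, so the chi-square p-value is an approximation.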
not applicable · GRIMMER · Mood (Table 2)
GRIMMER not applied: the reported mean is itself GRIM-impossible (see the GRIM finding).
Reported
mean: 3.43
sd: 1.20
n: 20
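Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.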
How to read this report. An impossibility is a
mathematical proof that the reported numbers cannot co-exist (given the stated
assumptions, such as the data being integer). A suspicion is a
heuristic signal that proves nothing by itself. Every finding is a question to
ask the authors — most often the cause is a typo or a misreported sample size,
not misconduct.