Statlint report — Statlint

Statlint: example report

4 statistical impossibilities found

Risk: 100 / 100

Impossibilities — proven

inconsistent GRIMMER Anxiety (Table 2)

SD 0.82 is impossible for mean 2.50, N = 4 (integer data).

No integer sum of squares with the correct parity reproduces this standard deviation for the reported mean and N. For integer data the sum of squares must be an integer sharing the parity of the sum, and no such value rounds to the reported SD.

Reported

mean: 2.50; sd: 0.82; n: 4

Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints 4:e2400v1.
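Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.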
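The parity argument in this finding can be sketched in a few lines. This is a minimal illustration, not Statlint's implementation: the function name `grimmer_consistent` is hypothetical, it assumes the reported SD is a sample SD (n - 1 denominator) rounded half-up, that the mean is already GRIM-consistent, and it covers only the integer sum-of-squares and parity check described above rather than every step of the published GRIMMER test.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def grimmer_consistent(mean_str, sd_str, n):
    """Parity-based check for integer data (hypothetical helper, sketch only)."""
    mean, sd = Decimal(mean_str), Decimal(sd_str)
    quantum = Decimal(1).scaleb(sd.as_tuple().exponent)  # e.g. 0.01
    half = quantum / 2
    # Integer sum implied by the mean (assumes the mean passed GRIM).
    total = int((mean * n).to_integral_value(rounding=ROUND_HALF_UP))
    base = Decimal(total * total) / n  # (sum)^2 / n
    # Range of sums of squares compatible with the SD's rounding interval.
    lo_var = max(sd - half, Decimal(0)) ** 2
    hi_var = (sd + half) ** 2
    lo_ss = math.ceil(lo_var * (n - 1) + base)
    hi_ss = math.floor(hi_var * (n - 1) + base)
    for ss in range(lo_ss, hi_ss + 1):
        # For integers x, x*x has the parity of x, so the sum of squares
        # must share the parity of the sum.
        if ss % 2 != total % 2:
            continue
        var = (Decimal(ss) - base) / (n - 1)
        if var >= 0 and var.sqrt().quantize(quantum, rounding=ROUND_HALF_UP) == sd:
            return True
    return False

print(grimmer_consistent("2.50", "0.82", 4))   # False
print(grimmer_consistent("4.00", "3.50", 20))  # False
print(grimmer_consistent("5.10", "1.30", 50))  # True
```

For Anxiety the only integer sum of squares inside the rounding interval is 27, which is odd while the sum (10) is even, so the check fails.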

inconsistent GRIM Mood (Table 2)

Mean 3.43 is impossible for N = 20.

With N = 20 integer observations, the achievable means nearest to 3.43 are 3.40 (sum 68) and 3.45 (sum 69); neither rounds to the reported value. This usually indicates a typo, a misreported N, or fabricated data.

Reported

mean: 3.43; n: 20; items: 1

Computed

nearest_achievable: ['3.40', '3.45']

Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.
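The GRIM check above comes down to asking whether any integer sum, divided by N, rounds to the reported mean. The sketch below is an illustration under stated assumptions, not Statlint's code: `grim_nearest` is a hypothetical name, means are positive, and rounding is half-up (a tool that also accepts half-even rounding would be slightly more lenient).

```python
from decimal import Decimal, ROUND_HALF_UP

def grim_nearest(mean_str, n, items=1):
    """Return (consistent, achievable means bracketing the reported one)."""
    reported = Decimal(mean_str)
    quantum = Decimal(1).scaleb(reported.as_tuple().exponent)  # e.g. 0.01
    total_n = n * items
    # The two integer sums on either side of the implied (non-integer) sum.
    below = int(reported * total_n)  # truncation; fine for positive means
    candidates = [Decimal(s) / total_n for s in (below, below + 1)]
    rounded = [c.quantize(quantum, rounding=ROUND_HALF_UP) for c in candidates]
    return reported in rounded, [str(r) for r in rounded]

print(grim_nearest("3.43", 20))  # (False, ['3.40', '3.45'])
print(grim_nearest("2.50", 4))   # (True, ['2.50', '2.75'])
```

Passing the mean as a string preserves the number of reported decimals, which the test depends on.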

inconsistent GRIMMER Stress (Table 2)

SD 3.50 is impossible for mean 4.00, N = 20 (integer data).

Reported

mean: 4.00; sd: 3.50; n: 20

inconsistent SPRITE Stress (Table 2)

SD 3.50 is impossible for mean 4.00 on a 1-7 scale (max possible ≈ 3.078).

For values confined to [min, max] with the given mean, the population variance cannot exceed (mean-min)·(max-mean), which is 3 · 3 = 9 here. With the n/(n-1) sample correction the SD can be at most √(9 · 20/19) ≈ 3.078, and the reported 3.50 exceeds it.

Reported

mean: 4.00; sd: 3.50; n: 20; scale: [1, 7]

Computed

max_possible_sd: 3.077935056255462

Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques (SPRITE). PeerJ Preprints 6:e26968v1.
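The bound used in this finding is the Bhatia-Davis inequality, and it can be sketched directly. The helper below is hypothetical and rests on two assumptions inferred from the reported numbers rather than stated in the report: the bound is converted to a sample SD with n/(n - 1), and the mean is allowed to sit anywhere in its rounding interval (which is why the Anxiety maximum is 3.003 rather than exactly 3). It does not perform SPRITE's iterative reconstruction.

```python
import math

def max_possible_sd(mean, n, lo, hi, decimals=2):
    """Upper bound on the sample SD for data confined to [lo, hi] (sketch)."""
    half = 0.5 * 10 ** -decimals
    # Means that would round to the reported value, clipped to the scale.
    m_lo, m_hi = max(mean - half, lo), min(mean + half, hi)
    # (m - lo) * (hi - m) peaks at the scale midpoint, so take the point of
    # the rounding interval closest to it.
    mid = (lo + hi) / 2
    m = min(max(mid, m_lo), m_hi)
    pop_var = (m - lo) * (hi - m)          # Bhatia-Davis bound
    return math.sqrt(pop_var * n / (n - 1))  # sample-SD correction

print(round(max_possible_sd(4.00, 20, 1, 7), 3))  # 3.078
print(round(max_possible_sd(2.50, 4, 1, 7), 3))   # 3.003
```

Any reported SD above this value is impossible on the stated scale, whatever the individual responses were.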

Suspicions — heuristic, unproven

suspicious descriptive Primary outcome (t-test)

Reported t=0.80 does not match the value implied by the group means/SDs/Ns (Student t = -3.422, Welch t = -3.422).

The reported statistic differs from an independent-samples t-test / one-way ANOVA recomputed from the descriptives. This can be innocent — an adjusted model (ANCOVA, paired test, covariates) would differ legitimately — so treat it as a prompt to check the analysis, not as proof of error.

Reported

groups: [{'name': 'control', 'mean': 10.0, 'sd': 2.0, 'n': 30}, {'name': 'treatment', 'mean': 12.0, 'sd': 2.5, 'n': 30}]; reported_stat: t=0.80

Computed

student_t: -3.421595691073206; student_p: 0.0011469620922456205; welch_t: -3.421595691073206; welch_p: 0.001177199356816115

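Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369.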
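The recomputation behind this finding needs only the group means, SDs, and Ns. The sketch below uses the standard Student and Welch formulas; the function name is hypothetical, and p-values are omitted because the t distribution is not in the Python standard library.

```python
import math

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Student t, Welch t, and Welch df from summary statistics (sketch)."""
    # Student: pooled variance, df = n1 + n2 - 2.
    pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    student_t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Welch: separate variances, Welch-Satterthwaite df.
    v1, v2 = s1**2 / n1, s2**2 / n2
    welch_t = (m1 - m2) / math.sqrt(v1 + v2)
    welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return student_t, welch_t, welch_df

student_t, welch_t, welch_df = t_from_summary(10.0, 2.0, 30, 12.0, 2.5, 30)
print(round(student_t, 3), round(welch_t, 3))  # -3.422 -3.422
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, which is why the report shows the same t but slightly different p-values. The sign depends only on the order of subtraction; the mismatch with the reported 0.80 is in magnitude.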

suspicious Carlisle-baseline

Baseline arms are implausibly similar (Carlisle p = 6e-10 across 5 variables).

Under proper randomisation the baseline comparison p-values should be uniform; here they are concentrated near 1, which Carlisle (2017) associated with fabricated or non-random data. This is a heuristic — stratified randomisation and correlated variables can mimic it.

Reported

n_variables: 5; fisher_stat: 0.075; p_too_similar: 5.997882349441545e-10; p_too_different: 0.9999999994002118

Carlisle, J. B. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia, 72(8), 944-952.
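The "too similar" p-value can be approximated from the reported Fisher statistic alone. The sketch assumes, as the field name `fisher_stat` suggests, that the baseline p-values are combined with Fisher's method, so that the statistic -2·Σ ln p follows a chi-square distribution with 2k degrees of freedom under uniformity; the function names are hypothetical. For even degrees of freedom the lower tail equals a Poisson upper tail, which avoids the cancellation that 1 minus the survival function would suffer at values this small.

```python
import math

def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 * sum(ln p)."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_cdf_even_df(x, df):
    """Lower-tail chi-square CDF for even df: P(Poisson(x/2) >= df/2)."""
    k, y = df // 2, x / 2
    term = math.exp(-y) * y**k / math.factorial(k)
    total = 0.0
    for i in range(k + 1, k + 200):  # enough terms for small statistics
        total += term
        term *= y / i
    return total

# Reported: 5 variables, fisher_stat 0.075 (rounded), so df = 10.
p_too_similar = chi2_cdf_even_df(0.075, 10)
print(f"{p_too_similar:.1e}")  # about 6e-10; differs slightly from the report because the statistic is rounded
```

A very small lower-tail probability means the individual p-values sit much closer to 1 than uniformity would produce.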

10 consistent / not-applicable checks

consistent GRIM Anxiety (Table 2)

Mean 2.50 is GRIM-consistent for N = 4.

Reported

mean: 2.50; n: 4; items: 1

Computed

achievable_value: 2.5

consistent SPRITE Anxiety (Table 2)

SD 0.82 is within the range possible for a 1-7 scale (max ≈ 3.003).

Reported

mean: 2.50; sd: 0.82; n: 4; scale: [1, 7]

Computed

max_possible_sd: 3.0033259341381293

consistent SPRITE Mood (Table 2)

SD 1.20 is within the range possible for a 1-7 scale (max ≈ 3.023).

Reported

mean: 3.43; sd: 1.20; n: 20; scale: [1, 7]

Computed

max_possible_sd: 3.0228559169660802

consistent GRIM Stress (Table 2)

Mean 4.00 is GRIM-consistent for N = 20.

Reported

mean: 4.00; n: 20; items: 1

Computed

achievable_value: 4

consistent GRIM Sleep quality (Table 2)

Mean 5.10 is GRIM-consistent for N = 50.

Reported

mean: 5.10; n: 50; items: 1

Computed

achievable_value: 5.1

consistent GRIMMER Sleep quality (Table 2)

SD 1.30 is GRIMMER-consistent with mean 5.10, N = 50.

Reported

mean: 5.10; sd: 1.30; n: 50

consistent SPRITE Sleep quality (Table 2)

SD 1.30 is within the range possible for a 1-7 scale (max ≈ 2.821).

Reported

mean: 5.10; sd: 1.30; n: 50; scale: [1, 7]

Computed

max_possible_sd: 2.821378842238059

consistent Benford

Leading-digit distribution is consistent with Benford's law (n = 32).

Reported

n: 32; chi2: 7.653; p: 0.46806821499676565; mad: 0.0433

Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
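The `chi2` and `mad` fields of a Benford check can be computed as follows. This is a sketch with hypothetical names: the input values are invented for illustration (the report does not list the 32 numbers it tested), and `mad` is taken to be the mean absolute deviation between observed and expected leading-digit proportions, which is the usual definition in Nigrini's work.

```python
import math
from collections import Counter

# Benford's law: P(leading digit = d) = log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    # Scientific notation puts the first significant digit first.
    return int(f"{abs(x):.15e}"[0])

def benford_stats(values):
    """Return (n, chi-square statistic, mean absolute deviation)."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    chi2 = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    mad = sum(abs(counts[d] / n - p) for d, p in BENFORD.items()) / 9
    return n, chi2, mad

# Hypothetical input, for illustration only.
n, chi2, mad = benford_stats([10.0, 2.0, 30, 12.0, 2.5, 0.82, 3.43, 1.20])
print(n, round(chi2, 3), round(mad, 4))
```

The chi-square statistic is compared against a chi-square distribution with 8 degrees of freedom. With only 32 values the test has little power, so a "consistent" result is weak evidence.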

consistent terminal-digits

Final-digit distribution looks uniform (n = 32).

Reported

n: 32; chi2: 21.125; p: 0.012106819396338144

Nigrini, M. J. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
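A terminal-digit check compares the counts of final digits against a uniform expectation (n/10 each) using a chi-square test with 9 degrees of freedom. The sketch below uses hypothetical function names and hypothetical input; values are passed as strings so that trailing zeros survive. The survival function is built from the standard recurrence for the regularised incomplete gamma function, starting from exp(-y) for even df and erfc(√y) for odd df.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (stdlib only)."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2:
        # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
        q += math.exp(-y) * y**a / math.gamma(a + 1)
        a += 1
    return q

def terminal_digit_test(reported):
    """Return (n, chi-square statistic, p) for final digits vs uniform."""
    digits = [s.strip()[-1] for s in reported if s.strip()[-1:].isdigit()]
    n = len(digits)
    counts = Counter(digits)
    expected = n / 10
    chi2 = sum((counts[str(d)] - expected) ** 2 / expected for d in range(10))
    return n, chi2, chi2_sf(chi2, 9)

print(round(chi2_sf(21.125, 9), 4))  # 0.0121, matching the reported p
# Hypothetical input, for illustration only.
print(terminal_digit_test(["2.50", "0.82", "3.43", "1.20", "4.00", "5.10"]))
```

The reported p of about 0.012 would be flagged at a 0.05 cutoff. The "consistent" label suggests the tool applies a stricter threshold, although the report does not state one.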

not applicable GRIMMER Mood (Table 2)

GRIMMER not applied: the reported mean is itself GRIM-impossible (see the GRIM finding).

Reported

mean: 3.43; sd: 1.20; n: 20

How to read this report. An impossibility is a mathematical proof that the reported numbers cannot co-exist (given the stated assumptions, such as the data being integer). A suspicion is a heuristic signal that proves nothing by itself. Every finding is a question to ask the authors — most often the cause is a typo or a misreported sample size, not misconduct.