A Practical Solution to the Pervasive Problems of p Values

Plain English Summary

This paper lays out a devastating case against the p-value, the workhorse statistic that scientists use to decide whether a result is 'real.' Wagenmakers identifies three damning problems: p-values depend on imaginary data that were never collected; they change with what the researcher intended to do (the same data can be 'significant' or not depending on when you planned to stop collecting); and identical p-values can mean wildly different things at different sample sizes. Here's the kicker: at a commonly used threshold, the probability that the boring null hypothesis is actually true can range from 69% to a whopping 92% as your sample grows. As a fix, Wagenmakers champions the Bayesian information criterion (BIC), a straightforward alternative that approximates a Bayesian hypothesis test without the heavy mathematical machinery. This paper later became a loaded weapon in debates over claimed evidence for psychic phenomena.
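To make the stopping-rule problem concrete, here is a minimal sketch of a classic textbook illustration. The numbers (9 successes and 3 failures, tested against a chance rate of 0.5) and the function names are assumptions chosen for illustration, not values taken from the summary above. The same 12 observations give a different one-sided p-value depending on whether the plan was to run exactly 12 trials or to keep going until the 3rd failure.

```python
from math import comb

def binomial_p(successes, n):
    """One-sided p-value when n was fixed in advance:
    P(X >= successes) under H0: success rate = 0.5."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

def negative_binomial_p(successes, failures):
    """One-sided p-value when the plan was to stop at the `failures`-th failure.
    Seeing at least `successes` successes before that point is the same event as
    having at most failures - 1 failures in the first successes + failures - 1 trials."""
    m = successes + failures - 1
    return sum(comb(m, j) for j in range(failures)) / 2 ** m

# Hypothetical data: 9 successes, 3 failures, 12 trials in total.
p_fixed_n = binomial_p(9, 12)                     # plan: run exactly 12 trials
p_stop_at_3_failures = negative_binomial_p(9, 3)  # plan: stop at the 3rd failure

print(f"fixed n: {p_fixed_n:.3f}, stop at 3rd failure: {p_stop_at_3_failures:.3f}")
```

Under these assumptions the data land on opposite sides of the .05 line (about .073 versus about .033), even though the observations themselves are identical. Only the unobserved, hypothetical outcomes differ between the two sampling plans.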

Abstract

In the field of psychology, the practice of p value null-hypothesis testing is as widespread as ever. Despite this popularity, or perhaps because of it, most psychologists are not aware of the statistical peculiarities of the p value procedure. In particular, p values are based on data that were never observed, and these hypothetical data are themselves influenced by subjective intentions. Moreover, p values do not quantify statistical evidence. This article reviews these p value problems and illustrates each problem with concrete examples. The three problems are familiar to statisticians but may be new to psychologists. A practical solution to these p value problems is to adopt a model selection perspective and use the Bayesian information criterion (BIC) for statistical inference (Raftery, 1995). The BIC provides an approximation to a Bayesian hypothesis test, does not require the specification of priors, and can be easily calculated from SPSS output.
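The abstract does not spell out how the BIC is turned into a hypothesis test, so the following is only a sketch under stated assumptions. It uses the R-squared form of the BIC for normal linear models that is commonly attributed to Raftery (1995), BIC = n ln(1 - R^2) + k ln(n), where k counts the regressors excluding the intercept, and it assumes equal prior odds for the two hypotheses. The function names and the example numbers are hypothetical.

```python
from math import exp, log

def bic_from_r2(n, k, r_squared):
    """BIC of a normal linear model relative to the intercept-only model.
    n: sample size; k: number of regressors (intercept excluded);
    r_squared: proportion of variance explained, as reported in regression output."""
    return n * log(1.0 - r_squared) + k * log(n)

def posterior_prob_null(bic_null, bic_alt):
    """Approximate P(H0 | data), assuming equal prior odds (an assumption).
    Uses the approximation BF01 ~ exp((BIC_alt - BIC_null) / 2)."""
    d = (bic_alt - bic_null) / 2.0
    if d >= 0:
        return 1.0 / (1.0 + exp(-d))  # written this way to avoid overflow for large d
    bf01 = exp(d)
    return bf01 / (1.0 + bf01)

# Hypothetical example: n = 100, one predictor explaining 5% of the variance.
n = 100
bic_null = bic_from_r2(n, k=0, r_squared=0.0)  # intercept-only model, equals 0
bic_alt = bic_from_r2(n, k=1, r_squared=0.05)
p_h0 = posterior_prob_null(bic_null, bic_alt)

print(f"Approximate P(H0 | data) = {p_h0:.3f}")
```

The only inputs are the sample size, the number of predictors, and R-squared, all of which appear in standard regression output. That is the sense in which the calculation needs no explicit prior specification beyond the equal-odds assumption.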