The history of how the widely used 5% significance level came to be selected leaves much to be desired.

Lynn D. Torbeck

Statisticians and non-statisticians daily select the level of statistical significance to be used for decisions, experimental designs, data collection, sample sizes, and other formal analyses. They usually choose 5%. Why 5%? Note this definition in a well-known dictionary: "Significance level: The level of probability at which it is agreed that the null hypothesis will be rejected. Conventionally set at 0.05" (1).
Thus, the Cambridge Dictionary of Statistics builds 5% into its very definition of the significance level. Why 5%? Why not some other value? Is it acceptable to be wrong 5% of the time? If we choose another value, what should it be: 10%, 1%, or 0.1%?
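To see concretely what "wrong 5% of the time" means, consider a small simulation; this is my own sketch, not part of the dictionary definition, and it assumes Python with NumPy and SciPy and uses invented data. When the null hypothesis is in fact true, a test run at the 0.05 level still rejects it in roughly one experiment in twenty:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05                       # the conventional significance level
n_experiments = 10_000
false_rejections = 0

for _ in range(n_experiments):
    # Both samples come from the same distribution, so the null hypothesis is true.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:            # "significant" purely by chance
        false_rejections += 1

print(f"False rejection rate: {false_rejections / n_experiments:.3f}")  # roughly 0.05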
Other writers have reflected on this as well. "Why is the 0.05 level of significance used as the decision point to reject the null hypothesis? Why not 0.06 level of significance? Actually, the 0.05 level of significance is used because of tradition. [Sir] R. A. Fisher (2), the founder of modern statistical methods, chose this value and other scientists have accepted the choice" (3).
In his landmark 1926 paper, "The Arrangement of Field Experiments," (4) Sir Ronald presented his logic for selecting significance levels: "It will illustrate the meaning of tests of significance if we consider for how many years the [farm] produce (i.e., results) should have been recorded in order to make the evidence convincing. First, if the experimenter could say that in twenty years experience with uniform treatment the difference in favour of the acre treated with manure had never before touched 10 per cent, the evidence would have reached a point which may be called the verge of significance; ... This level, which we may call the 5 per cent. point, would be indicated, though very roughly, by the greatest chance deviation observed in twenty successive trials. ... If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent. point), or one in a hundred (the 1 per cent. point)." Apparently, Sir Ronald was not fixated on 5%.
Fisher worked at the Rothamsted Experimental Station in Harpenden, England, from 1919 to 1933 (2, 5). While there, he studied crop yields and animal husbandry using statistics and the theory of experimental design that he developed. An incorrect decision affected only the distribution of manure on crop fields or the care and feeding of pigs and honey bees. Being wrong for 5% of the decisions wouldn't seem to be a major problem. Also, Fisher was not legalistic in his use of significance levels. Being pragmatic, he was much more concerned with the practical impact, or practical significance, of the results. It should be noted that Fisher developed many of the valuable statistical tables used for significance testing, and in them he chose levels of 0.05 and 0.01. Thus, in effect, he forced the rest of the world to go along with his choices regardless of the application.
Other statisticians have pointed out that the selected level of 5% determines how often we will be wrong in our decisions. "In rejecting the null hypothesis, the sampler faces the possibility that he is wrong. Such is the risk always run by those who test hypotheses and rest decisions on the tests. ... As a matter of practical convenience, probability levels of 5% (0.05) and 1% (0.01) are commonly used in deciding whether to reject the null hypothesis. ... This use of 5% and 1% levels is simply a working convention. There is merit in the practice, followed by some investigators, of reporting in parentheses the probability ..." (6).
"The question arises: at what probability level does a deviation become statistically significant? There is no rational probability level at which possibility ceases and impossibility begins, but it is conventional to regard a probability of 0.05 as the critical level of significance" (7).
Thus, the hallowed 5% significance level was born from a crop experiment and a manure spreader. I am sure this is reassuring to those who will benefit from the next analysis.
Lynn D. Torbeck is a statistician at Torbeck and Assoc., 2000 Dempster Plaza, Evanston, IL 60202, tel. 847.424.1314, Lynn@Torbeck.org, www.torbeck.org.
1. B.S. Everitt, The Cambridge Dictionary of Statistics (Cambridge University Press, Cambridge, UK, 1998) p. 305.
2. J.F. Box, R.A. Fisher: The Life of a Scientist (John Wiley & Sons, New York, NY, 1978).
3. J.F. Zolman, Biostatistics: Experimental Design and Statistical Inference (Oxford University Press, Oxford, 1993) p. 84.
4. R.A. Fisher, Journal of the Ministry of Agriculture, 33, 504 (1926).
5. J.L. Folks, Ideas of Statistics (John Wiley & Sons, New York, NY, 1981) p. 245.
6. G.W. Snedecor and W.G. Cochran, Statistical Methods, 6th ed. (Iowa State Press, Ames, IA, 1971) p. 27.
7. L.H.C. Tippett, The Methods of Statistics, 4th ed. (John Wiley & Sons, New York, NY, 1951) p. 89.