The Limitations of p-Values and the Case for Bayesian Hypothesis Testing

By Ranran Li · November 2019

Reviewed by: Shiying Wu & Chuanpeng Hu

I. What is a p-value and why is it problematic?

The p-value is defined as: "Given that the null hypothesis (H0) is true, the probability of observing results as extreme as, or more extreme than, the data we actually observed."

Popularized by British geneticist and statistician Ronald Fisher in the 1920s, the p-value was meant to serve as a reference point for judging whether a result is significant. Fisher recommended a significance threshold of α = 0.05 (for a two-sided test, approximately 2 standard deviations from the mean of a normal distribution). If the p-value falls below this threshold, then either a rare event occurred under H0 or the null hypothesis should be rejected in favor of an alternative. This method is known as Null Hypothesis Significance Testing (NHST).
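The link between the 0.05 threshold and the "about 2 standard deviations" rule can be checked numerically. The following is a minimal sketch, assuming a test statistic that is standard normal under H0 and a two-sided test; the function name two_sided_p_value is illustrative and not from the original post.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability, under H0, of a standard-normal statistic
    at least as far from zero as the observed z (assumed setup)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Critical value for a two-sided test at alpha = 0.05
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_crit, 2))                  # 1.96
print(round(two_sided_p_value(2.0), 4))  # 0.0455
```

Under these assumptions the exact cutoff is 1.96 standard deviations, and a result exactly 2 standard deviations from the mean gives p ≈ 0.0455, just under the threshold.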

However, a growing number of researchers have questioned the concept of "statistical significance" and pointed to the limitations of p-values. Four major issues are:

II. NHST vs. Bayesian Hypothesis Testing: Why is Bayesian inference gaining traction?

Bayesian hypothesis testing differs in several key ways:

III. Will Bayesian inference replace p-values?

Bayesian logic is increasingly adopted, especially in light of the replication crisis. But will it replace p-values?

I believe the two methods reflect fundamentally different logics. NHST evaluates the probability of data at least as extreme as those observed, given H0; a Bayes factor compares how probable the observed data are under two or more competing hypotheses. Thus, Bayes factors are not replacements for p-values but complements to them, and they make results easier to interpret.
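To make the contrast concrete, here is a small sketch that computes both quantities for the same data. The scenario is hypothetical and not from the original post: 61 successes in 100 binary trials, H0 fixing the success rate at 0.5, and H1 placing a uniform prior on the rate (an assumption chosen because the marginal likelihood then has the closed form 1 / (n + 1)).

```python
from math import comb

def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for H0: theta = 0.5,
    obtained by doubling the smaller tail probability."""
    pmf = [comb(n, i) * 0.5 ** n for i in range(n + 1)]
    upper = sum(pmf[k:])        # P(X >= k | H0)
    lower = sum(pmf[: k + 1])   # P(X <= k | H0)
    return min(1.0, 2.0 * min(upper, lower))

def bayes_factor_10(k: int, n: int) -> float:
    """Bayes factor for H1 (theta ~ Uniform(0, 1)) over H0 (theta = 0.5)."""
    likelihood_h0 = comb(n, k) * 0.5 ** n  # P(data | H0)
    marginal_h1 = 1.0 / (n + 1)            # P(data | H1), averaged over the prior
    return marginal_h1 / likelihood_h0

# Hypothetical data: 61 successes out of 100 trials
p = binomial_two_sided_p(61, 100)
bf10 = bayes_factor_10(61, 100)
print(f"p = {p:.4f}, BF10 = {bf10:.2f}")
```

With these made-up numbers the p-value is roughly 0.035, which is "significant" at the .05 level, while the Bayes factor is only about 1.4 in favor of H1. The two numbers answer different questions, which is why reporting both can be informative. A different prior under H1 would give a different Bayes factor.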

Despite these advantages, Bayesian methods come with caveats:

In 2018, 72 scholars proposed redefining statistical significance from p < 0.05 to p < 0.005 to reduce false positives (Benjamin et al., 2018).

IV. My Opinion

If you have the time and an interest in methodology, I recommend learning Bayesian inference. Form a judgment about it only after you have had enough exposure to it.

When reporting results, it can be helpful to supplement p-values with Bayes factors. This is now easy thanks to JASP (a free, open-source statistics program for both Bayesian and frequentist analysis).

Since most researchers still rely on NHST, I suggest interpreting p-values close to the .05 threshold with caution.
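One way to see why near-threshold p-values deserve caution is to ask how much evidence against H0 a given p-value could represent at most. The sketch below uses the upper bound 1 / (-e · p · ln p), valid for p < 1/e, which is usually attributed to Sellke, Bayarri, and Berger (2001). This bound is not part of the original post and is offered only as an illustration; the function name is hypothetical.

```python
from math import e, log

def max_bayes_factor_10(p: float) -> float:
    """Upper bound on the Bayes factor in favor of H1 implied by a p-value.
    Assumes the -e * p * ln(p) calibration, which applies only for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / e:
        raise ValueError("bound applies only for 0 < p < 1/e")
    return 1.0 / (-e * p * log(p))

for p_value in (0.05, 0.005):
    print(p_value, round(max_bayes_factor_10(p_value), 1))
```

Under this bound, p = .05 corresponds to a Bayes factor of at most about 2.5 in favor of H1, whereas p = .005 corresponds to at most about 14, so a result that barely crosses the .05 line is weak evidence at best.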

V. Learning Resources for Bayesian Analysis in JASP

References
