Problems with p
Holmes and Peirce
(these slides available from www.peirce.org.uk/talks/p-hack )
- Perceived problems
- Proper problem
- p-hacking: Popular topics
- p-hacking: exPerimenter degrees of freedom
- p-hacking: Potential solutions
This is hopefully going to be a discussion. Nick and I don’t necessarily have the answers!
- Will it make any difference?
- Are your data so close to p = 0.05 that different tests would give different answers?
- Many scientists (even well-educated ones) don’t understand p
- Nick will cover this later
- Then again, have you ever come across a really good example where it matters?!
I don’t think either of those problems has led to large-scale errors in reported findings
Most reported positive results are false alarms (Horton, 2015; Ioannidis, 2005; Harris, 2017)
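For concreteness, the Ioannidis-style argument can be put as back-of-envelope arithmetic. This is only a sketch: the prior and power figures below are illustrative assumptions, not numbers from the cited papers.

```python
# Positive predictive value (PPV) of a "significant" result:
#   PPV = (power * prior) / (power * prior + alpha * (1 - prior))
def ppv(prior, power, alpha):
    true_positives = power * prior          # true effects correctly detected
    false_positives = alpha * (1 - prior)   # null effects wrongly flagged
    return true_positives / (true_positives + false_positives)

# Illustrative assumptions: 10% of tested hypotheses are true,
# studies have 35% power, and alpha = 0.05.
print(round(ppv(prior=0.10, power=0.35, alpha=0.05), 2))  # → 0.44
```

Under these (assumed) numbers, fewer than half of significant findings are real effects, which is the sense in which "most reported positive results are false alarms".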
- We know that we should correct for “Family-wise” error
- What defines a “Family” in this case?
- Do all studies have the same Family-wise error?
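One way to make the "family" question concrete is to look at the standard corrections themselves. Here is a plain-Python sketch of Bonferroni and Holm (the p-values are made up for illustration); both depend directly on the family size m, which is precisely the quantity that is ill-defined.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i only if p_i < alpha / m, where m is the family size."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down method: uniformly more powerful than Bonferroni."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):       # smallest p first
        if pvals[i] < alpha / (m - rank):  # threshold relaxes at each step
            reject[i] = True
        else:
            break                          # stop at the first failure
    return reject

ps = [0.010, 0.013, 0.030, 0.200]  # invented p-values
print(bonferroni(ps))  # → [True, False, False, False]  (alpha/4 = 0.0125)
print(holm(ps))        # → [True, True, False, False]
```

Note that simply redrawing the family boundary (splitting the four tests into two "families" of two) would change which results survive, with no change to the data.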
One ‘is allowed’ to apply statistical tests in exploratory research, just as long as one realizes that they do not have evidential impact. De Groot (1956), translated by Wagenmakers et al (2014)
p only makes sense when there is a single way to analyse the data, decided on beforehand.
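A minimal simulation of this point (the three "analysis choices" are invented for illustration): if you try several reasonable-looking analyses of the same null data and report whichever comes out significant, the false-positive rate exceeds the nominal 5%.

```python
import math
import random

def z_test_p(sample, mu=0.0, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == mu, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
n_sims, hits = 2000, 0
for _ in range(n_sims):
    data = [random.gauss(0, 1) for _ in range(40)]  # pure noise: H0 is true
    # Three "experimenter degrees of freedom" applied to the same data:
    analyses = [
        z_test_p(data),                             # analyse everything
        z_test_p(data[:20]),                        # "first half only"
        z_test_p([x for x in data if abs(x) < 2]),  # "outliers removed"
    ]
    if min(analyses) < 0.05:  # report the best-looking result
        hits += 1
print(hits / n_sims)  # noticeably above the nominal 0.05
```

The individual tests are each valid at the 5% level; it is the unreported freedom to choose among them that breaks the guarantee.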
“This cost us a lot of time and our own money to collect. There’s got to be something here we can salvage because it’s a cool (rich & unique) data set.” Brian Wansink, “The Grad Student Who Never Said ‘No’”
- Postdocs everywhere: labs with lots of staff
- shall we rerun the study?
- what if two studies have disagreeing results?
- then again, are we saying that a study can’t be repeated?
For discussion’s sake (I’m not saying I agree with these)
Maybe Bayesian statistics get us out of this (e.g. see Wagenmakers)
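As a sketch of the Bayesian alternative, here is a textbook Bayes factor for a binomial test with a uniform prior under the alternative. The data (60 successes in 100 trials) are invented, and this is one simple example of the kind of approach Wagenmakers advocates, not his specific method.

```python
from math import comb

def bf10_binomial(k, n):
    """Bayes factor BF10 for k successes in n trials.
    H0: theta = 0.5 (fair coin); H1: theta ~ Uniform(0, 1).
    Under H1 the marginal likelihood of any k is 1/(n+1)
    (the Beta-binomial with a = b = 1)."""
    m1 = 1 / (n + 1)              # evidence for H1
    m0 = comb(n, k) * 0.5 ** n    # evidence for H0
    return m1 / m0

# 60/100 successes: a frequentist test gives p ~= 0.057, tantalisingly
# close to 0.05, yet the Bayes factor is near 1 (barely any evidence).
print(bf10_binomial(60, 100))
```

Unlike p, BF10 quantifies evidence symmetrically: values below 1 support the null, values above 1 support the alternative, and "no evidence either way" is an expressible outcome.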
Advantages:
We could upload all our null results to PsychFileDrawer.org
- That would get rid of the imbalance between positive and null results
- Will it, though? Will people read the null findings in Psych File Drawer?
We could encourage and enable replications to be conducted
- make it possible to publish a replication (and a non-replication)
- need to make sure that failed replications were well-run
- how do we encourage this pursuit?
e.g. pre-register on Open Science Framework
If the core problem is that too many studies are false positives, maybe we should reduce alpha to be 0.005…?
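The same Ioannidis-style arithmetic lets us check what lowering alpha would buy. Again a sketch under illustrative assumptions (10% of hypotheses true, 50% power, power held fixed as alpha drops, which is optimistic):

```python
# How does the positive predictive value of a significant result
# change if we move alpha from 0.05 to 0.005?
def ppv(prior, power, alpha):
    return (power * prior) / (power * prior + alpha * (1 - prior))

for alpha in (0.05, 0.005):
    print(alpha, round(ppv(prior=0.10, power=0.50, alpha=alpha), 2))
# 0.05  → PPV ≈ 0.53
# 0.005 → PPV ≈ 0.92
```

So under these assumptions the stricter threshold does help substantially. The caveat is that, at fixed sample size, lowering alpha also lowers power, so the real gain is smaller than this sketch suggests.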