Academics come to vastly different research conclusions even when given the same questions and dataset, underlining the need for scholars to meticulously document the decisions and judgements they make during their work, a new study has found.
Twenty-nine teams of analysts tested two hypotheses on a common dataset of online academic discussions.
The first hypothesis was that “a woman’s tendency to participate actively in a conversation correlates positively with the number of females in the discussion”.
The second postulated that “higher-status participants are more verbose than are lower-status participants”.
By tracking the decisions made by researchers using a new tool called DataExplained, the study discovered just how open to interpretation these questions were.
Some analysts defined “high status” as an academic’s job rank, whereas others used citations, for example. “Verbose” could mean the number of words in an academic’s comment or the number of comments they made over the course of a year. Different teams also used different statistical techniques and sample sizes.
“Where you make judgements, there is noise, and more than we think,” said co-author Martin Schweinsberg, assistant professor of organisational behaviour at the business school ESMT Berlin.
The result was that “researchers reported radically different analyses and dispersed empirical outcomes”, according to the study, which was published in Organizational Behavior and Human Decision Processes.
For the second hypothesis, testing a link between status and verbosity, 29 per cent of analysts found evidence in support, but 21 per cent concluded the exact reverse.
As for the idea that women speak more when other women are present, there was more consensus, with nearly two-thirds finding support for this hypothesis. Still, more than a fifth found an effect in the opposite direction.
The findings “very vividly show” just how many ways there are of tackling a seemingly simple question, said Dr Schweinsberg.
The work is the latest in a series of crowdsourced experiments in which multiple research teams independently tackle the same question with the same data. One 2018 experiment explored racial bias, looking at whether football referees gave more red cards to dark-skinned players.
A majority found evidence of racial bias, but the spectrum of findings was huge, with the “disturbing implication [being] that if only one team had obtained the dataset and presented their preferred analysis, the scientific conclusion drawn could have been anything from major racial disparities in red cards to equal outcomes”, according to Dr Schweinsberg’s paper.
This latest experiment is different in that it closely tracked participants’ decisions through DataExplained. “We provide a step-by-step chain to see how this happened,” Dr Schweinsberg said. “Every few lines of code, we ask them a set of standard questions about paths taken.”
The platform, made public earlier this year, “records all executed source code and prompts analysts to comment on their code and analytical thinking steps”, the study explains.
Whether this kind of systematic monitoring makes sense depends on the question asked, said Dr Schweinsberg. “If someone’s dead or alive, there’s not much ambiguity,” he said, although his paper points to contradictory findings even in the medical field.
“If the question is big enough and has implications that are important enough, it might be sensible to [do] something like this,” he suggested.