
The REF’s star system leaves a black hole in fairness

With such wide disagreements in grading, the research excellence framework’s gravitational pull on careers is unjustifiable, says Philip Moriarty

Published on June 27, 2019
Last updated June 27, 2019
[Illustration: stars on a wall. Source: Michael Parkin]

“In your field of study, Professor Aspire, just how does one distinguish a 3* from a 4* paper in the research excellence framework?”

The interviewee for a senior position at the University of True Excellence – names have been changed to protect the guilty – shuffled in his seat. I leaned slightly forward after posing the question, keen to hear his response to this perennial puzzler that has exercised some of the UK’s great and not-so-great academic minds.

He coughed. The panel – on which I was the external reviewer – waited expectantly.

“Well, a 4* paper is a 3* paper except that your mate is one of the REF panel members,” he answered.

I smiled and suppressed a giggle.

Other members of the panel were less amused. After all, the rating and ranking of academics’ outputs is serious stuff. Careers – indeed, the viability of entire departments, schools, institutes and universities – depend critically on the judgements made by peers on the REF panels.

Not only do the ratings directly influence the intangible benefits arising from the prestige of a high REF ranking, they also translate into cold, hard cash. An analysis by the University of Sheffield suggests that in my subject area, physics, the average annual value of a 3* paper for REF 2021 is likely to be roughly £4,300, whereas that of a 4* paper is £17,100. In other words, the formula for allocating “quality-related” research funding is such that a paper deemed 4* is worth four times one judged to be 3*; as for 2* (“internationally recognised”) or 1* (“nationally recognised”) papers, they are literally worthless.
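
To see how sharply that formula bites, here is a minimal sketch of the arithmetic. The per-paper value comes from the Sheffield estimate quoted above, but the flat 0/0/1/4 weighting is an illustrative simplification of the QR funding rules, not the official formula.

```python
# Minimal sketch of the QR funding arithmetic described above.
# The ~£4,300 figure is the Sheffield estimate for a 3* physics paper;
# the 0/0/1/4 weights are an illustrative simplification, not the
# official funding formula.

STAR_WEIGHTS = {1: 0, 2: 0, 3: 1, 4: 4}  # 1* and 2* attract no QR funding
VALUE_OF_3_STAR = 4_300                  # approximate annual value in GBP

def annual_value(stars: int) -> int:
    """Approximate annual QR funding attracted by a single paper."""
    return STAR_WEIGHTS[stars] * VALUE_OF_3_STAR

for stars in (1, 2, 3, 4):
    print(f"{stars}* paper: £{annual_value(stars):,} per year")

# Output:
#   1* paper: £0 per year
#   2* paper: £0 per year
#   3* paper: £4,300 per year
#   4* paper: £17,200 per year  (close to the quoted £17,100; both are rounded)
```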

We might have hoped that before divvying up more than £1 billion of public funds a year, the objectivity, reliability and robustness of the ranking process would be established beyond question. But, without wanting to cast any aspersions on the integrity of REF panels, I’ve got to admit that, from where I was sitting, Professor Aspire’s tongue-in-cheek answer regarding the difference between 3* and 4* papers seemed about as good as any – apart from, perhaps, “I don’t know”.

The solution certainly isn’t to reach for simplistic bibliometric numerology such as impact factors or SNIP indicators; anyone making that suggestion is not displaying even the level of critical thinking we expect of our undergraduates. But every academic also knows, deep in their studious soul, that peer review is far from wholly objective. Nevertheless, university senior managers – many of them practising or former academics themselves – are often all too willing, as part of their REF preparations, to credulously accept internal assessors’ star ratings at face value, with sometimes worrying consequences for the researcher in question (especially if the verdict is 2* or less).

Fortunately, my institution, the University of Nottingham, is a little more enlightened – last year it had the good sense to check the consistency of the internal verdicts on potential REF 2021 submissions via the use of independent reviewers for each paper. The results were sobering. Across seven scientific units of assessment, the level of full agreement between reviewers varied from 50 per cent to 75 per cent. In other words, in the worst cases, reviewers agreed on the star rating for no more than half of the papers they reviewed.

Granted, the vast majority of the disagreement was at the 1* level; very few pairs of reviewers were “out” by two stars, and none disagreed by more. But this is cold comfort. The REF’s credibility is based on an assumption that reviewers can quantitatively assess the quality of a paper with a precision better than one star. As our exercise shows, the effective error bar is actually ±1*.
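
For the record, “full agreement” here simply means the fraction of papers on which a pair of reviewers award the same star rating. Below is a minimal sketch of that statistic with invented ratings; only the 50 to 75 per cent range and the one-star disagreement pattern come from the Nottingham exercise.

```python
# Sketch of the exact-agreement statistic discussed above. The ratings are
# invented for illustration; only the reported agreement range and the
# one-star disagreement pattern come from the Nottingham exercise.

reviewer_a = [4, 3, 3, 2, 4, 3, 3, 2]  # hypothetical star ratings
reviewer_b = [3, 3, 4, 2, 4, 3, 2, 2]  # independent second opinions

pairs = list(zip(reviewer_a, reviewer_b))
agree = sum(a == b for a, b in pairs)
print(f"Full agreement: {agree}/{len(pairs)} = {agree / len(pairs):.0%}")
# Full agreement: 5/8 = 62%

# Disagreements of one star dominate; none here exceed one star, which is
# the pattern the text reports -- hence the effective error bar of ±1*.
max_gap = max(abs(a - b) for a, b in pairs)
print(f"Largest disagreement: {max_gap} star(s)")  # Largest disagreement: 1 star(s)
```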

That would be worrying enough if there were a linear scaling of financial reward. But the problem is exacerbated dramatically by both the 4× multiplier for 4* papers and the total lack of financial reward for anything deemed to be below 3*. Within a ±1* error bar, a genuinely 3* paper can thus end up recorded as anything from a worthless 2* to a 4* worth four times as much.

The Nottingham analysis also examined the extent to which reviewers’ ratings agreed with authors’ self-scoring (let’s leave aside any disagreement between co-authors on that). The level of full agreement here was similarly patchy, varying between 47 per cent and 71 per cent. Unsurprisingly, there was an overall tendency for authors to “overscore” their papers, although underscoring was also common.

Some argue that what’s important is the aggregate REF score for a department, rather than the ratings of individual papers, because any wayward ratings will supposedly “wash out” at the macro level. I disagree entirely. Individual academics across the UK continue to be coaxed and cajoled into producing 4* papers; there are even dedicated funding schemes to help them do so. And the repercussions arising from failure can be severe.

It is vital in any game of consequence that participants be able to agree when a goal has been scored or a boundary hit. Yet, in the case of research quality, there are far too many cases in which we just can’t. So the question must be asked: why are we still playing?

Philip Moriarty is professor of physics at the University of Nottingham.

POSTSCRIPT:

Print headline: The REF’s star system creates a black hole into which fairness falls

Reader's comments (4)

The answer would appear to be more independent peer review: assessing work with an external eye and no skin in the game. Obviously this has implications for time and resource management, but if you want a level playing field beyond reproach, what other alternative do you have?
REF should be abolished and a different way of funding university research should be found. There is no evidence that it has improved the quality of research or generated more research with an enduring impact. The ancient method of distribution of funds by a University Grants Committee was certainly no worse and would liberate the universities from a wasteful bureaucracy and scholars from a philistine exercise in counting.
In my opinion, a fairer way is to have more independent experts from other nations in panels.
"The solution certainly isn鈥檛 to reach for simplistic bibliometric numerology such as impact factors or SNIP indicators; " Why not? It would be no less flawed than the current system and free up around 拢0.25 billion.
