When, in January, armed rioters stormed the US Capitol building carrying Confederate flags and calling for Vice-President Mike Pence to be hanged, many observers saw it as the grim conclusion of years of algorithm-fuelled online misinformation and hate.
Platforms such as Facebook, Twitter and YouTube, critics charge, have sought to make themselves even more addictive, and even more profitable, by feeding users the most outrageous and compelling content, up to and including risible conspiracy theories such as QAnon (the idea that a satanic cabal of paedophile Hollywood actors and Democratic politicians conspired against the presidency of Donald Trump). In 2018, YouTube was already being dubbed “the great radicaliser” for using its autoplay function to smoothly lead viewers down a dark rabbit hole of clips, starting with a Donald Trump rally and culminating in full-blown Holocaust denial.
Radicalised hordes of Trump supporters wearing face paint and horns might at first sight appear a world away from the academic milieu. Yet a small but growing number of scholars are sounding the alarm over the fact that the research academics see is also being determined by algorithms, through search and recommendation tools such as Google Scholar, ResearchGate and Mendeley Suggest. Of course, this may be a good thing: algorithms could make it easier to uncover hidden research gems in neglected journals and allow overwhelmed academics to sift through an ever-growing torrent of new articles. But if machines, not people, decide which articles academics read next, critics fear that this will have profound consequences for scientific consensus and discovery.
“We can actually learn from what happened on Facebook and Twitter and other social media. Science is not immune from all of these things,” says Peter Kraker, founder of literature mapping tool Open Knowledge Maps and a former open science researcher at Graz University of Technology. “Everything that happened on Facebook can also happen with these academic social networks.”
In a pre-digital age, academics discovered new research through conferences, tip-offs from colleagues and printed journals, either specific to their subfield or those of broader interest, such as Nature and Science. Scholars still use these channels, says Kraker. And not all digital alerts are algorithmic: some are just feeds of new articles from a particular journal. But although detailed usage data are scarce, it’s a “fair assumption” that almost every researcher has at least some kind of automatic alert set up, Kraker thinks.
And just as Google dominates online searches in most countries, it appears to be dominant in academic searches, too. One study of usage by early career researchers found that Google Scholar and Google’s main search engine were “universally popular irrespective of country, language, and discipline”. In the US, Google Scholar was particularly well used, with two-thirds of early career researchers saying it was their top source of scholarly information.
Google Scholar does not only return search results, however. It also recommends new papers through its alert system. It has this in common with a number of scholarly platforms: ResearchGate, Mendeley and Semantic Scholar also offer both a way to search and a recommendation tool. And although those two functions are distinct, they are both algorithmically driven ways to find new articles.
When a researcher searches for a keyword in Google Scholar, for instance, it returns a list of papers that, by default, are sorted according to “relevance”, a seemingly simple word that opens up a host of questions about how exactly it is defined. Google offers a broad outline of its approach: it weighs “the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature”. It is this last part of the equation, boosting papers that have lots of recent citations, that has a number of academics worried.
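Google does not publish its ranking formula, but the effect critics describe can be sketched in a few lines of Python. Everything here is invented for illustration: the weighting, the fields and the papers themselves. The point is simply that once a recent-citation boost enters the score, an already-popular paper can outrank an equally good text match.

```python
# Toy sketch of a citation-weighted "relevance" ranking. The real formula
# is unpublished; these weights and fields are hypothetical.
import math

def relevance(paper, query):
    # Crude text match: fraction of query words found in the title.
    words = query.lower().split()
    title = paper["title"].lower()
    text_match = sum(w in title for w in words) / len(words)
    # Citation boost: log-damped count of recent citations.
    citation_boost = math.log1p(paper["recent_citations"])
    return text_match + 0.5 * citation_boost

papers = [
    {"title": "A new method for topic modelling", "recent_citations": 3},
    {"title": "Topic modelling: a survey",        "recent_citations": 400},
]
ranked = sorted(papers, key=lambda p: relevance(p, "topic modelling"),
                reverse=True)
print([p["title"] for p in ranked])
# The heavily cited survey lands on top despite an identical text match.
```

Both titles match the query equally well; the citation term alone decides the order, which is precisely the dynamic that worries Jordan and others below.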
“It’s a sort of rich-get-richer effect,” says Katy Jordan, a specialist in digital scholarship at the University of Cambridge. In prioritising research that is already popular, scholarly platforms risk repeating the feedback mechanisms of social media, where a minuscule fraction of content goes viral but the vast majority is all but ignored, Jordan thinks. “Trying to recreate those dynamics [in science] risks prioritising a small proportion of things,” she says.
Academics have always been more inclined to cite previously popular articles, of course. But algorithms risk exacerbating that tendency, Jordan argues. Not only that, but given there is already bias towards citing academics who are male and from high-income countries, the use of citations to help calculate which papers to recommend carries a “risk of compounding the inequalities that are already baked into academic publishing”, she warns.
One study found that an increasing share of citations is accruing to older articles. It suggested that this could be because of a feedback loop generated by the appearance of these papers at the top of Google Scholar searches: an effect the study’s authors dubbed “first-page results syndrome”.
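The feedback loop the study describes is a classic preferential-attachment process, which a toy simulation makes vivid (all numbers here are made up): if each new citation goes to a paper with probability proportional to its existing citations, a handful of early leaders pull far ahead of the pack.

```python
# Minimal "rich-get-richer" simulation: citations attract citations.
import random

random.seed(42)
citations = [1] * 20            # 20 papers, each starting with one citation

for _ in range(1000):           # 1,000 new citations arrive one at a time
    # Choose which paper to cite, weighted by current citation counts.
    paper = random.choices(range(len(citations)), weights=citations)[0]
    citations[paper] += 1

top_share = max(citations) / sum(citations)
print(f"Most-cited paper holds {top_share:.0%} of all citations")
```

Run it and the distribution is sharply skewed: a few papers hoard most of the citations while many barely move, despite every paper starting from an identical position.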
By contrast, more traditional search tools, such as Scopus, the Web of Science and library catalogues, sort their results by default according to how recently the papers were published, with the most recent at the top.
“We know that some papers get cited because they are highly cited, either because it is a disciplinary norm or because they are easier to find in search engines,” says Mike Thelwall, professor of data science at the University of Wolverhampton. “I think that academic search engines exacerbate this problem by making highly cited papers easier to find, but they have not created it.”
A spokeswoman for Google insists that its recommendation system “casts a wide net to identify papers that are likely to be of interest to the researcher”, using Google Scholar’s “comprehensive indexing” system (Google Scholar is reckoned by most estimates to have the world’s biggest underlying database of academic papers: close to 400 million, by one recent count).

Another concern about academic search engines is that some of them, Google Scholar included, violate fundamental principles of science: reproducibility and transparency.
After testing 28 academic search systems, two bibliometrics experts, Michael Gusenbauer and Neal R. Haddaway, concluded that half are unsuitable for conducting systematic reviews of the literature (this is in contrast to “lookup” searching, when an academic needs to track down a specific paper). In a paper published last year in the journal Research Synthesis Methods, they single out Google Scholar in particular as “inappropriate” as a “principal search system”. This is partly because it inexplicably returned different results at different times to the same query in the same circumstances, although the Google spokeswoman insists that “different users using the same query at roughly the same time will see the same set of articles”.
The broader problem, say Haddaway and Gusenbauer, is a simple lack of transparency about how and why platforms recommend one paper over the next. “We don’t know how Google Scholar is providing results,” says Haddaway, a senior research fellow at the Stockholm Environment Institute. “We don’t know the ranking.”
Platforms do often provide public descriptions of the factors they take into account when recommending or ranking papers. But this isn’t the same as releasing the full underlying code so that specialists can scrutinise exactly how the sausage is made, critics say.
“There’s the methodological point that if you’re using [search engines] as an information source, you need to be crystal clear about how things have been found,” says Cambridge’s Jordan.
And that is particularly true given that search algorithms will shape scientific discovery in a “very fundamental” way, according to Björn Brembs, professor of neurogenetics at Germany’s University of Regensburg. “At the very minimum, the code needs to be open and verifiable,” says Brembs, who also campaigns for open access. “And it needs to be substitutable, so if you don’t like this one, you can have an interface that allows you to replace one algorithm with another.”
In some cases, the underlying code is kept under wraps because it is a valuable commercial secret. For Connected Papers, a new, freely available but for-profit literature-mapping tool, the algorithm determining how papers are related is its “core value”, says Alex Tarnavsky Eitan, a doctoral student in electrical engineering at Tel Aviv University who is one of the co-founders of a company that grew out of a “weekend” project.
As other aspects of Connected Papers, such as the community around it and the user experience it offers, become more valuable, perhaps “we’ll get to the point where we can release the algorithm”, Tarnavsky Eitan says. But, for now, he and his co-founders don’t want to “shoot ourselves in the foot” by releasing their secret sauce, although they do publish a high-level description of the algorithm on their website.
But the problem of transparency goes even deeper than commercial considerations.
Some academic search engines, such as Semantic Scholar, use a form of artificial intelligence called a neural network. Crudely put, such programmes mimic the structure of the human brain. However, even when, as in Semantic Scholar’s case, the code is open source, it is fiendishly difficult to discern why a neural network has spat out a particular answer: a problem that has spawned an entire research agenda called “explainable AI”.
The inability of such recommendation systems to explain their workings is “absolutely” a concern, says Dan Weld, Thomas J. Cable/WRF professor of computer science and engineering at the University of Washington, who helped build Semantic Scholar’s recommendation systems. Semantic Scholar’s similarity algorithm has been published and is open source, and the platform is working on ways to explain to users why it has recommended a particular paper, he says. But the neural network computes papers’ similarity across hundreds of dimensions, making it hard for it to make plain why it made a particular connection. “There aren’t good English-language words for those dimensions,” Weld says. “By definition, [any explanation] is going to be incomplete and inaccurate.”
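Weld’s point can be made concrete with a minimal cosine-similarity sketch. The three-dimensional vectors below are invented; real embedding-based recommenders learn hundreds of dimensions whose meanings carry no natural English labels, which is exactly why their choices resist explanation even when the code is open.

```python
# Sketch of embedding-based recommendation: each paper becomes a dense
# vector, and "related" means high cosine similarity. Vectors are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

embeddings = {
    "paper_read":    [0.9, 0.1, 0.4],
    "candidate_one": [0.8, 0.2, 0.5],   # nearby in vector space
    "candidate_two": [0.1, 0.9, 0.0],   # far away
}
scores = {name: cosine(embeddings["paper_read"], vec)
          for name, vec in embeddings.items() if name != "paper_read"}
recommended = max(scores, key=scores.get)
print(recommended)   # the system can say WHICH paper scores highest...
# ...but each coordinate has no human-readable meaning to explain WHY.
```

The system can always report which candidate scored highest, but asking it why reduces to pointing at hundreds of unnamed coordinates, hence Weld’s warning that any explanation will be “incomplete and inaccurate”.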

Semantic Scholar is developed by the Seattle-based not-for-profit Allen Institute for AI; Weld says one of the reasons he was attracted to work for the platform was “the ability not to be bound by concerns of profit or loss, but to build the best tools possible”. But does the commercial raison d’ĂȘtre of most search and recommendation tools mean that they will ultimately put profit before researchers’ best interests?
Critics argue that big publishers have a particularly acute conflict of interest because they own both search tools and the journals they recommend. It’s as if Facebook owned a large chunk of the world’s media: could we trust the firm, in such circumstances, not to bump up its own newspapers in its news feed?
“If there is a monetary advantage of guiding users to their own content, then they will do so,” says Brembs. “This doesn’t take an expert to guess.”
However, to be fair to publishers, there is no evidence so far that their search tools are prioritising their own content. And a spokesman for the publisher Elsevier insists that “which organisation published an article” is not a parameter that its Mendeley reference manager takes account of in its “suggest” function, which recommends new research. “It doesn’t ‘know’ which publisher or society published a particular article,” the spokesman says.
What’s more, although Google and Microsoft (the latter created the recently defunct Microsoft Academic search tool) are some of the most profitable companies on earth, their academic tools have the character of curiosity-driven side projects, observers say; they aren’t yet being run to make money.
“Microsoft and Google are mostly, I think, acting in a public spirit. I don’t think Google’s making a whole lot of money off Google Scholar,” says Semantic Scholar’s Weld.
This lack of profitability may bring its own problems, though. When Microsoft announced in May that it would shut down Microsoft Academic (and its open access underlying map of papers, on which other services are built) at the end of the year, some campaigners said this proved that academia cannot rely on the benevolence of tech giants for crucial search infrastructure.

The core reason why Facebook, Twitter and Google-owned YouTube have proved such addictive, unregulated fire hoses of distraction is that they are all competing for attention in order to bring in advertising revenue and harvest data.
Advertising undergirds the business model of some academic search engines, too: most notably ResearchGate, the recommendation platform that most closely resembles a social network. The company acknowledges that there is a short-term pressure to maximise attention to quickly boost advertising revenue. However, “optimising for attention would in the long term be quite detrimental for us”, says Holly Corbett, a senior data scientist at ResearchGate. “We don’t want to make people rage-click…we want to make scientists more productive.”
If users reported feeling guilty for having spent too long on the site, a feeling all too familiar to social media users worldwide, this would be a “red alert”, indicating to the company that it needed to change how the site works, she adds.
ResearchGate does sometimes witness the equivalent of a viral, clickbait news article, Corbett admits. “Occasionally we will see floods of traffic if someone has published a Covid study, for example, that says something like ‘vaccination doesn’t work’: something that’s of potentially dubious quality,” she says. But, in general, the site is engineered to avoid an “avalanche of attention on one thing” because it recommends papers that are similar to what the user has clicked on or downloaded before, rather than less relevant but more popular papers.
ResearchGate also explicitly tries to preserve a human element to discovering new research. In addition to algorithmically recommending papers, users see what colleagues in their network have posted. The idea is that these tip-offs keep the “wild card” nature of discovery alive: the equivalent of flicking through a generalist paper journal. This addresses a worry that the engineers designing AI recommendation systems are alive to: that too laser-like a focus on related papers will end the “serendipitous” nature of scientific discovery.

There are utopian and dystopian visions of where AI-driven recommendations will lead science, but they may not be mutually exclusive.
The dream is to end the sense felt by almost all researchers of drowning in a sea of new papers. Working as a biomedical researcher 10 years ago, “my constant experience was just being underwater”, says Corbett. “I used to have a printed-out stack of publications that I would have to read at some point but would never read and would always feel guilty about.”
The promise of AI-assisted recommendations is simply that “you’ll spend less time staring at a big stack of papers and feeling really bad”, she says. “That’s my hope.”
The nightmare, by contrast, is a future in which recommender systems become clever enough to create the equivalent of political filter bubbles in science, feeding researchers only the papers that confirm their beliefs, trapping them forever in existing paradigms, leading to scientific stagnation.
It is true that science has always had rival “invisible colleges” and “filter bubbles” of scholars who reinforce each other’s beliefs, says Cambridge’s Jordan; this was explored as long ago as 1989 in Tony Becher and Paul R. Trowler’s book Academic Tribes and Territories.
But as technology advances, making it increasingly possible to recommend papers that confirm users’ beliefs rather than merely matching their existing interests, some observers worry that this blinkering tendency will be exacerbated. Scite.ai, for example, founded in 2018, uses artificial intelligence to classify whether a paper is cited in a supportive, neutral or contrasting way: in other words, allowing users to see whether an article is heavily cited because people agree or disagree with it. And it is only “a question of time” until the likes of Google Scholar know academics’ views, predicts Brembs.
So will the filter bubble dystopia come to pass? “I see no reason why it shouldn’t,” says Open Knowledge Maps’ Kraker. Reinforcing an academic’s core beliefs about a research paradigm would be a powerful way of hooking them on a particular site, he thinks.
Above all, academics should “absolutely” worry that the process of research could be transformed by algorithmic recommendations as radically as political discourse has been warped by Facebook, Twitter and YouTube, he urges. The phenomenon is less advanced in scholarly search than in social media, he concedes, but it is encroaching, and the filter-bubble effect requires immediate attention.
“We should be very careful and not assume that there is a class of humans that is immune,” he says. “Sometimes I get the sense that researchers think of themselves like that.”
POSTSCRIPT:
Print headline: Will algorithms crush scientific serendipity?