Love DORA, Hate Rankings?

This piece first appeared on the LSE Impact Blog on 10 May 2021.

Lizzie Gadd argues that any commitment to responsible research assessment as outlined in DORA (Declaration on Research Assessment) and other such manifestos needs to include action on global university rankings. Highlighting four fundamental critiques of the way in which journal metrics and university rankings have been deployed in higher education, she proposes universities could unite around the principle of being ‘much more than their rank’.

More and more institutions are signing up to responsible metrics manifestos such as DORA – which is great. This is no doubt influenced by funder demands that they do so – which is also great. And these manifestos are having a positive impact on researcher-level evaluation – which is triply great. But, as we all know, researcher-level evaluation issues, such as avoiding Journal Impact Factors, are only one element of the sector’s research evaluation problems.

UKRI Chief Executive Ottoline Leyser recently pointed out that any evaluation further up the food-chain in the form of university- or country-level evaluations ultimately has an impact on individual researchers. And of course the most influential of these, at the top of the research evaluation food-chain, are the global university rankings.

So why, I often ask myself, do we laud universities for taking a responsible approach to journal metrics and turn a blind eye to their participation in, and celebration of, the global rankings?

Indeed, when you look at the characteristics of Journal Impact Factors (JIFs) and the characteristics of global university rankings, they both fall foul of exactly the same four critiques.

1. The construction problem

As DORA states, there are significant issues with the calculation of the JIF: the average cites per paper for a journal over two years. Firstly, taking the mean cites-per-paper of a highly skewed dataset is not statistically sensible. Secondly, whilst the numerator includes all citations to the journal, the denominator excludes ‘non-citable items’ such as editorials and letters – even if they have been cited. Thirdly, the two-year window is arguably not long enough to capture citation activity in less citation-dense fields; as a result, a JIF in one field cannot be compared with one from another.
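The first of these points can be illustrated with a toy example (the citation counts below are invented for demonstration): the mean of a highly skewed citation distribution tells you very little about a typical paper in the journal.

```python
# Illustrative only: a toy journal whose citations per paper are highly
# skewed, as citation distributions typically are.

# Citations received this year by papers the journal published in the
# previous two years
citations = [0, 0, 0, 1, 1, 2, 2, 3, 120]  # one highly cited outlier

jif_like = sum(citations) / len(citations)       # the mean, as the JIF uses
median = sorted(citations)[len(citations) // 2]  # a more robust centre

print(f"mean (JIF-style): {jif_like:.1f}")  # 14.3 - driven by the outlier
print(f"median:           {median}")        # 1 - what a typical paper gets
```

A single heavily cited paper drags the mean far above what almost every paper in the journal actually receives.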

However, global university rankings are subject to even harsher criticisms about their construction. The indicators they use are a poor proxy for the concept they seek to evaluate (the use of staff:student ratios as a proxy for teaching quality for example). The concepts they seek to evaluate are not representative of the work of all universities (societal impacts are not captured at all). The data sources they use are heavily biased towards the global north. They often use sloppy reputation-based opinion polls. And worst of all, they combine indicators together using arbitrary weightings, a slight change in which can have a significant impact on a university’s rank.
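The sensitivity to arbitrary weightings is easy to demonstrate with a toy example. Everything here is invented (two hypothetical universities scored on two indicators): a modest shift in the weights is enough to reverse the order.

```python
# Hypothetical illustration: how a composite ranking's weightings, not the
# underlying scores, can determine which university comes out on 'top'.

scores = {
    "University A": {"research": 90, "teaching": 60},
    "University B": {"research": 70, "teaching": 85},
}

def composite(s, w_research):
    """Weighted sum of the two indicators; weights sum to 1."""
    return w_research * s["research"] + (1 - w_research) * s["teaching"]

for w in (0.6, 0.4):  # shift the research weighting by 20 points
    ranked = sorted(scores, key=lambda u: composite(scores[u], w), reverse=True)
    print(f"research weight {w:.1f}: {ranked}")
```

With a research weighting of 0.6, University A leads; drop it to 0.4 and University B overtakes it, despite neither institution changing in any way.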

2. The validity problem

Construction issues aside, problems with the JIF really began when it was repurposed: from an indicator used to decide which journals should appear in Garfield’s citation index, to one used by libraries to inform collection development, then by researchers to choose where to publish, and finally by readers (and others) to decide which research was the best by virtue of where it was published. It had become an invalid proxy for quality, rather than a means of ensuring that the most citations were captured by a citation index.

Whilst the JIF may have inadvertently found itself in this position, some of the global rankings quite deliberately over-state their meaning. Indeed, each of the ‘big three’ global rankings (ARWU, QS and THE WUR) claim to reveal which are the ‘top’ universities (despite using different methods for reaching their different conclusions). However, given the many and varied forms of higher education institutions on the planet, none of these high-profile rankings articulates exactly what their ‘top’ universities are supposed to be top at. The truth is that the ‘top’ universities are mainly top at being old, large, wealthy, English-speaking, research-focussed and based in the global north.

3. The application problem

Of course, once we have indicators that are an invalid proxy for the thing they claim to measure (JIFs signifying ‘quality’ and rankings signifying ‘excellence’), third parties will make poor use of them for decision-making. Thus, funders and institutions started to judge researchers based on the number of outputs they had in high-JIF journals, as though that somehow reflected the quality of their research and of them as researchers.

In a similar way, we know that some of the biggest users of the global university rankings are students seeking to choose where to study (even though no global ranking provides any reliable indication of teaching quality) because who doesn’t want to study at a ‘top’ university? But it’s not just students; institutions and employers are also known to judge applicants based on the rank of their alma mater. Government-funded studentship schemes will also often only support attendance at top 200 institutions.

4. The impact problem

Ultimately, these issues have huge impacts on both individual careers and the scholarly enterprise. The problems associated with the pursuit of publication in high-JIF journals have been well-documented and include higher APC costs, publication delays, publication of only positive findings on hot topics, high retraction rates, and negative impacts on the transition to open research practices.

The problems associated with the pursuit of a high university ranking are less well-documented but are equally, if not more, concerning. At individual level, students can be denied the opportunity to study at their institution of choice and career prospects can be hampered through conscious or unconscious ranking-based bias. At institution level, ranking obsession can lead to draconian hiring, firing and reward practices based on publication indicators. At system level we see increasing numbers of countries investing in ‘world-class university’ initiatives that concentrate resource in a few institutions whilst starving the rest. There is a growing inequity both within and between countries’ higher education offerings that should seriously concern us all.

What to do?

If we agree that global university rankings are a form of irresponsible research evaluation every bit as problematic as the Journal Impact Factor, we have to ask ourselves why their usage and promotion does not form an explicit requirement of responsible metrics manifestos. An easy answer is that universities are the ‘victim’ not the perpetrator of the rankings. However, universities are equally complicit in providing data to, and promoting the outcomes of, global rankings. The real answer is that the rankings are so heavily used by those outside of universities that not to participate would amount to financial and reputational suicide.

Despite this, universities do have both the power and the responsibility to take action on global university rankings that would be entirely in keeping with any claim to practice responsible metrics. This could involve:

  1. Avoiding setting KPIs based on the current composite global university rankings.
  2. Avoiding promoting a university’s ranking outcome.
  3. Avoiding legitimising global rankings by hosting, attending, or speaking at ranking-promoting summits and conferences.
  4. Rescinding membership of ranking-based ‘clubs’ such as the World 100 Reputation Academy.
  5. Working together with other global universities to redefine university quality (or more accurately, qualities) and to develop better ways of evaluating these.

I recently argued that university associations might develop a ‘Much more than our rank’ campaign. This would serve all universities equally – from those yet to gain a foothold in the current rankings to those at the top. Every university has more to offer than is currently measured by the global university rankings – something I’m sure even the ranking agencies would admit. Such declarations would move universities from judged to judge, from competitor to collaborator. It would give them the opportunity to redefine and celebrate the diverse characteristics of a thriving university beyond the rankings’ narrow and substandard notions of ‘excellence’.

The time has come for us to extend our definition of responsible metrics to include action with regards to the global university rankings. I’m not oblivious to the challenges, and I am certainly open to dialogue about what this might look like. But we shouldn’t continue to turn a blind eye to the poor construction, validity, application and impact of global rankings whilst claiming to support and practise responsible metrics. We have to start somewhere, and we have to do it together, and we need to be brave enough to engage in this conversation.

The author is very grateful to Stephen Curry for feedback on the first draft of this blogpost.

The challenge of measuring open research data

This post was originally published on The Bibliomagician Blog on 24 March 2021

Lizzie Gadd & Gareth Cole discuss the practical challenges of monitoring progress towards institutional open research data ambitions.

Loughborough University has recently introduced a new Open Research Position Statement which sets out some clear ambitions for open access, open data and open methods. As part of this work we’re looking at how we can monitor our progress against those ambitions. Of course, for open access, we’re all over it. We have to count how many of our outputs are available openly in accordance with various funder policies anyway. But there are no equivalent demands for data. OK, all data resulting from some funded projects need to be made available openly, but no-one’s really counting – yet. And anyway, our ambitions go beyond that – we’d like to encourage all data to be made available openly and ‘FAIRly’ where possible.

So how do we measure that? Well, with difficulty it would seem. And here’s why: 

What counts as a unit of data?
In the world of journal articles, although there are disciplinary differences as to the number of articles that are produced, every article is roughly equivalent in size and significance to another. Data are not like that. What qualifies as a single unit of data, thus receiving its own metadata record, might be a photograph or a five-terabyte dataset. So it would be a bit unfair to compare the volume of these. And there is currently no agreement as to ‘how much data’ (in size, effort, or complexity) there needs to be to qualify for a unique identifier.

But it’s not just what counts as a unit of data but what counts as a citable unit of data that differs. A deposit of twenty files could have one DOI/Identifier or twenty DOIs depending on how it is split up. This means that potentially there could be citation advantages or disadvantages for those that deposit their data in aggregate or individually – but this would entirely depend on how the citer chooses to cite it.

Data duplication
For journal articles, full-text versions are duplicated all over the place. The same article might be available on multiple repositories, pre-print servers and the publisher’s site. In fact, whilst there are concerns about version control, there are many benefits to such duplicates in terms of discovery and archiving (Lots of Copies Keeps Stuff Safe [LOCKSS] and all that). But for data, it’s not good practice to duplicate in different repositories. This is both for versioning reasons (if you update in one place, you then have to update in others) and for DOI reasons (two instances usually means two DOIs which means that any data citations you get will be split across two sources). 

So if we wanted to identify all the data produced by Loughborough academics, we’d have a pretty difficult job doing it. Some will be on our repository, but other data will be spread across multiple different archives. Data citation and linking services such as DataCite and Scholix may ultimately offer a solution here of course, but as others have noted, these have a long way to go before they are truly useful for monitoring purposes. DataCite only indexes data with a DataCite DOI. And Scholix only surfaces links between articles and datasets, not the independent existence of data. Some services, such as Dimensions, only index records that have an item type of “Dataset”. This means that data taking other forms such as “media” or “figure” won’t appear in Dimensions, thus disenfranchising those researchers who use “data” but not “datasets”.

But the biggest problem these services face is that they rely on consistent high-quality metadata collection, curation and sharing by all the different data repositories. And we’re just not there yet. (Although all repositories which mint DataCite DOIs will need to comply with the minimum/mandatory requirements to mint the DOI in the first place). And a particular problem for institutions seeking to measure their own dispersed data output is that few repositories expose author affiliation data even where they do comprehensively collect it. And this leads us on to our third point.

Attribution
The authorship of journal articles is increasingly controlled and subject to much guidance. Learned societies provide guidance, journals provide guidance, institutions sometimes have their own guidance. The CRediT taxonomy (whilst not without problems) was introduced to make it absolutely explicit as to who did what on a journal article. The same is not usually true of data.

Of course, data is created rather than authored, as the DataCite schema makes clear. But there is no way of ensuring that all the data creators have been added to the metadata record, even supposing the depositors always know who they are. And whilst there is no real glory associated with data ownership, this problem isn’t going to be quickly resolved. As with journal articles, the list of contributors is often likely to be very long, so there needs to be some incentive to do this carefully and well.

This is where we butt up against Professor James Frew’s two laws of metadata:

  1. Scientists don’t write metadata;
  2. Any scientist can be forced to write bad metadata.

There seems to be scope for some CRediT-type contributor guidance for data to ensure all the right people get the right credit. (Maybe a Research Data Alliance Working Group?) And then there needs to be some motivation for depositors to stick to it.

Quality assurance

Although the standard of journal peer review is variable and hotly contested as a mechanism to signify quality, at least all journal articles are subject to some form of it prior to publication. Data are not currently peer reviewed (unless submitted as a data paper, or where the dataset is provided as supplementary information to a journal submission). And although data can be cited, this still appears to be comparatively rare. This may partly be due to the challenge of ‘counting’ data citations, given huge variations in citation quality, whether data is cited at all (or merely added as part of a data availability statement), and disciplinary differences in the way this is done. There is also a big difference between a data citation which just states ‘this data is available and relevant to my study’ and one which signifies that ‘the data in question has actually been re-analysed or repurposed in my study’ – but a data citation doesn’t currently differentiate between the two.

Of course, the accepted standard for data quality is the FAIR principles: data should be Findable, Accessible, Interoperable and Reusable. But despite many investigations into the best way of assessing how FAIR data is, the average institutional data repository has no easy way of quickly identifying this.

There is also the challenge that FAIR data may not be open data, and vice versa. Some data can never be open data for confidentiality reasons. So in our attempt to pursue open research practices, and given a choice, what do we count? Open data that may not be FAIR? FAIR data that may not be open? Or only data that are both? And if so, how fair is that?


So where does this leave us at Loughborough? Well, in a less than satisfactory situation to be honest. We could look at the number of data deposits (or depositors) in our Research Repository per School over time to give us an idea of growth. But this will only give us a very partial picture. We could do a similar count of research projects on ResearchFish with associated data, or e-theses with related data records, but again, this will only give us a small window onto our research data activity. Going forward we might look at engagement with the data management planning tool, DMP Online, over time, but again this is likely to shine more light on disciplines that have to provide DMPs as part of funding applications and PhD studies. 

So, whilst we can encourage individuals to deposit data, and require narrative descriptions of their engagement with this important practice for annual appraisals, promotions, and recruitment, we have no meaningful way of monitoring this engagement at department or University-level. And as for benchmarking our engagement with that happening elsewhere, this currently feels like it’s a very long way off.

The big fear, of course, is that this is where commercial players rock up and offer to do this all for us – for a price. (Already happening). And given that data is a research output not currently controlled by such outfits, it would, in our view, be a very great shame to have to pay someone else to tell us about the activities of our own staff.

Really hoping that some clever, community-minded data management folk are able to help with this.

Elsevier have endorsed the Leiden Manifesto: so what?

This piece was originally posted to The Bibliomagician blog on 22 September 2020

Lizzie Gadd speculates as to why Elsevier endorsed the Leiden Manifesto rather than signing DORA, and what the implications might be.

If an organisation wants to make a public commitment to responsible research evaluation they have three main options: i) sign DORA, ii) endorse the Leiden Manifesto (LM), or iii) go bespoke – usually with a statement based on DORA, the LM, or the Metric Tide principles.

The LIS-Bibliometrics annual responsible metrics survey shows that research-performing organisations adopt a wide range of responses to this including sometimes signing DORA and adopting the LM. But when it comes to publishers and metric vendors, they tend to go for DORA. Signing DORA is a proactive, public statement and there is an open, independent record of your commitment. DORA also has an active Chair in Professor Stephen Curry, and a small staff in the form of a program director and community manager, all of whom will publicly endorse your signing which leads to good PR for the organisation.

A public endorsement of the LM leads to no such fanfare. Indeed, the LM feels rather abandoned by comparison. Despite a website and blog, there has been little active promotion of the Manifesto, nor any public recognition for anyone seeking to endorse it. Indeed one can’t help wondering how differently the LM would operate if it had been born in a UK institution subject to the impact-driven strictures of the REF?

But despite this, Elsevier recently announced that they had chosen the Leiden Manifesto over DORA. Which leads us to ask i) why? And ii) what will this mean for their publishing and analytics business?

Why not DORA?

Obviously I wasn’t party to the conversations that led to this decision and can only speculate. But for what it’s worth, my speculation goes a bit like this:

So, unlike the LM, which provides ten principles to which all adopters should adhere, DORA makes different demands of different stakeholders. Research institutions get off pretty lightly with just two requirements: i) don’t use journals as proxies for the quality of papers, and ii) be transparent about your reward criteria. Publishers and metrics suppliers, however, are subject to longer lists (see box) and of course, Elsevier are both. And it is within these lists of requirements that I think we find our answers.

Box: Excerpt from DORA Principles
  1. Positioning CiteScore as the JIF’s responsible twin.

Firstly, DORA demands that publishers ‘greatly reduce emphasis on JIF as a promotional tool’. However, Elsevier have invested heavily in CiteScore (their alternative to the JIF) and are not likely to want to reduce emphasis on it. Indeed, the press release announcing their endorsement of the LM provided, as an example, the way they’d recently tweaked the calculation of CiteScore to ensure it met some of the LM principles – positioning it as a ‘responsible metric’, if you will. This is something they’d struggle to get away with under DORA.

  2. Open citations? Over my dead body

One of the less well-discussed requirements of DORA for publishers is to “remove all reuse limitations on reference lists in research articles and make them available under the Creative Commons Public Domain Dedication.” In other words, DORA expects publishers to engage with open citations. This is something Elsevier have infamously failed to do.

  3. Open data? You’ll have to catch me first

And finally, DORA expects metric suppliers to not only “be open and transparent by providing data and methods used to calculate all metrics” (which they partly do for subscribers) but to “Provide the data under a licence that allows unrestricted reuse, and provide computational access to data, where possible” (which they don’t).

So whereas DORA is a relatively easy sign for HEIs (only two requirements), for publishers it’s trickier than it might first appear (five requirements), and an organisation like Elsevier, which also supplies metrics, has to contend with a further four – requirements which would essentially eat away at their profits. And we all know that they’re only just scraping by, bless them.

The impact of endorsing the Leiden Manifesto

But isn’t it good enough that they’ve endorsed the Leiden Manifesto? After all, it’s a comprehensive set of ten principles for the responsible use of bibliometrics. Well, being a seasoned grumbler about some of the less savoury aspects of Elsevier’s SciVal, I decided to take to the discussion lists to see whether they saw this move as the beginning or the end of their responsible metrics journey. Was this the start of a huge housekeeping exercise which would sweep away the h-index from researcher profiles? Disinfect the unstable Field-Weighted Citation Impact (FWCI) from author rankings? And provide health-warnings around some of the other over-promising and under-delivering indicators?

Apparently not.

“There is nothing inherently wrong with the h-index,” said Holly Falk-Krzesinski, Elsevier’s Vice-President for Research Intelligence, pointing to three of the Leiden Manifesto’s principles where she felt it passed muster (despite Elsevier’s Andrew Plume questioning its validity on the same day). And as part of a basket of metrics, she considers the FWCI a perfectly usable indicator for researchers. (Something Elsevier’s own SciVal Advisors disagree with). And she believes the h-index is “not displayed in any special or prominent way” on Pure Researcher Profiles. Erm…

[Screenshot: the h-index displayed on an example Pure researcher profile]

And after several rounds of this, frankly, I gave up. And spent a weekend comfort-eating Kettle chips. Because I care deeply about this. And, honestly, it felt like to Elsevier it was just another game to be played. 

Responsible is as responsible does

Back in 2018 I made the point that if we weren’t careful, responsible metrics statements could, in an ironic turn, easily become ‘bad metrics’, falsely signifying a responsible approach to metrics that wasn’t there in practice. And the reason these statements are so vulnerable to this is that neither DORA nor the LM are formally policed. Anyone can claim to be a follower and the worst that can happen is that someone calls out your hypocrisy on Twitter. Which does happen. And is sometimes even effective.

It is for this reason that the Wellcome Trust have stated that adopting a set of responsible metrics principles is not enough. If you want to receive their research funding from 2021, you need to demonstrate that you are acting on your principles. Which is fair. After all, if you want Athena SWAN accreditation, or Race Equality Chartership, or a Stonewall Charter, you have to provide evidence and apply for it. It’s not self-service. You can’t just pronounce yourself a winner. And I can’t help wondering: yes, Elsevier has endorsed the Leiden Manifesto, but would the Leiden Manifesto (given the chance) endorse Elsevier?

Now I know that CWTS and DORA would run a mile from such a proposition, but that doesn’t mean it’s not needed. Responsible-metrics-washing is rife. And whilst I’d rather folks washed with responsible metrics than anything else – and I’m sure a few good things will come out of it – it does rather feel like yet another instance of a commercial organisation paying lip-service to a community agenda for their own ends (see also: open access and copyright retention).

Right on cue, Helen Lewis in The Atlantic recently described the “self-preservation instinct [that] operates when private companies struggle to acclimatize to life in a world where many consumers vocally support social-justice causes”. “Progressive values are now a powerful branding tool,” she writes, and “Brands will gravitate toward low-cost, high-noise signals as a substitute for genuine reform, to ensure their survival.” Correct me if I’m wrong, but that sounds pretty apposite?

Of course, it’s early days for Elsevier’s Leiden Manifesto journey, and Andrew Plume did seek to reassure me in a video call that they were still working through all the implications. So let’s hope I’m worrying about nothing and we’ll be waving goodbye to the h-index in Elsevier products any day soon. But if nothing does transpire, I know, as the developer of a responsible metrics model myself, that I’d feel pretty sick about it being used as empty virtue-signalling. And it does occur to me that funders seeking to hold institutions to account for their responsible research evaluation practices might do well to direct their attention to the publishers they fund.

Otherwise I fear it really will be a case of, well, Elsevier have endorsed the Leiden Manifesto: so what?

Rethinking the Rankings

This piece was originally posted to the ARMA blog on 14 October 2020.

Lizzie Gadd and Richard Holmes share the initial findings of the INORMS Research Evaluation Working Group’s efforts to rate the World University Rankings.

When the INORMS Research Evaluation Working Group (REWG) was formed in 2016, Lizzie asked the representatives of twelve international research management societies where they felt we should focus our attention if we wanted to achieve our aim of making research evaluation more meaningful, responsible and effective. They were unanimous: the world university rankings. Although research managers are not always the ones in their institutions that deal with the world university rankers, they are one of the groups that feel their effect most keenly: exclusion from certain funding sources based on ranking position; requests to reverse-engineer various indicators to understand their scores; and calls to introduce policies that may lead to better ranking outcomes. And all whilst fully appreciating how problematic rankings are in terms of their methodology, their validity and their significance.

So what could be done? Well, it was clear that one of the key issues with the world ranking bodies is that they are unappointed and answer to nobody. In an earlier blog post where Lizzie describes the research evaluation environment as a food chain, she put them at the top: predators on which no-one predates. (Although some Scandinavian colleagues see them more as parasites that feed off healthy organisms: taking but not giving back). And of course the way to topple an apex predator is to introduce a new one: to make them answerable to the communities they rank. So this is what the INORMS REWG set about doing, by seeking to introduce an evaluation mechanism of their own to rate the rankers.

In some parallel work, the REWG were developing SCOPE, a five-step process for evaluating effectively, so we were keen to follow our own guidance when designing our ranker ratings. And this is how we did so:

Start with what you value

Our first step was to identify what it was we wanted from any mechanism seeking to draw comparisons between universities. What did we value? To this end we sought out wisdom from all those who have gone before us in this space: the Berlin Principles on Ranking HEIs, the work of Ellen Hazelkorn, the CWTS principles for responsible use of rankings, the Leiden Manifesto, DORA, Yves Gingras, and many others. From their thoughts we synthesised a draft list of Criteria for Fair and Responsible University Rankings and put them out to the community for comment. We got feedback from a wide range of organisations: universities, academics, publishers and ranking organisations themselves. The feedback was then synthesised into our value document – what we valued about the entity (rankers) under evaluation. These values fell into four categories: good governance, transparency, measure what matters, and rigour.

Context considerations

There are lots of reasons we evaluate things. What we’re trying to achieve here is a comparison of the various ranking organisations, with the ultimate purpose of incentivising them to do better. We want to expose where they differ from each other but also to highlight areas that the community cares about where they currently fall short.  What we didn’t want to do is create another ranking. It would have been very tempting to do so: “ranking the rankings” has a certain ring to it.  But not only would this mean that a ranking organisation got to shout about its league-table-topping status – something we didn’t want to endorse – but we wouldn’t be practising what we preached: a firm belief that it is not possible to place multi-faceted entities on a single scale labelled ‘Top’ and ‘Bottom’.

Options for evaluating

Once we had our list of values, we then set about translating them into measurable criteria – into indicators that were a good proxy for the quality being measured. As anyone who’s ever developed an evaluation approach will know, this is hard. But again, we sought to adhere to our own best practice by providing a matrix through which evaluators could provide both quantitative and qualitative feedback. Quantitative feedback took the form of a simple three-point scale according to whether the ranker fully (2 marks) or partially (1 mark) met the set criteria, or failed to do so (0 marks). Qualitative feedback took the form of free-text comments. To ensure transparency and mitigate bias as best we could, we asked a variety of international experts to each assess one of six ranking organisations against the criteria. INORMS REWG members also undertook evaluations, and, in line with the SCOPE principle of ‘evaluating with the evaluated’, each ranker was also invited to self-assess. Only one ranking organisation, CWTS Leiden, accepted our offer to self-assess, and they provided free-text comments rather than scores. All this feedback was then forwarded to our senior expert reviewer, Dr Richard Holmes, author of the University Ranking Watch blog and certainly one of the most knowledgeable university rankings experts in the world. He was able to combine the feedback from our international experts with his own, often inside, knowledge of the rankings to enable a really robust, expert assessment.
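The scoring scheme above can be sketched in a few lines. This is only an illustration of the aggregation: the criteria counts and marks below are invented, not the real assessment data, and the output mirrors the ‘actual score / total possible score’ proportions plotted in the spidergram.

```python
# Sketch of a REWG-style scoring matrix: each criterion is marked
# 2 (fully meets), 1 (partially meets), 0 (fails), or None (not applicable).
# Category names are the four value categories; the marks are invented.

ratings = {
    "good governance":      [2, 1, 0, 1, None],
    "transparency":         [2, 2, 1, 1],
    "measure what matters": [1, 0, 1],
    "rigour":               [2, 1, 1, 2],
}

for category, marks in ratings.items():
    applicable = [m for m in marks if m is not None]  # skip N/A criteria
    share = sum(applicable) / (2 * len(applicable))   # actual / possible
    print(f"{category:22s} {share:.0%}")
```

Dividing by the total possible score per category, rather than summing raw marks across categories, is what avoids collapsing the assessment into a single league-table number.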

Probe deeply

Of course all good evaluations should probe their approach, which is something we sought to do during the design stage, but something we also came back to post-evaluation. We observed some criteria where rankings might be disadvantaged for good practice – for example, where a ranking did not use surveys and so could not score on survey-related criteria. This led us to introduce ‘Not Applicable’ categories to ensure they would not be penalised. One or two questions were also multi-part, which made it difficult to assess fairly across the rankers. In any future iteration of the approach we would seek to correct this. We noted that the ‘partially meets’ category is also very broad, ranging from a touch short of perfect to a smidge better than fail. In future, a more granular five- or even ten-point grading system might provide a clearer picture as to where a ranking succeeds and where it needs to improve. In short, there were some learning points. But that’s normal. And we think the results provide a really important proof-of-concept for evaluating the world rankings.
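To make the arithmetic concrete, here is a minimal sketch (criterion names and marks invented, not taken from the actual evaluation) of how a ranker’s percentage could be computed under this scheme, with ‘Not Applicable’ criteria excluded from the total possible so that a ranker is not penalised for, say, not running surveys:

```python
# Hypothetical scoring sketch: 2 = fully meets, 1 = partially meets,
# 0 = fails, None = Not Applicable (excluded from the total possible).
scores = {
    "declares conflicts of interest": 0,
    "open to correction": 2,
    "avoids 'sloppy' surveys": None,  # N/A: this ranker uses no surveys
    "measures against mission": 1,
}

applicable = {k: v for k, v in scores.items() if v is not None}
achieved = sum(applicable.values())
possible = 2 * len(applicable)  # maximum of 2 marks per applicable criterion

print(f"{achieved}/{possible} = {achieved / possible:.0%}")  # 3/6 = 50%
```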


So what did we find? Well, we applied our approach to six of the largest and most influential world university rankings: ARWU, THE World University Rankings, QS, U-Multirank, CWTS Leiden and US News & World Report. A full report will be forthcoming and the data showing the expert assessments and senior expert calibrations are available. A spidergram of the quantitative element is given in Figure 1 and some headline findings are provided below.

Figure 1. Spidergram illustrating the actual scores/total possible score for each world ranker. The full data along with the important qualitative data is available.

Good governance

The five key expectations of rankers here were that they engaged with the ranked, were self-improving, declared conflicts of interest, were open to correction and dealt with gaming. In the main, all ranking organisations made some efforts towards good governance, but with clear weaknesses in terms of declaring conflicts of interest: no ranker really did so, even though selling access to their data and consultancy services was commonplace.


Transparency

The five expectations of rankers here were that they had transparent aims, methods and data sources, made their data open, and were financially transparent. Once again there were some strengths when it came to the transparency of the rankers’ aims and methods – even if arguably the methods didn’t always meet the aims. The weaknesses here were around the ability of a third party to replicate the results (only ARWU achieved full marks here), data availability, and financial transparency (where only U-Multirank achieved full marks).

Measure what matters

The five expectations of rankers here were that they drove good behaviour, measured against mission, measured one thing at a time (no composite indicators), tailored results to different audiences and gave no unfair advantage to universities with particular characteristics. Not surprisingly, this is where most rankings fell down. CWTS Leiden and U-Multirank scored top marks for their efforts to drive appropriate use of rankings and for measuring only one thing at a time, while the others barely scored. Similarly, Leiden and U-Multirank fared quite well on measuring against mission, unlike the others. But no ranking truly tailored its offer to different audiences, assuming that all users – students, funders, universities – would value the different characteristics of universities in the same way. And neither could any whole-heartedly say that they offered no unfair advantage to certain groups.


Rigour

The one thing university rankings are most criticised for is their methodological invalidity, so it may come as no surprise that this was another weak section for most world rankers. Here we were looking for rigorous methods, no ‘sloppy’ surveys, validity, sensitivity and honesty about uncertainty. The ranker that did best here by a country mile was CWTS Leiden, with perfect scores for avoiding the use of opinion surveys (joined by ARWU), good indicator validity (joined by U-Multirank), indicator sensitivity, and the use of error bars to indicate uncertainty. All other rankers scored their lowest in this section.


So there is clearly work to be done here, and we hope that our rating clearly highlights what needs to be done and by whom. And in case any ranking organisation seeks to celebrate their relative ‘success’ here, it’s worth pointing out that a score of 100% on each indicator is what the community would deem to be acceptable. Anything less leaves something to be desired.

One of the criticisms we anticipate is that our expectations are too high. How can we expect rankings to offer no unfair advantage? And how can we expect commercial organisations to draw attention to their conflicts of interest? Our answer would be that just because something is difficult to achieve, doesn’t mean we shouldn’t aspire to it. Some of the sustainable development goals (no poverty, zero hunger) are highly ambitious, but also highly desirable. The beauty of taking a value-led approach, such as that promoted by SCOPE, is that we are driven by what we truly care about, rather than by the art of the possible, or the size of our dataset. If it’s not possible to rank fairly, in accordance with principles developed by the communities being ranked, we would argue that it is the rankings that need to change, not the principles. 

We hope this work initiates some reflection on the part of world university ranking organisations. But we also hope it leads to some reflection by those organisations that set so much store by the world rankings: the universities that seek uncritically to climb them; the students and academics that blindly rely on them to decide where to study or work; and the funding organisations that use them as short-cuts to identify quality applicants. This work provides qualitative and quantitative evidence that the world rankings cannot, currently, be relied on for these things. There is no fair, responsible and meaningful university ranking. Not really. Not yet. There are just pockets of good practice that we can perhaps build on if there is the will.  Let’s hope there is.

Gadding about…*

*Virtually of course.

Courtesy of the pestilence currently scourging our planet, I’ve been able to accept four opportunities to speak this Autumn, as I will be doing so from the comfort of my own home office. For anyone interested in tuning in, I’ve provided the details here and will update this with more intel as I have it.

22-Sep-20 08.30 BST: Finnish Ministry of Education & Culture

Bibliometrics: Diversity’s friend or foe? Assessing research performance using bibliometrics alone does not help create a diverse research ecosystem. But can bibliometrics ever be used to support diversity? And if not, how else can we evaluate what we value about research?

07-Oct-20 17.00 BST: NIH Bibliometrics & Research Evaluation Symposium

The Five Habits of Highly-Effective Bibliometric Practitioners. Drawing on ten years’ experience supporting bibliometric and research evaluation practitioner communities, this presentation will highlight five habits of highly effective practitioners, providing practical hints and tips for those seeking to support their own communities with robust research evaluation.

15-Oct-20 08.15 BST: 25th Nordic Workshop on Bibliometrics and Research Policy


The Research Evaluation Food Chain and how to fix it. Poor research evaluation practices are the root of many problems in the research ecosystem and there is a need to introduce change across the whole of the ‘food chain’. This talk will consider the challenge of lobbying for change to research evaluation activities that are outside your jurisdiction – such as senior managers and rankings (introducing the work of INORMS REWG), vendors and ‘freemium’ citation-based services.

20-Oct-20 15.00 BST: Virginia Tech Open Access Week

Counting What Counts In Recruitment, Promotion & Tenure. What we reward through recruitment, promotion and tenure processes is not always what we actually value about research activity. This talk will explore how we can pursue value-led evaluations – and how we can persuade senior leaders of their benefits.

AI-based citation evaluation tools: good, bad or ugly?

This piece was originally posted on The Bibliomagician on 23 July 2020.

Lizzie Gadd gets all fancy talking about algorithms, machine learning and artificial intelligence. And how tools using these technologies to make evaluative judgements about publications are making her nervous.

A couple of weeks ago, The Bibliomagician posted an interesting piece by Josh Nicholson introducing scite. scite is a new Artificial Intelligence (AI) enabled tool that seeks to go beyond citation counting to citation assessment, recognising that it’s not necessarily the number of citations that is meaningful, but whether they support or dispute the paper they cite.

scite is one of a range of new citation-based discovery and evaluation tools on the market. Some, like Citation Gecko, Connected Papers and CoCites, use the citation network in creative ways to help identify papers that might not appear in your results list through simple keyword matching. They use techniques like co-citation (where two papers appear together in the same reference list) or bibliographic coupling (where two papers cite the same paper) as indicators of similarity. This enables them to provide “if you like this you might also like that” type services.
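As a rough illustration of how those two signals differ (using an invented mini citation network, not any real tool’s implementation):

```python
# Each paper maps to the set of papers it cites.
references = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
    "D": {"A", "B"},       # D cites A and B together...
    "E": {"A", "B", "C"},  # ...and so does E: A and B are co-cited twice
}

def coupling_strength(p, q):
    """Bibliographic coupling: how many references p and q share."""
    return len(references[p] & references[q])

def cocitation_count(p, q):
    """Co-citation: how many papers cite both p and q."""
    return sum(1 for refs in references.values() if p in refs and q in refs)

print(coupling_strength("A", "B"))  # A and B share refs X and Y -> 2
print(cocitation_count("A", "B"))   # D and E both cite A and B  -> 2
```

Papers scoring highly on either measure relative to a seed paper are the ones such tools can surface as “you might also like” suggestions, even when they share no keywords.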

Other tools, like scite and Semantic Scholar, go one step further and employ technologies like Natural Language Processing (NLP), Machine Learning (ML) and Artificial Intelligence (AI) to start making judgements about the papers they index. In Semantic Scholar’s case it seeks to identify where a paper is ‘influential’ and in scite’s case, where citations are ‘supporting’ or ‘disputing’.

And this is where I start to twitch.


I mean, there is an obvious need to understand the nuance of the citation network more fully. The main criticism of citation-based evaluation has always been that citations are wrongly treated as always a good thing. In fact, the Citation Typing Ontology lists 43 different types of citation (including my favourite, ‘is-ridiculed-by’). Although the fact that the vast majority are positive (fewer than 0.6% of citations are negative, by scite’s calculations) may itself indicate a skewing of the scholarly record. Why cite work you don’t rate, knowing it will lead to additional glory for that paper? So if we can use new technologies to provide more insight into the nature of citation, this is a positive thing. If it’s reliable. And this is where I have questions. And although I’ve dug into this a bit, I freely admit that some of my questions might be born of ignorance. So feel free to use the comments box liberally to supplement my thinking.

A bit about the technologies

All search engines use algorithms (sets of human-encoded instructions) to return the results that match our search terms. Some, like Google Scholar, use the citedness of papers as one element of their algorithm to sort results in an order that may give you a better chance of finding the paper you’re looking for. And we already know that this is problematic in that it compounds the Matthew Effect: the more cited a paper is, the more likely it is to surface in your search results, thereby increasing its chances of getting read and further cited. And of course, the use of more complex citation-network analysis for information discovery can contribute to the same problem: by definition, less cited works are going to be less well-connected and thus returned less often by the algorithm.

Even their developers might not ever really understand what characteristics the AI is identifying in the data as ultimately contributing to the desired outcome.

Image by Gordon Johnson from Pixabay

But it’s the use of natural language processing (NLP) to ‘read’ the full text of papers and artificial intelligence or machine learning to find patterns in the data that concerns me more. So whereas historically humans might provide a long list of instructions to tell computers how to identify an influential paper, ML works by providing a shed load of examples of what an influential paper might look like, and leaving the AI to learn for itself. When the AI gets it right, it gets rewarded (reinforcement learning) and so it goes on to achieve greater levels of accuracy and sophistication. So much so, that even their developers might not ever really understand what characteristics the AI is identifying in the data as ultimately contributing to the desired outcome.

Can you see why I’m twitching?


Shaky foundations

The obvious problem is that the assumptions we draw from these data are inherently limited by the quality of the data themselves. So we know that the literature is already hugely biased towards positive studies over null and negative results and towards journal-based STEM over monograph-based AHSS. So the literature is, in this way, already a biased sample of the scholarship it seeks to represent.

We also know that within the scholarship the literature does represent, all scholars are not represented equally. We know that women are less well cited than men, that they self-cite less and are less well-connected. We know the scholarship of the Global South is under-represented, as is scholarship in languages other than English. And whilst a tool may be able to accurately identify positive and negative citations, it can’t (of course) assess whether those positive and negative citations were justified in the first place.

But of course these tools aren’t just indexing the metadata but the full text. So the question I have here is whether Natural Language Processing works equally well on language that isn’t ‘natural’ – i.e., where it’s the second language of the author. And what about cultural differences in the language of scholarship, where religious or cultural beliefs make expressions of confidence in the results less certain, less self-aggrandising? And I’ll bet you a pound that there are disciplinary differences in the way that papers are described when being cited.

So we know that scholarship isn’t fully represented by the literature. The literature isn’t fully representative of the scholars. The scholars don’t all write in the same way. And of course, some of these tools are only based on a subset of the literature anyway.

At best, this seems unreliable, at worst, discriminatory?

Who makes the rules?

Of course, you may well argue that this is a problem we already face with bibliometrics, as recently asserted by Robyn Price.  I guess my particular challenge with some of these tools is that they go beyond simply making data and their inter-relationships available for human interpretation, to actually making explicit value judgements about those data themselves. And that’s where I think things start getting sticky because someone has to decide what that value (known as the target variable) looks like. And it’s not always clear who is doing it, and how.

If you think about it, being the one who gets to declare what an influential paper looks like, or what a disruptive citation looks like, is quite a powerful position. Oh not right now maybe, when these services are in start-up and some products are in Beta. But eventually, if they get to be used for evaluative purposes, you might end up with power over someone’s career trajectory. And what qualifies them to make these decisions? Who appointed them? Who do they answer to? Are they representative of the communities they evaluate? And what leverage do the community have over their decisions?

If you think about it, being the one who gets to declare what an influential paper looks like, or what a disruptive citation looks like, is quite a powerful position.

When I queried scite’s CEO, Josh Nicholson, about all this, he confirmed that a) folks were already challenging their definitions of ‘supporting’ and ‘disputing’ citations; b) these challenges were currently being arbitrated by just two individuals; and c) they currently had no independent body (e.g. an ethics committee) overseeing their decision-making – although they were open to this.

And this is where I find myself unexpectedly getting anxious about the birth of freemium-type services based on open citations and text that we’ve all been calling for. Because at least if a commercial product is bad, no-one need buy it, and if you do, as a paying customer you have some* leverage. But I’m not sure the community will have the same leverage over open products, because, well, they’re free aren’t they? You take them or leave them. And because they’re free, someone, somewhere, will take them. (Think Google Scholar).

*Admittedly not a lot in my experience.

Are the rules right?

Of course, it’s not just who defines our target variable, but how they do it, that matters. What exactly are these algorithms being trained to look for when they seek out ‘influential’, ‘supportive’ or ‘disruptive’ citations? And does the end user know that? More pertinently, does the developer know that? Because by definition, AI is trained by examples of what is being sought, rather than by human-written rules around how to find it. (There are some alarming stories about early AI-based cancer-detection algorithms getting near-100% hit rates on identifying cancerous cells, before the developers realised that the AI was taking the presence of a ruler in the training images – used by doctors to measure the size of tumours – as an indicator of a cancerous cell.)

I find myself asking: if someone else developed an algorithm to make the same judgement, would it make the same judgement? And when companies like scite talk about their precision statistics (0.8, 0.85, and 0.97 for supporting, contradicting, and mentioning, respectively, if you’re interested), to what are they comparing their success rates? Because if it’s the human judgement of the developer, I’m not sure we’re any further forward.
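For context, a precision figure of this kind is usually computed by comparing the tool’s labels against a human-labelled ‘ground truth’ sample – which is exactly why it matters whose judgement that ground truth encodes. A toy sketch with invented labels:

```python
# Tool predictions vs human 'ground truth' labels for five citations.
predicted = ["supporting", "supporting", "disputing", "supporting", "mentioning"]
actual    = ["supporting", "mentioning", "disputing", "supporting", "mentioning"]

def precision(label):
    """Of the citations the tool labelled `label`, what fraction were right?"""
    flagged = [a for p, a in zip(predicted, actual) if p == label]
    return sum(1 for a in flagged if a == label) / len(flagged)

print(round(precision("supporting"), 2))  # 2 of 3 'supporting' calls correct -> 0.67
```

Note that nothing in this calculation tells you whether the human labellers themselves were right, consistent, or representative.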

I also wonder whether these products are in danger of obscuring the fact that papers can be ‘influential’ in ways that are not documented by the citation network, or whether these indicators will become the sole proxy for influence – just as the Journal Impact Factor became the sole proxy for impact? And what role should developers play in highlighting this important point – especially when it’s not really in their interests to do so?


Who do the rules discriminate against?

The reason these algorithms need to be right, as I say, is that researcher careers are at stake. If you’ve only published one paper, and its citing papers are wrongly classified as disputing that paper, this could have a significant impact on your reputation. The reverse is true of course – if you’re lauded as a highly cited academic but all your citations dispute your work, surfacing this would be seen as a service to scholarship.

What I’m not clear on is how much of a risk the former is, and whether the risk falls disproportionately on members of particular groups. We’ve established that the scientific system is biased against participation by some groups, and that the literature is biased against representation of some groups. So, if those groups (women, AHSS, the Global South, ESL authors) are under-represented in the training data that identifies what an ‘influential’ paper looks like, or what a ‘supporting’ citation looks like, it seems to me that there’s a pretty strong chance they are going to be further disenfranchised by these systems. This really matters.


I’m pretty confident that any such biases would not be deliberately introduced into these systems, but the fear, of course, is that systems which inadvertently discriminate against certain groups might be used to legitimise their deliberate discrimination. One group feeling particularly nervous at the moment, given the apparent lack of value placed on their work, is the Arts and Humanities. Citation counting tools already discriminate against these disciplines due to the lack of coverage of their outputs and the relative scarcity of citations in their fields. However, we also know that citations are more likely to be used to dispute than to support a cited work in these fields. I can imagine a scenario where an ignorant third party seeking evidence to support financial cuts to these disciplines could use the apparently high levels of disputing papers to justify their actions.

But it doesn’t stop here. In their excellent paper, Big Data’s Disparate Impact, Barocas and Selbst discuss the phenomenon of masking, where features used to define a target group (say less influential articles) also define another group with protected characteristics (e.g., sex). And of course, the scenario I envisage is a good example of this, as the Arts & Humanities are dominated by women. Discriminate against one and you discriminate against the other.
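A toy illustration of masking (hypothetical numbers throughout): the rule below never looks at gender, only citation counts – but because the low-citation discipline skews female, the ‘less influential’ flag still lands disproportionately on women:

```python
papers = [
    {"discipline": "humanities", "gender": "F", "citations": 3},
    {"discipline": "humanities", "gender": "F", "citations": 5},
    {"discipline": "humanities", "gender": "M", "citations": 4},
    {"discipline": "stem",       "gender": "M", "citations": 40},
    {"discipline": "stem",       "gender": "M", "citations": 55},
    {"discipline": "stem",       "gender": "F", "citations": 60},
]

def flagged(paper, threshold=10):
    """The 'gender-blind' rule: flag low-citation papers as less influential."""
    return paper["citations"] < threshold

def flag_rate(gender):
    group = [p for p in papers if p["gender"] == gender]
    return sum(flagged(p) for p in group) / len(group)

print(f"women flagged: {flag_rate('F'):.0%}")  # 67% (2 of 3)
print(f"men flagged:   {flag_rate('M'):.0%}")  # 33% (1 of 3)
```

Removing the protected attribute from the data does nothing here; the discipline feature carries it in anyway.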

The thin end of the wedge

All this may sound a bit melodramatic at the moment. After all these are pretty fledgling services, and what harm can they possibly do if no-one’s even heard of them?  I guess my point is that the Journal Impact Factor and the h-index were also fledgling once. And if we’d taken the time as a community to think through the possible implications of these developments at the outset, then we might not be in the position we are in now, trying to extract each mention of the JIF and the h-index from the policies, practices and psyches of every living academic.

I guess my point is that the Journal Impact Factor and the h-index were also fledgling once.

Indeed, the misuse of the JIF is particularly pertinent to these cases. Because this was a ‘technology’ designed with good intentions – to help identify journals for inclusion in the Science Citation Index – just as scite and Semantic Scholar are designed to aid discovery and assess citation sentiment. But it was a very small step between the development of that technology and its ultimate use for evaluation purposes. We just can’t help ourselves. And we are naïve to think that just because a tool was designed for one purpose, it won’t be used for another.

This is why the INORMS SCOPE model insists that evaluation approaches ‘Probe deeply’ for unintended consequences, gaming possibilities and discriminatory effects. It’s critical. And it’s so easy to gloss over when we as evaluation ‘designers’ know that our intentions are good. I’ve heard that scite are now moving on to provide supporting and disputing citation counts for journals, which we’ll no doubt see on journal marketing materials soon. How long before these citations start getting aggregated at the level of the individual?

Of course, the other thing that AI is frequently used for, once it has been trained to accurately identify a target variable, is to predict where that variable might occur in future. Indeed, we are already starting to see this with AI-driven tools like Meta Bibliometric Intelligence and UNSILO Evaluate, which use the citation graph to predict which papers may go on to be highly cited and are therefore a good choice for a particular journal. To me, this is hugely problematic and a further example of the Matthew Effect: rewarding science that looks like existing science rather than ground-breaking new topics written by previously unknowns. Do AI-based discovery and evaluation tools have the potential to go the same way, predicting, based on past performance, the more influential scholars of the future?


I don’t want to be a hand-wringing nay-sayer, like an old horse-and-cart driver declaring the automobile the end of all that is holy. But I’m not alone in my handwringing. The big AI developer DeepMind is taking this all very seriously. A key element of its work is around Ethics & Society, including a pledge to use its technologies for good. DeepMind was one of the co-founders of the Partnership on AI initiative, where those involved in developing AI have an open discussion forum, including members of the public, around the potential impacts of AI and how to ensure they have positive effects. The Edinburgh Futures Institute has identified Data & AI Ethics as a key concern and is running free short courses in Data Ethics, AI & Responsible Research & Innovation. There are also initiatives such as Explainable AI, which recognise the need for humans to understand the process and outcomes of AI developments.

I’ve no doubt that AI can do enormous good in the world, and equally in the world of information discovery and evaluation. I feel we just need to have conversations now about how we want this to pan out, fully cognisant of how it might pan out if left unsupervised. It strikes me that we might do well to develop a community agreed voluntary Code of Practice for working with AI and citation data. This would ensure that we get to extract all the benefits from these new technologies without finding them being over-relied upon for inappropriate purposes. And whilst such services are still in their infancy I think it might be a good time to have this conversation. What do you think?


I am grateful to Rachel Miles, Josh Nicholson and David Pride for conversations and input to this piece, and especially thankful to Aaron Tay, who indulged in a long and helpful exchange that made this a much better offering.

Elizabeth Gadd is the Research Policy Manager (Publications) at Loughborough University. She is the chair of the Lis-Bibliometrics Forum and co-Champions the ARMA Research Evaluation Special Interest Group. She also chairs the INORMS International Research Evaluation Working Group. 

Unless it states otherwise, the content of The Bibliomagician is licensed under a Creative Commons Attribution 4.0 International License.

Dear REF, please may we have a SEP?

This blog post by Lizzie Gadd was first published on the WonkHE Blog on 2 July 2020.

Among all the recent research-related news, we now know that UK universities will be making their submissions to the Research Excellence Framework on 31 March 2021.

And a series of proposals are in place to mitigate the worst effects of COVID-19 on research productivity. This has led to lots of huffing and puffing from research administrators about the additional burden, and another round of ‘What’s the point?’ tweets from exasperated academics. And it has led me to reflect dreamily again about alternatives to the REF and whether there could be a better way. Something that UKRI are already starting to think about.

Going Dutch

One of the research evaluation approaches I’ve often admired is that of the Dutch Standard Evaluation Protocol (SEP). So when I saw that the Dutch had published the next iteration of their national research evaluation guidance, I was eager to take a look. Are there lessons here for the UK research community?

I think so.

The first thing to say of course, is that unlike REF, the Dutch system is not linked to funding. This makes a huge difference. And the resulting freedom from feeling like one false move could plummet your institution into financial and reputational ruin is devoutly to be wished. There have been many claims – particularly at the advent of COVID-19 – that the REF should be abandoned and some kind of FTE-based or citation-based alternative used to distribute funds. Of course the argument was quickly made that REF is not just about gold, it’s about glory, and many other things besides. Now I’m no expert on research funding, and this piece is not primarily about that. But I can’t help thinking, what if REF WAS just about gold? What if it was just a functional mechanism for distributing research funds and the other purposes of REF (of which there are five) were dealt with in another way? It seems to me that this might be to everybody’s advantage.

And the immediate way the advantage would be felt perhaps, would be through a reduction in the volume and weight of guidance. The SEP is only 46 pages long (including appendices) and, perhaps with a nod to their general levity about the whole thing, is decorated with flowers and watering cans. The REF guidance on the other hand, runs to 260 pages. (124 pages for the Guidance on Submissions plus a further 108 pages for the Panel Criteria and Working methods and 28 pages for the Code of Practice – much of which cross-refers and overlaps).

And if that’s not enough to send research administrators into raptures, the SEP was published one year prior to the start of the assessment period. Compare this to the REF, where the first iteration of the Guidance on Submissions was published five years into the assessment period, where fortnightly guidance in the form of FAQs continues to be published, and where, months before the deadline, we are still awaiting some of it.

Of course, I understand why the production of REF guidance is such an industry: it’s because they are enormously consultative, and they are enormously consultative because they want to get it right, and they want to get it right because there is a cash prize. And that, I guess, is my point.

But it’s not just the length of course, it’s the content. If you want to read more about the SEP, you can check out their guidance here. It won’t take you long – did I say it’s only 46 pages? But in a nutshell: SEP runs on a six-yearly cycle and seeks to evaluate research units in light of their own aims to show they are worthy of public funding and to help them do research better. It asks them to complete a self-evaluation that reflects on past performance as well as future strategy, supported by evidence of their choosing. An independent assessment committee then performs a site visit and has a conversation with the unit about their performance and plans, and provides recommendations. That’s it.

Measure by mission

The thing I love most about the new SEP is that whilst the ‘S’ used to stand for ‘Standard’, it now stands for ‘Strategy’. So unlike REF where everyone is held to the same standard (we are all expected to care 60% about our outputs, 15% about our research environment and 25% about real-world impact), the SEP seeks to assess units in accordance with their own research priorities and goals. It recognises that universities are unique and accepts that whilst we all love to benchmark, no two HEIs are truly comparable. All good research evaluation guidance begs evaluators to start with the mission and values of the entity under assessment. The SEP makes good on this.

And of course the benefit of mission-led evaluation is that it takes all the competition out of it. There are no university-level SEP League tables, for example, because they seem to have grasped that you can’t rank apples and pears. If we really prize a diverse ecosystem of higher education institutions, why on earth are we measuring them all with the same template?

Realistic units of assessment

In fact, I’m using the term ‘institutions’ but unlike the REF, the SEP at no time seeks to assess at institutional level. They seek only to assess research at the level that it is performed: the research unit. And the SEP rules are very clear that “the research unit should be known as an entity in its own right both within and outside of the institution, with its own clearly defined aims and strategy.”

So no more shoe-horning folks from across the university into units with other folks they’ve probably never even met, and attempting to create a good narrative about their joined-up contribution, simply because you want to avoid tipping an existing unit into the next Impact Case Study threshold. (You know what I’m talking about). These are meaningful units of assessment and the outcomes can be usefully applied to, and owned by, those units.

Evaluate with the evaluated

And ownership is so important when it comes to assessment. One of the big issues with the REF is that academics feel like the evaluation is done to them, rather than with them. They feel like the rules are made up a long way from their door, and then taken and wielded sledge-hammer-like by “the University”, AKA the poor sods in some professional service whose job it is to make the submission in order to keep the research lights on for the unsurprisingly ungrateful academic cohort. It doesn’t make for an easy relationship between research administrators and research practitioners.

Imagine then if we could say to academic staff, we’re not going to evaluate you any more, you’re going to evaluate yourselves. Here’s the guidance (only 46 pages – did I say?) off you go. Imagine the ownership you’d engender. Imagine the deep wells of intrinsic motivation you’d be drawing on. Indeed, motivational theory tells us that intrinsic motivation eats extrinsic motivation for breakfast. And that humans are only ever really motivated by three things: autonomy, belonging and competence. To my mind, the SEP taps into them all:

  • Autonomy: you set your own goals, you choose your own indicators, and you self-assess. Yes, there’s some guidance, but it’s a framework and not a straitjacket and if you want to go off-piste, go right ahead. Yes, you’ll need to answer for your choices, but they are still your choices.
  • Belonging: the research unit being assessed is the one to which you truly belong. You want it to do well because you are a part of this group. Its success and its future is your success and your future.
  • Competence: You are the expert on you and we trust that you’re competent enough to assess your own performance, to choose your own reviewers, and to act on the outcomes.

The truth will set you free

One of the great benefits of being able to discuss your progress and plans in private, face-to-face, with a group of independent experts that you have a hand in choosing, is that you can be honest. Indeed, Sigridur Beck from Sweden’s Gothenburg University confirmed this when talking about their institution-led research assessment at a recent E-ARMA webinar. She accepted that getting buy-in from academics was a challenge when there was nothing to win, but that they were far more likely to be honest about their weaknesses when there was nothing to lose. And of course, with the SEP you have to come literally face-to-face with your assessors (and they can choose to interview whoever they like) so there really is nowhere to hide.

The problem with REF is that so much is at stake it forces institutions to put their best face on, to create environment and impact narratives that may or may not reflect reality. It doesn’t engender cold, hard, critical self-assessment which is the basis for all growth. With REF you have to spin it to win it. And it’s not just institutions that feel this way. I’ve lost count of the number of times I’ve heard it said that REF UoA panels are unlikely to score too harshly as it will ultimately reflect badly on the state of their discipline. This concerns me. Papering over the cracks is surely never a good building technique?

Formative not summative

Of course the biggest win from a SEP-style process rather than a REF-style one is that you end up with a forward-looking report and not a backward-looking score. It’s often struck me as ironic that the REF prides itself on being “a process of expert review” but actually leaves institutions with nothing more than a spreadsheet full of numbers and about three lines of written commentary. Peer review in, scores out. And whilst scores might motivate improvement, they give the assessed absolutely zero guidance as to how to make that improvement. It’s summative, not formative.

The SEP feels truer to itself: expert peer review in, expert peer review out. And not only that but “The result of the assessment must be a text that outlines in clear language and in a robust manner the reflections of the committee both on positive issues and – very distinctly, yet constructively – on weaknesses” with “sharp, discerning texts and clear arguments”. Bliss.

Proof of the pudding

I could go on about the way the SEP insists on having ECRs and PhD students on the assessment committee; and about the way units have to state how they’re addressing important policy areas like academic culture and open research; and the fact that viability is one of the three main pillars of their approach. But you’ll just have to read the 46-page guidance.

The proof of the pudding, of course, is in the eating. So how is this loosey-goosey, touchy-feely approach to research evaluation actually serving our laid-back low-country neighbours?

Pretty well actually.

The efficiency of research funding in the Netherlands is top drawer. And whichever way you cut the citation data, the Netherlands significantly outperforms the UK. According to SciVal, research authored by those in the Netherlands (2017-2019) achieved a Field Weighted Citation Impact of 1.76 (where 1 is world average). The UK comes in at 1.55. And as far as I can see, the only countries that can hold a candle to them are Denmark, Sweden and Switzerland – none of which have a national research assessment system.
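For readers less familiar with the metric: FWCI is, roughly, each paper’s citations divided by the citations expected for outputs of the same field, year and document type, averaged across the portfolio. A toy illustration (the numbers here are hypothetical, not Scopus data):

```python
# Field Weighted Citation Impact (FWCI): each output's actual citations
# divided by the expected (field/year/type-normalised) citation count,
# averaged across the portfolio. Hypothetical numbers for illustration only.

papers = [
    {"citations": 12, "expected": 5.0},  # well above field average
    {"citations": 3,  "expected": 4.0},  # slightly below
    {"citations": 7,  "expected": 4.0},
]

fwci = sum(p["citations"] / p["expected"] for p in papers) / len(papers)
print(round(fwci, 2))  # 1.63 -> cited ~63% above world average
```

On this reading, the Netherlands’ 1.76 means its output is cited roughly three-quarters more than the world average for comparable papers.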

It seems to me that we have so much to gain from adopting a SEP-style approach to research evaluation. In a post-COVID-19 world there is going to be little point looking back at this time in our research lives and expecting it to compare in any way with what’s gone before. It’s time to pay a lot less attention to judging our historical performance, and start thinking creatively about how we position ourselves for future performance.

We need to stop locking our experts up in dimly lit rooms scoring documentation. We need to get them out into our universities to meet with our people, to engage with our challenges, to breathe our research air, and to collectively help us all to be the best that we can be – whatever ‘best’ may look like for us. I believe that this sort of approach would not only dramatically reduce the burden (I’m not sure if I said, but the SEP is only 46 pages long), but it would significantly increase buy-in and result in properly context-sensitive evaluations and clear road-maps for ever-stronger research-led institutions in the future.

Frankly, I don’t want to come out of REF 2027 with another bloody spreadsheet, I want us to come out energised having engaged with the best in our fields, and positioned for the next six years of world-changing research activity.

The purpose of publications in a pandemic – and beyond

This blog post by Lizzie Gadd first appeared on the WonkHE blog on 22 April 2020.

There’s nothing like a crisis to make you realise what’s important and this couldn’t be truer than in the world of scholarly communication.

As researchers have rushed to investigate our way out of the current pandemic we’ve seen journal content opened up, publication speeded up and systematic reviews ramped up. And we’ve seen research evaluation mothballed.

What’s going on?

UKRI is investing £20M in novel coronavirus research whilst the REF is on hold. The Wellcome Trust are converting their offices into respite care for NHS staff, not making announcements about their new responsible metrics guidance.

The virus is reminding us that the purpose of scholarly communication is not to allocate credit for career advancement, and neither is it to keep publishers afloat. Scholarly communication is about, well, scholars communicating with each other, to share insights for the benefit of humanity. And whilst we’ve heard all this before, in a time of crisis we realise afresh that this isn’t just rhetoric, this is reality.

I recently attended an excellent OASPA webinar and heard SPARC’s Heather Joseph describe how they had to negotiate permission to put COVID-19 articles on CORD-19 (an OA database of COVID-19 content), and even then, only the COVID-19 papers they’d got permission for were available, not the network of references that those papers cited.

I must confess that I had a little sob.

What publications are for

I’ve been working in open access for over 20 years. My first job involved seeking copyright permission to digitise journal articles for academics to use – often their own papers – in their own teaching. We advocated open access as a solution to this problem. Twenty years on, we’re still advocating it, and I’m reserving the right to feel a little bit guilty, and more than a little angry and frustrated.

It seems to me that for twenty years any efforts to advocate for open access to research have been stifled by what I call the two big “buts”:

  1. But what about publishers and scholarly societies? How do we ensure they survive and that the economy isn’t damaged? (Subtext: publications are for profit)
  2. But what about academic careers? A good publication list is critical for promotion and tenure. (Subtext: publications are for credit).

When I explained the first “but” to my partner, and how many open access policies sought to shore up the publishing industry, he said it sounded like something straight out of the eighteenth century slave trade debates. The fact that profits will be affected by doing the right thing, doesn’t mean you shouldn’t do the right thing. Right?

And I confess to becoming increasingly less sympathetic to those touting the second argument. It reminds me of that video where an Italian mayor screams at his constituents for using mobile hairdressers during the lockdown, “What are you doing?!”, he yells. “Do you want to look good in your coffin? Don’t you know the coffin will be closed?!” If we’ve created a generation of scholars who are just in it for the glory of papers in glamorous journals, and not to do good research that changes the world a little bit, then we really are in trouble.

Because the pursuit of glamour could be killing us.

The cost of publication

UKRI have been throwing money (about £150 million by my estimate) at funding eye-watering Article Processing Charges with haute couture journals for seven years. This is money that could have been used for actual life-saving research. However, by trying to balance a preference for immediate OA to the version of record with a desire not to impinge on academics’ freedom-to-publish-where-they-want (not to be confused with actual academic freedom) they found themselves paying increasingly hefty APCs for publications in journals that were more about distinction than dissemination. And when it comes to a global emergency, we’re still having to beg publishers for access to our own research so that we might save large swathes of the human race from an unnecessary death.

This is why I see the UKRI OA policy consultation as such an important opportunity. And I’m hopeful that world events, whilst tragic and terrible, may bring into sharp relief the true value and purpose of scholarly communication. Because I fear that despite all its talk of transformation, the only thing the proposed UKRI OA policy is currently set to transform is publisher profit margins. Yes, inspired by Plan S, it seeks immediate OA to all research output – which is great. But the lack of journals with zero embargo Green OA policies may mean that publication in pricey APC-based Gold OA journals is the only option for researchers. Thus publisher profit margins continue to be maintained whilst enabling self-destructive credit-seeking publication behaviours.

I say profit and credit can no longer be policy drivers. They just can’t.

The next platform

And it strikes me that when you take income generation and evaluation out of the equation, there is an obvious solution to this problem: UKRI need to set up a funder-based publishing platform and say to recipients, if you want our money, publish your findings here. End of.

This is not a new idea, but a proven technology. Gates Open Research is a great example. Despite not being the only publication option for Gates Foundation recipients, the papers published in Gates Open Research are achieving a cites-per-paper rate on a par with the world average for medicine. Similarly, Wellcome Open Research is on a par with top quartile journals in both biochemistry and medicine in terms of its SCImago Journal Rank.

We can’t just keep on writing open access policies in the hope that publishers will adapt their policies to accommodate them. No. If you want something doing, do it yourself. Or at least, develop the specification and invite publishers to bid to provide their services in accordance with it.

Seriously, what’s not to like? It’s not my purpose to expand on all the features of a publishing platform, but I can’t resist the following highlights:

  1. Preprints are available for immediate review and consumption by all (we know that in many disciplines peer review doesn’t materially change the output so this is a must – and it shaves years off publication times).
  2. Post-publication peer review reports give reviewers credit.
  3. Approved outputs can be indexed in bibliographic databases and made just as discoverable as journal-based outputs.
  4. All outputs can be made available under a CC-BY licence and in accordance with all the necessary technical requirements for truly findable, accessible, interoperable and reusable research.
  5. Links to datasets and other open research outputs can be added.
  6. UKRI gets to trace, cradle-to-grave, the impacts of their funded research as it’s all published in one place.
  7. Publishing on the platform doesn’t depend on the wealth of the organisation and the size of their Gold OA Fund.
  8. It reduces researcher anxiety about ‘getting stuff published’. If it’s funded by UKRI it is welcome on the platform.
  9. Quality is assessed through peer review reports, and impact through subsequent usage (citations if that floats your boat), and not by journal brand.

So please UKRI, when you come to make your difficult policy decisions about open access, please put front and centre at every stage a very simple question: “Will this help scholars communicate more effectively and do better research?”. Everything else is a distraction. Progress has been impeded by two buts for twenty years. It’s time to focus.

No buts.

Goodbye journal metrics, hello openness? Investigating Plan-S readiness.

This post first appeared on The Bibliomagician blog on 17 April 2020.

Plan-S based funder Open Access (OA) policies claim that they are process-agnostic, with Green and Gold OA both meeting their requirements, but what proportion of your University’s current publishing outlets are Plan-S compliant via the Green OA route and how easy might the transition to immediate open access be? Lizzie Gadd reports on an investigation at Loughborough University. 

At Loughborough University we encourage academics to follow a ‘readership, rigour and reach’ approach to choosing where to publish. And to help colleagues assess the reach of an outlet, we may suggest the use of field-normalised journal citation metrics as an indicator of its visibility. But, as an institution with an excellent track record in engaging with Open Access, and a newly minted Open Research Position Statement, we know that openness increases visibility too.  We know that highly-cited journals are only highly-cited because academics have historically submitted their best work there and we are keen to encourage colleagues to think more broadly about routes to visibility.   

Of course, we’re also aware that the external environment is changing and soon the UKRI may be adopting a Plan S-based Open Access (OA) policy which requires the researchers they fund to ensure that the work they produce is made available immediately on publication. This could be through a pure Gold OA journal, a hybrid journal that is ‘transitioning’ to pure, or via Green OA.  At Loughborough, like many medium-sized, less wealthy but research intensive institutions, we have historically embraced the Green route to OA.  Indeed, recent work by the Curtin Open Knowledge Institute using Unpaywall discovered that Loughborough University is 4th in the world in terms of the proportion of our outputs that are available as Green OA.  So, to help us not only guide our academics towards a broader interpretation of visibility, but also to assess our readiness for Plan S, we thought we’d take a look at what proportion of the outlets we currently publish in are not only ’highly cited’ in terms of journal citation metrics[1], but ‘highly open’ in terms of having a zero embargo Green OA policy. 

One thing we didn’t check as part of this analysis was whether those journals offering zero embargo Green OA policies also allowed papers to be made available under a CC-BY licence as preferred by Plan S, and as required by the proposed UKRI OA policy.  This is simply because it is so blooming difficult to get hold of this information.  The obvious place to store it is SHERPA/RoMEO and in some cases that’s where you’ll find it, but coverage is currently very patchy. 

“Currently, just over one-third of our most frequently used sources (35%) would be Plan S compliant, assuming their licences were acceptable.”

So, we downloaded from SciVal the top 100 sources[2] published in by each of our Schools (or disciplines where a School is multidisciplinary) between 2016 and 2018.  We then identified the ten sources in which our authors published most frequently. In some cases, due to differing disciplinary approaches to publishing there were fewer than ten sources in which more than one Loughborough output appeared.  In total 146 sources were identified, and these were checked for citedness (whether they appeared in the top 10% of sources by SNIP or SJR[3]) and for openness (involving a SHERPA/RoMEO search for the length of their embargo period or ‘pure’ Gold status)[4].  

In total, we found that 44% (64) of our frequently used sources were in the top 10% citation percentiles by SJR or SNIP. We also found that 30% (44) had a zero embargo green OA policy as listed on SHERPA/RoMEO and a further 5% (seven) were Gold OA journals.  This would mean that, currently, just over one-third of our most frequently used sources (35%) would be Plan S compliant, assuming their licences were acceptable.  
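The headline percentages follow directly from the counts reported above (a quick arithmetic check using the figures in this post):

```python
# Reproduce the headline percentages from the counts reported in the post.
total = 146          # frequently used sources identified
highly_cited = 64    # top 10% by SJR or SNIP
zero_embargo = 44    # zero embargo Green OA policy on SHERPA/RoMEO
gold_oa = 7          # pure Gold OA journals

print(round(100 * highly_cited / total))              # 44
print(round(100 * zero_embargo / total))              # 30
print(round(100 * (zero_embargo + gold_oa) / total))  # 35 -> "Plan S compliant"
```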

FIGURE 1: Outlet visibility at Loughborough University

So that was kind of interesting. But, of course, whilst we’re transitioning to new measures of journal visibility, academics will ideally want to focus on sources that are both highly cited and highly open/Plan S-compliant.  So what were their options for hitting both of these indicators? Unfortunately, not so great. Only 22 outlets – just 15% of our most-published-in sources – were both highly cited and highly open, with one additional outlet hitting the ‘highly open’ target by virtue of us paying for the privilege (APC-based Gold OA).  

When we shared this with academics their perhaps inevitable next question was, well what highly cited and highly open options do we have across the wider list of 100 sources (i.e. not just those we’re publishing in the most)?  Surely if we widened the net we’d find much greater opportunity to grasp ‘mega-visible’ publishing opportunities? Alas, it was not to be. Having extended the (very time-consuming) exercise to check all their highly cited sources for open access options we found that a much smaller proportion overall, a mere 7%, hit both indicators.  This varied of course from discipline to discipline with the greatest opportunity being afforded to communications (16%) and the lowest to education (0%)[5].  And of course, this is before we factor in whether those zero embargo Green OA titles actually allow manuscripts to be posted under CC-BY licences.

Oh dear.

Now I’m aware we have a sample of one university here. And our publication practices may or may not be representative of the wider population.  Indeed, it would be great if others could run this analysis at their institutions to see how widespread this phenomenon is. But whether or not it’s replicated elsewhere, this is the reality for us.  

The low-hanging fruit of course, is to draw attention to those 54 titles (37%) that hit neither visibility indicator.  And by broadening our ‘definition’ of visibility, we can highlight a wider range of titles that can serve this important end.  However, if we were hoping to find a good list of titles in which academics were currently publishing that were both highly cited and highly open, we were pretty disappointed.  On average, each of our disciplines had five sources to choose from that were both highly open and highly cited.  In reality, some had none at all. So what do we do with that?

“Having extended the exercise to check all their highly cited sources for open access options we found that a mere 7% hit both indicators.“

The truth is that although we use citedness and openness as visibility indicators, they do both indicate different aspects of visibility. Openness speaks of potential reach and, if other open research practices have been engaged with, perhaps increased rigour.  Citedness speaks of actual reach, of journals that have a track record of finding and influencing their target audience and, because they attract so many papers and have built up stringent peer review processes to weed out the poorer ones, they may also claim increased rigour. So, again, what to do? 

I think that all too often we research support folk can hide behind our general principles and our generic advice: “We support openness”. “Consider open access options in your publication choices.”  But if an academic collars you and asks explicitly whether they should choose Journal A that is highly cited and closed, or Journal B that is poorly cited but open, and assuming the readership and rigour of both are comparable, we find ourselves in an extremely tricky spot, caught between conscience and convention.

And that, my friends, is why to meet the demands of Plan S (and the UKRI OA policy) I fear we are going to have to abandon Green OA in favour of pricey “publish & read” or “read & publish” big deals with the publishers of existing highly cited journals. There simply aren’t the zero embargo Green OA deals around for the sources in which we publish the most. And again I reiterate, this is before we’ve factored in the potential CC-BY requirement. I think it’s unlikely that publishers, given the choice between making their Green OA policy zero-embargo-with-CC-BY or receiving additional ’gold-for-Gold’, are going to opt for the former.

It might be helpful to policymakers such as the UKRI to understand how widespread this experience is, so if anyone fancies running this analysis on their own HEI, I’ve provided my method below. Similarly, if you felt able to share your data when you’re done, I’d love to hear from you.

Huge thanks to Dr Karen Rowlett for her comments on an early draft of this piece. 

[1] As I say in the opening paragraph, there is really no such thing as a highly-cited journal, only journals to which academics submit their best work, that ends up lending its citedness to that journal. However, I use the term ‘highly cited’ as a short cut. Don’t judge me.

[2] SciVal only allows you to extract 100 sources per entity.

[3] Each School and discipline at Loughborough gets to select (or not to select) their own field-normalised journal metric and threshold.

[4] If a source was not listed on SHERPA/RoMEO it was recorded as being non-compliant as we didn’t have the resource to chase down every title individually. We also did not factor in any existing publisher deals that allow Loughborough academics to publish ‘APC-free’. Thus, in reality, the percentage of Plan S-compliant titles might be a bit higher than this.

[5] Excluding politics, history and social work where SciVal’s coverage of our titles is too low to be meaningful.


(I used SciVal but you could also use a bibliographic database such as Dimensions, Scopus or Web of Science)

  • In SciVal – Overview – 2016-18 – Published – By Scopus Source – Export the data one School/department/discipline (hereafter, research unit) at a time into Excel
  • For each research unit, highlight the ten outlets in which they publish the most
  • Highlight (using number filters) which of those ten titles appear in the top 10% by the journal citation metric of their choice. (We use data provided by SciVal to help us with this. We use 1.5 as the overall threshold for top 10% SNIP and 1.4 for top 10% SJR. You could use discipline-specific thresholds if you preferred.)
  • Search for each title on SHERPA/RoMEO and check whether it:
    • Allows self-archiving of the Accepted Version immediately on an Institutional Repository. 
    • Is a pure Gold OA Journal. On SHERPA/RoMEO this is indicated by ‘Listed in DOAJ? Yes’
  • You may also wish to note the number of titles not listed on SHERPA/RoMEO.
  • Record the total and percentage of outlets that:
    • Appear in the top 10% by journal citation metric
    • Have a zero embargo Green OA policy
    • Are pure Gold OA journals
    • Are both top 10% and zero embargo OR are both top 10% and Gold OA journals.
  • You may also do this for your whole institution if you are unable to disaggregate by field.  To do this:
    • Download a list of sources published in by your HEI between 2016-18
    • Extract their journal metrics using the Scopus source list download.
    • Filter on those in the top 10% (use thresholds provided above).
    • For those top 10% sources, check SHERPA/RoMEO for their Green/Gold OA status as above.
    • Record the total and percentage of outputs that:
      • Appear in the top 10% by journal citation metric
      • Have a zero embargo Green OA policy
      • Are pure Gold OA journals
      • Are both top 10% and zero embargo OR are both top 10% and Gold OA journals.
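The tally described in the steps above could be scripted along these lines. This is a sketch only: the column names (`title`, `snip`, `sjr`, `embargo_months`, `pure_gold`) are hypothetical, and in practice the metrics would come from your SciVal export and the OA policy fields from SHERPA/RoMEO lookups:

```python
# Sketch of the citedness/openness tally described above.
# Assumes a CSV export with hypothetical columns:
#   title, snip, sjr, embargo_months, pure_gold (yes/no)
import csv

SNIP_TOP10 = 1.5  # overall top-10% thresholds used in this post
SJR_TOP10 = 1.4

def classify(row):
    """Return (highly_cited, highly_open) flags for one source."""
    highly_cited = float(row["snip"]) >= SNIP_TOP10 or float(row["sjr"]) >= SJR_TOP10
    highly_open = int(row["embargo_months"]) == 0 or row["pure_gold"] == "yes"
    return highly_cited, highly_open

def tally(path):
    """Count sources that are highly cited, highly open, or both."""
    counts = {"cited": 0, "open": 0, "both": 0, "total": 0}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cited, is_open = classify(row)
            counts["total"] += 1
            counts["cited"] += cited
            counts["open"] += is_open
            counts["both"] += cited and is_open
    return counts
```

From the returned counts, the percentages reported above are just `100 * counts["both"] / counts["total"]` and so on, per research unit or for the whole institution.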

CRediT Check – Should we welcome tools to differentiate the contributions made to academic papers?

This post by Lizzie Gadd was originally published on the LSE Impact Blog on 20 January 2020.

Elsevier is the latest in a lengthening list of publishers to announce the adoption, across 1,200 journals, of the CASRAI Contributor Role Taxonomy (CRediT). Authors of papers in these journals will be required to define their contributions in relation to a predefined taxonomy of 14 roles. In this post, Elizabeth Gadd weighs the pros and cons of defining contributorship in a more prescriptive fashion and asks whether there is a risk of incentivising new kinds of competitive behaviour and forms of evaluation that don’t benefit researchers.

Getting named on a journal article is the ultimate prize for an aspiring academic. Not only do they get the paper on their CV (which can literally be money in the bank), but once named, all the subsequent citations accrue to each co-author equally, no matter what their contribution.

Original tweet by Ali Chamkha, retweeted with comment by Damien Debecker. 3 January 2020

However, as this tweet demonstrates, getting named on a journal article is not the same as having a) done the lion’s share of the research and/or b) actually written the journal article. And there is a lot of frustration about false credit claims. Gift authorship, ghost authorship, purchased authorship, and wrangles about author order abound. To address these problems there is some helpful guidance from organisations like the International Council of Medical Journal Editors, the British Sociological Association and the Committee on Publication Ethics (COPE) about what constitutes an author. Perhaps most significantly, in 2014 we saw the launch of CASRAI’s Contributor Role Taxonomy, CRediT.

CRediT aims to ensure that everyone attributed on a paper gets recognised for their contribution. As such it goes one step further than guidance and provides a structured way for authors to declare their various contributions. It lists 14 contributor roles, some of which you might expect (writing, analysis) and some of which you might not (supplying study resources and project admin). And whilst it won’t stop someone being named who should not be named, nor will it ensure that everyone is named who should be named, it does make omissions a bit more difficult – and for this it has been highly praised.

But, I still have some questions about CRediT. And whilst I might be overthinking this (bad habit), I’d welcome any thoughts the community might have on the following:

  1. Are there important differences between authors and contributors that we need to retain and how does CRediT support these?
  2. Is a focus on credit-seeking what the community needs, or will this end up embedding the status quo around problematic output-based evaluation?
  3. Are we going to end up with new forms of CRediT-based evaluation that might have negative systemic effects?

Authors vs contributors

So, the ‘C’ in CRediT stands for Contributor. It is a Contributor Role Taxonomy. But what is not too clear is whether CRediT seeks to capture contributions to the paper, or contributions to the research. It might sound like I’m being picky, but in legal terms there is a big difference between these two. Because someone who writes the paper is technically an author and has rights as such, and someone who only contributes to the underlying research is not. So, whilst an author is always a contributor, a contributor is not always an author.

Why is this important? Well, because corresponding authors are usually responsible for the legal transfer or assignment of rights to the publisher prior to publication. And if that corresponding author is actually just a contributor (and CRediT starts making this explicit when it wasn’t previously), then technically they can’t transfer those rights to the publisher because they don’t own any. This is particularly important because corresponding authors are often senior researchers or principal investigators who are less likely to be paper writers.

But it’s not just for legal reasons that these labels matter. As the tweet shows, they matter to researchers too. Researchers have a sense that the term ‘author’ means something different, more significant, than ’contributor’. Disciplinary norms play a huge role here of course. In the medical sciences, the ICMJE have actually spelled out which roles constitute ‘authorship’ and which constitute ‘non-author contributorship’. They even specify that non-author contributors should only be ‘acknowledged’ and not listed in the author by-line.

In the Arts & Humanities, naming a single author a ‘contributor’ would seem entirely inappropriate as it suggests that others had a hand in their work. However, I wonder whether single-authors, if called upon to strictly adhere to CRediT, would find themselves obliged to list others as ‘contributors’ (Librarians maybe?) where historically in their disciplines they might not do so.

Assuming that CRediT are not seeking to abolish the role of author altogether and assuming they don’t believe non-author-contributors should be relegated to the acknowledgements, where presumably they’d get no formal credit at all, I’m not entirely sure where this leaves us. Are they creating a third category of research participant, slightly more than ‘acknowledgee’, but less than author?  And assuming such a status could easily be incorporated into the world’s bibliographies, can someone’s contribution be assessed merely on the role name (e.g., ‘Software’) or would it need to be assessed on the level of their contribution in that role?

My final question is whether all disciplines are happy that the 14 roles identified by CRediT are the right ones. Now I’m aware that CRediT was not initially designed with Arts & Humanities subjects in mind, so this might not be an entirely fair question. But I must say it surprised me recently, when a materials scientist, Dr Ben Britton, spoke of his frustration at having to adhere to CRediT because he felt the 14 roles weren’t pertinent to his discipline. I’m left wondering whether it was too ambitious to believe the many and varied contributions to the many and varied scholarly sub-disciplines could be distilled into 14 categories. And whether it is unrealistic to think that the same 14 categories are going to remain the same forever.


My second niggle with CRediT is its name. I am partial to a cheeky acronym myself, and I can only imagine the glee its creator must have felt when they came up with this one. After all, who doesn’t want credit for their contribution?! But scholarship is not all about getting credit, believe it or not. There is something about taking responsibility too. And, as we’ve seen above, about taking copyright ownership of one’s work.

There are a lot of problems in the scholarly communications space caused by credit-seeking behaviours: publishing only headline-grabbing results, not publishing null results, publishing too hastily with subsequent retractions, and irreproducible science. Arguably, if more researchers had a stronger sense of the copyright ownership that authorship confers, and felt less driven to relinquish their rights to publishers in exchange for the reputational credit offered by publication, we wouldn’t find ourselves in a situation where the majority of our scholarly output is owned by commercial entities.

Indeed, one of the problems CRediT itself seeks to address is unfair credit-seeking. Ironically, then, I wonder whether CRediT is unwittingly contributing to the very problem it seeks to solve.

Evaluation by CRediT

Of course, our interminable obsession with publication-based credit is inevitably going to lead some to use CRediT for ostensibly fairer research evaluations. We know that a citation to a paper on which you were the 1,000th author cannot mean the same thing as a citation to a paper on which you were the sole author. Clarivate have recently argued that, with the increase in hyper-authored papers, the fractionalisation of citations should become the norm. Makes sense. How natural, then, to start weighting citations based on the actual role you played on a paper?
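To see what this slippery slope looks like in practice, here is a minimal sketch. Plain fractionalisation is just an equal split of each citation among co-authors; the role-weighted variant is the hypothetical extension I’m worried about. The weights here are entirely invented for illustration, and a default of 0.5 is assumed for unlisted roles; nothing in CRediT or Clarivate’s proposal prescribes them:

```python
def fractional_credit(n_authors: int) -> float:
    """Plain fractional counting: one citation split equally among co-authors."""
    return 1.0 / n_authors

# Hypothetical role weights -- exactly the kind of valuation of some roles
# over others that role-based evaluation would invite. Invented numbers.
ROLE_WEIGHTS = {
    "Conceptualization": 1.0,
    "Software": 0.8,
    "Project administration": 0.3,  # would admin roles 'count' for less?
}

def role_weighted_credit(authors: dict[str, list[str]]) -> dict[str, float]:
    """Split one citation among authors in proportion to their roles' weights."""
    scores = {
        name: sum(ROLE_WEIGHTS.get(role, 0.5) for role in roles)
        for name, roles in authors.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}
```

Under such a scheme, an author credited only with ‘Project administration’ receives a smaller share of every citation than one credited with ‘Conceptualization’, however essential their work was.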

We are already seeing bibliometric analyses based on contributor roles. Whilst this is interesting at a ‘science of science’ level (e.g., are roles gendered?), it worries me at the level of individual researcher evaluation. Are we going to see some roles prized above others? Will some roles literally ‘count’ and some roles not? And what impact will this have on those early career researchers in project administration and literature-searching roles, to whom CRediT seeks to give previously unacknowledged credit? Will they, in another terrible fit of irony, be excluded from some forms of credit altogether?

I’m not sure there is any way of mitigating the worst effects of this. And I’m particularly concerned because, of course, the underlying CRediT data required to run such analyses will be collected and owned by publishers. I note from the CASRAI website that they are seeking to “ensure that CRediT is tied to ORCID and included in the Crossref metadata capture”. But not all metadata ingested by Crossref is made available openly. And the world’s largest journal publisher, which recently announced the adoption of CRediT by 1,200 of their journals, infamously does not cooperate with open citation services.

To me this is a concern. I’m not sure whether CASRAI has any power to ensure that CRediT-adopting journal publishers commit to making their resulting CRediT data available openly via Crossref, but if so, I would urge them to use it. At least that way publishers won’t end up with exclusive control over the community’s CRediT data.
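To make concrete what ‘the community’s CRediT data’ amounts to, here is a sketch of the kind of per-author role record a publisher would hold and could (or could not) deposit openly. The field names, DOI, and ORCID iDs are purely illustrative placeholders; this does not reproduce the actual Crossref or JATS schema:

```python
# Illustrative shape of a CRediT-annotated publication record.
# All identifiers and field names are hypothetical.
record = {
    "doi": "10.1234/example.5678",           # placeholder DOI
    "contributors": [
        {
            "name": "A. Researcher",
            "orcid": "0000-0000-0000-0000",  # placeholder ORCID iD
            "credit_roles": ["Conceptualization", "Writing – original draft"],
        },
        {
            "name": "B. Analyst",
            "orcid": "0000-0000-0000-0001",  # placeholder ORCID iD
            "credit_roles": ["Software", "Formal analysis"],
        },
    ],
}
```

Whoever holds records like these at scale holds the raw material for every role-based evaluation described above, which is why openness of the deposit matters.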


I don’t want to rain on CRediT’s parade, because the problem they seek to address is a real one, and the efforts they’ve made have had a considerable impact. However, I fear there are some challenges with CRediT’s current trajectory which may mean that those they hope to give greater visibility actually receive less credit, not more. There are no easy answers here, of course, but I worry that without an open conversation about some of these issues, CRediT might lose some of its very considerable potential. What do you think?


Huge thanks to Professor Charles Oppenheim and Dr Simon Kerridge for reviewing an early draft of this piece. In CRediT terms: “Writing – review & editing”. Definitely contributors not authors I’m thinking…