Elsevier have endorsed the Leiden Manifesto: so what?

This piece was originally posted to The Bibliomagician blog on 22 September 2020

Lizzie Gadd speculates as to why Elsevier endorsed the Leiden Manifesto rather than signing DORA, and what the implications might be.

If an organisation wants to make a public commitment to responsible research evaluation they have three main options: i) sign DORA, ii) endorse the Leiden Manifesto (LM), or iii) go bespoke – usually with a statement based on DORA, the LM, or the Metric Tide principles.

The LIS-Bibliometrics annual responsible metrics survey shows that research-performing organisations adopt a wide range of responses to this, including sometimes signing DORA and adopting the LM. But when it comes to publishers and metric vendors, they tend to go for DORA. Signing DORA is a proactive, public statement and there is an open, independent record of your commitment. DORA also has an active Chair in Professor Stephen Curry, and a small staff in the form of a program director and community manager, all of whom will publicly endorse your signing, which leads to good PR for the organisation.

A public endorsement of the LM leads to no such fanfare. Indeed, the LM feels rather abandoned by comparison. Despite a website and blog, there has been little active promotion of the Manifesto, nor any public recognition for anyone seeking to endorse it. And one can’t help wondering how differently the LM would operate if it had been born in a UK institution subject to the impact-driven strictures of the REF.

But despite this, Elsevier recently announced that they had chosen the Leiden Manifesto over DORA. Which leads us to ask i) why? And ii) what will this mean for their publishing and analytics business?

Why not DORA?

Obviously I wasn’t party to the conversations that led to this decision and can only speculate. But for what it’s worth, my speculation goes a bit like this:

Unlike the LM, which provides ten principles to which all adopters should adhere, DORA makes different demands of different stakeholders. Research institutions get off pretty lightly with just two requirements: i) don’t use journals as proxies for the quality of papers, and ii) be transparent about your reward criteria. Publishers and metrics suppliers, however, are subject to longer lists (see box) and of course, Elsevier are both. And it is within these lists of requirements that I think we find our answers.

Box: Excerpt from DORA Principles
  1. Positioning CiteScore as the JIF’s responsible twin.

Firstly, DORA demands that publishers ‘greatly reduce emphasis on JIF as a promotional tool’. However, Elsevier have invested heavily in CiteScore (their alternative to the JIF) and are not likely to want to reduce emphasis on it. Indeed, the press release announcing their endorsement of the LM provided, as an example, the way they’d recently tweaked the calculation of CiteScore to ensure it met some of the LM principles, positioning it as a ‘responsible metric’, if you will. This is something they’d struggle to get away with under DORA.

  2. Open citations? Over my dead body

One of the less well-discussed requirements of DORA for publishers is to “remove all reuse limitations on reference lists in research articles and make them available under the Creative Commons Public Domain Dedication.” In other words, DORA expects publishers to engage with open citations. This is something Elsevier have infamously failed to do.

  3. Open data? You’ll have to catch me first

And finally, DORA expects metric suppliers to not only “be open and transparent by providing data and methods used to calculate all metrics” (which they partly do for subscribers) but to “Provide the data under a licence that allows unrestricted reuse, and provide computational access to data, where possible” (which they don’t).

So whereas DORA is a relatively easy sign for HEIs (only two requirements), for publishers it’s more tricky than it might first appear (five requirements), and an organisation like Elsevier, which also supplies metrics, has to contend with a further four requirements, which would essentially eat away at their profits. And we all know that they’re only just scraping by, bless them.

The impact of endorsing the Leiden Manifesto

But isn’t it good enough that they’ve endorsed the Leiden Manifesto? After all, it’s a comprehensive set of ten principles for the responsible use of bibliometrics. Well, being a seasoned grumbler about some of the less savoury aspects of Elsevier’s SciVal, I decided to take to the discussion lists to see whether they saw this move as the beginning or the end of their responsible metrics journey. Was this the start of a huge housekeeping exercise which would sweep away the h-index from researcher profiles? Disinfect the unstable Field-Weighted Citation Impact (FWCI) from author rankings? And provide health-warnings around some of the other over-promising and under-delivering indicators?

Apparently not.

“There is nothing inherently wrong with the h-index,” said Holly Falk-Krzesinski, Elsevier’s Vice-President for Research Intelligence, pointing to three of the Leiden Manifesto’s principles where she felt it passed muster (despite Elsevier’s Andrew Plume questioning its validity on the same day). And, as part of a basket of metrics, she considers the FWCI a perfectly usable indicator for researchers (something Elsevier’s own SciVal Advisors disagree with). And she believes the h-index is “not displayed in any special or prominent way” on Pure Researcher Profiles. Erm…

[Image: example of an h-index displayed on a researcher profile]

And after several rounds of this, frankly, I gave up. And spent a weekend comfort-eating Kettle chips. Because I care deeply about this. And, honestly, it felt like to Elsevier it was just another game to be played. 

Responsible is as responsible does

Back in 2018 I made the point that if we weren’t careful, responsible metrics statements could, in an ironic turn, easily become ‘bad metrics’, falsely signifying a responsible approach to metrics that wasn’t there in practice. And the reason these statements are so vulnerable to this is that neither DORA nor the LM are formally policed. Anyone can claim to be a follower and the worst that can happen is that someone calls out your hypocrisy on Twitter. Which does happen. And is sometimes even effective.

It is for this reason that the Wellcome Trust have stated that adopting a set of responsible metrics principles is not enough. If you want to receive their research funding from 2021, you need to demonstrate that you are acting on your principles. Which is fair. After all, if you want Athena SWAN accreditation, or Race Equality Chartership or a Stonewall Charter, you have to provide evidence and apply for it. It’s not self-service. You can’t just pronounce yourself a winner. And I can’t help wondering: yes, Elsevier has endorsed the Leiden Manifesto, but would the Leiden Manifesto (given the chance) endorse Elsevier?

Now I know that CWTS and DORA would run a mile from such a proposition, but that doesn’t mean it’s not needed. Responsible-metrics-washing is rife. And whilst I’d rather folks washed with responsible metrics than anything else – and I’m sure a few good things will come out of it – it does rather feel like yet another instance of a commercial organisation paying lip-service to a community agenda for their own ends (see also: open access and copyright retention).

Right on cue, Helen Lewis in The Atlantic recently described the “self-preservation instinct [that] operates when private companies struggle to acclimatize to life in a world where many consumers vocally support social-justice causes”. “Progressive values are now a powerful branding tool,” she writes, and “Brands will gravitate toward low-cost, high-noise signals as a substitute for genuine reform, to ensure their survival.” Correct me if I’m wrong, but that sounds pretty apposite?

Of course, it’s early days for Elsevier’s Leiden Manifesto journey and Andrew Plume did seek to reassure me in a video call that they were still working through all the implications. So let’s hope I’m worrying about nothing and we’ll be waving goodbye to the h-index in Elsevier products any day soon. But if nothing does transpire, I know, as the developer of a responsible metrics model myself, that I’d feel pretty sick about it being used as empty virtue-signalling. And it does occur to me that funders seeking to hold institutions to account for their responsible research evaluation practices might do well to direct their attention to the publishers they fund.

Otherwise I fear it really will be a case of, well, Elsevier have endorsed the Leiden Manifesto: so what?

Rethinking the Rankings

This piece was originally posted to the ARMA blog on 14 October 2020.

Lizzie Gadd and Richard Holmes share the initial findings of the INORMS Research Evaluation Working Group’s efforts to rate the World University Rankings.

When the INORMS Research Evaluation Working Group (REWG) was formed in 2016, Lizzie asked the representatives of twelve international research management societies where they felt we should focus our attention if we wanted to achieve our aim of making research evaluation more meaningful, responsible and effective. They were unanimous: the world university rankings. Although research managers are not always the ones in their institutions that deal with the world university rankers, they are one of the groups that feel their effect most keenly: exclusion from certain funding sources based on ranking position; requests to reverse-engineer various indicators to understand their scores; and calls to introduce policies that may lead to better ranking outcomes. And all whilst fully appreciating how problematic rankings are in terms of their methodology, their validity and their significance.

So what could be done? Well, it was clear that one of the key issues with the world ranking bodies is that they are unappointed and they answer to nobody. In an earlier blog post where Lizzie describes the research evaluation environment as a food chain, she put them at the top: predators on which no-one predates. (Although some Scandinavian colleagues see them more as parasites that feed off the healthy organisms: taking but not giving back). And of course the way to topple an apex predator is to introduce a new one: to make them answerable to the communities they rank. So this is what the INORMS REWG set about doing, by seeking to introduce an evaluation mechanism of their own to rate the rankers.

In some parallel work, the REWG were developing SCOPE, a five-step process for evaluating effectively, so we were keen to follow our own guidance when designing our ranker ratings. And this is how we did so:

Start with what you value

Our first step was to identify what it was we wanted from any mechanism seeking to draw comparisons between universities. What did we value? To this end we sought out wisdom from all those who’ve gone before us in this space: the Berlin Principles on Ranking HEIs, the work of Ellen Hazelkorn, the CWTS principles for responsible use of rankings, the Leiden Manifesto, DORA, Yves Gingras, and many others. From their thoughts we synthesised a draft list of Criteria for Fair and Responsible University Rankings and put them out to the community for comment. We got feedback from a wide range of organisations: universities, academics, publishers and ranking organisations themselves. The feedback was then synthesised into our value document – what we valued about the entity (rankers) under evaluation. These fell into four categories: good governance, transparency, measure what matters, and rigour.

Context considerations

There are lots of reasons we evaluate things. What we’re trying to achieve here is a comparison of the various ranking organisations, with the ultimate purpose of incentivising them to do better. We want to expose where they differ from each other but also to highlight areas that the community cares about where they currently fall short.  What we didn’t want to do is create another ranking. It would have been very tempting to do so: “ranking the rankings” has a certain ring to it.  But not only would this mean that a ranking organisation got to shout about its league-table-topping status – something we didn’t want to endorse – but we wouldn’t be practising what we preached: a firm belief that it is not possible to place multi-faceted entities on a single scale labelled ‘Top’ and ‘Bottom’.

Options for evaluating

Once we had our list of values, we then set about translating them into measurable criteria – into indicators that were a good proxy for the quality being measured. As anyone who’s ever developed an evaluation approach will know, this is hard. But again, we sought to adhere to our own best practice by providing a matrix by which evaluators could provide both quantitative and qualitative feedback. Quantitative feedback took the form of a simple three-point scale according to whether the ranker fully met (2 marks), partially met (1 mark) or failed to meet (0 marks) each criterion. Qualitative feedback took the form of free-text comments. To ensure transparency and mitigate bias as best we could, we asked a variety of international experts to each assess one of six ranking organisations against the criteria. INORMS REWG members also undertook evaluations, and, in line with the SCOPE principle of ‘evaluating with the evaluated’, each ranker was also invited to self-assess. Only one ranking organisation, CWTS Leiden, accepted our offer to self-assess and they provided free-text comments rather than scores. All this feedback was then forwarded to our senior expert reviewer, Dr Richard Holmes, author of the University Ranking Watch blog, and certainly one of the most knowledgeable university rankings experts in the world. He was able to combine the feedback from our international experts with his own, often inside, knowledge of the rankings, to enable a really robust, expert assessment.
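To make the scoring mechanics concrete, the sketch below (a few lines of Python with entirely invented criteria and marks, not the real assessment data) shows how 0/1/2 marks per criterion can be rolled up into per-category percentages of the kind plotted later in Figure 1, with ‘Not Applicable’ criteria simply excluded from the maximum.

```python
# Minimal sketch: rolling up 0/1/2 marks per criterion into category percentages.
# The criteria names and marks below are invented for illustration only.

scores = {
    "Good governance": {"engages with the ranked": 2, "declares conflicts of interest": 0},
    "Transparency": {"transparent methods": 2, "open data": 1},
    "Measure what matters": {"no composite indicators": 0, "measures against mission": 1},
    "Rigour": {"avoids opinion surveys": 2, "indicates uncertainty": 1},
}

def category_percentages(scores, not_applicable=frozenset()):
    """Return each category's actual score as a share of its maximum (2 marks per applicable criterion)."""
    results = {}
    for category, marks in scores.items():
        applicable = {c: m for c, m in marks.items() if c not in not_applicable}
        maximum = 2 * len(applicable)
        results[category] = 100 * sum(applicable.values()) / maximum if maximum else None
    return results

print(category_percentages(scores))
# {'Good governance': 50.0, 'Transparency': 75.0, 'Measure what matters': 25.0, 'Rigour': 75.0}
```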

Probe deeply

Of course all good evaluations should probe their approach, which is something we sought to do during the design stage, but something we also came back to post-evaluation. We observed some criteria where rankings might be disadvantaged for good practice – for example, where a ranking did not use surveys and so could not score. This led us to introduce ‘Not Applicable’ categories to ensure they would not be penalised. One or two questions were also multi-part, which made it difficult to assess fairly across the rankers. In any future iteration of the approach we would seek to correct this. We noted that the ‘partially meets’ category is also very broad, ranging from a touch short of perfect to a smidge better than fail. In future, a more granular five- or even ten-point grading system might provide a clearer picture as to where a ranking succeeds and where it needs to improve. In short, there were some learning points. But that’s normal. And we think the results provide a really important proof-of-concept for evaluating the world rankings.

Evaluate

So what did we find? Well, we applied our approach to six of the largest and most influential world university rankings: ARWU, the THE World University Rankings, QS, U-Multirank, CWTS Leiden and US News & World Report. A full report will be forthcoming and the data showing the expert assessments and senior expert calibrations are available. A spidergram of the quantitative element is given in Figure 1 and some headline findings are provided below.

[Figure 1 image: spidergram with axes for good governance, transparency, measure what matters and rigour, and one line per ranker.]

Figure 1. Spidergram illustrating the actual scores/total possible score for each world ranker. The full data along with the important qualitative data is available.

Good governance

The five key expectations of rankers here were that they engaged with the ranked, were self-improving, declared conflicts of interest, were open to correction and dealt with gaming. In the main all ranking organisations made some efforts towards good governance, with clear weaknesses in terms of declaring conflicts of interest: no ranker really did so, even though selling access to their data and consultancy services was commonplace. 

Transparency

The five expectations of rankers here were that they had transparent aims, methods and data sources, made their data open, and were financially transparent. Once again there were some strengths when it came to the transparency of the rankers’ aims and methods – even if arguably the methods didn’t always meet the aims. The weaknesses here were around the ability of a third party to replicate the results (only ARWU achieved full marks here), data availability, and financial transparency (where only U-Multirank achieved full marks).

Measure what matters

The five expectations of rankers here were that they drove good behaviour, measured against mission, measured one thing at a time (no composite indicators), tailored results to different audiences and gave no unfair advantage to universities with particular characteristics. Not surprisingly, this is where most rankings fell down. CWTS Leiden and U-Multirank scored top marks in terms of efforts to drive appropriate use of rankings and measuring only one thing at a time; the others barely scored. Similarly, Leiden & U-Multirank fared quite well on measuring against mission, unlike the others. But no ranking truly tailored their offer to different audiences, assuming that all users – students, funders, universities – would value the different characteristics of universities in the same way. And neither could any whole-heartedly say that they offered no unfair advantage to certain groups.

Rigour

The one thing university rankings are most criticised for is their methodological invalidity, and so it may come as no surprise that this was another weak section for most world rankers. Here we were looking for rigorous methods, no ‘sloppy’ surveys, validity, sensitivity and honesty about uncertainty. The ranker that did the best here by a country mile was CWTS Leiden, with perfect scores for avoiding the use of opinion surveys (joined by ARWU), good indicator validity (joined by U-Multirank), indicator sensitivity, and the use of error bars to indicate uncertainty. All other rankers scored their lowest in this section.

Summary

So there is clearly work to be done here, and we hope that our rating clearly highlights what needs to be done and by whom. And in case any ranking organisation seeks to celebrate their relative ‘success’ here, it’s worth pointing out that a score of 100% on each indicator is what the community would deem to be acceptable. Anything less leaves something to be desired.

One of the criticisms we anticipate is that our expectations are too high. How can we expect rankings to offer no unfair advantage? And how can we expect commercial organisations to draw attention to their conflicts of interest? Our answer would be that just because something is difficult to achieve, doesn’t mean we shouldn’t aspire to it. Some of the sustainable development goals (no poverty, zero hunger) are highly ambitious, but also highly desirable. The beauty of taking a value-led approach, such as that promoted by SCOPE, is that we are driven by what we truly care about, rather than by the art of the possible, or the size of our dataset. If it’s not possible to rank fairly, in accordance with principles developed by the communities being ranked, we would argue that it is the rankings that need to change, not the principles. 

We hope this work initiates some reflection on the part of world university ranking organisations. But we also hope it leads to some reflection by those organisations that set so much store by the world rankings: the universities that seek uncritically to climb them; the students and academics that blindly rely on them to decide where to study or work; and the funding organisations that use them as short-cuts to identify quality applicants. This work provides qualitative and quantitative evidence that the world rankings cannot, currently, be relied on for these things. There is no fair, responsible and meaningful university ranking. Not really. Not yet. There are just pockets of good practice that we can perhaps build on if there is the will.  Let’s hope there is.

Gadding about…*

*Virtually of course.

Courtesy of the pestilence currently scourging our planet, I’ve been able to accept four opportunities to speak this Autumn, as I will be doing so from the comfort of my own home office. For anyone interested in tuning in, I’ve provided the details here and will update this with more intel as I have it.


22-Sep-20 08.30 BST: Finnish Ministry of Education & Culture

https://www.helsinki.fi/en/news/helsinki-university-library/research-evaluation-national-bibliometrics-seminar-2020

Bibliometrics: Diversity’s friend or foe? Assessing research performance using bibliometrics alone does not help create a diverse research ecosystem. But can bibliometrics ever be used to support diversity? And if not, how else can we evaluate what we value about research?


07-Oct-20 17.00 BST: NIH Bibliometrics & Research Evaluation Symposium

https://www.nihlibrary.nih.gov/services/bibliometrics/bibSymp20

The Five Habits of Highly-Effective Bibliometric Practitioners. Drawing on ten years’ experience supporting bibliometric and research evaluation practitioner communities, this presentation will highlight five habits of highly effective practitioners, providing practical hints and tips for those seeking to support their own communities with robust research evaluation.


15-Oct-20 08.15 BST: 25th Nordic Workshop on Bibliometrics and Research Policy

Register

The Research Evaluation Food Chain and how to fix it. Poor research evaluation practices are the root of many problems in the research ecosystem and there is a need to introduce change across the whole of the ‘food chain’. This talk will consider the challenge of lobbying for change to research evaluation activities that are outside your jurisdiction – such as senior managers and rankings (introducing the work of INORMS REWG), vendors and ‘freemium’ citation-based services.


20-Oct-20 15.00 BST: Virginia Tech Open Access Week

https://virginiatech.zoom.us/webinar/register/WN_DbZMp_YcRKux9X3E9Hevig

Counting What Counts In Recruitment, Promotion & Tenure. What we reward through recruitment, promotion and tenure processes is not always what we actually value about research activity. This talk will explore how we can pursue value-led evaluations – and how we can persuade senior leaders of their benefits.


AI-based citation evaluation tools: good, bad or ugly?

This piece was originally posted on The Bibliomagician on 23 July 2020.

Lizzie Gadd gets all fancy talking about algorithms, machine learning and artificial intelligence. And how tools using these technologies to make evaluative judgements about publications are making her nervous.

A couple of weeks ago, The Bibliomagician posted an interesting piece by Josh Nicholson introducing scite. scite is a new Artificial Intelligence (AI) enabled tool that seeks to go beyond citation counting to citation assessment, recognising that it’s not necessarily the number of citations that is meaningful, but whether they support or dispute the paper they cite.

scite is one of a range of new citation-based discovery and evaluation tools on the market. Some, like Citation Gecko, Connected Papers and CoCites, use the citation network in creative ways to help identify papers that might not appear in your results list through simple keyword matching. They use techniques like co-citation (where two papers appear together in the same reference list) or bibliographic coupling (where two papers cite the same paper) as indicators of similarity. This enables them to provide “if you like this you might also like that” type services.
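For the curious, here is a minimal sketch (plain Python, with invented paper identifiers) of how co-citation and bibliographic coupling counts can be derived from a handful of reference lists; the real services do the same kind of thing at scale across full citation graphs.

```python
# Minimal sketch of two citation-network similarity signals.
# Paper IDs and reference lists are hypothetical.
from itertools import combinations
from collections import Counter

# Each citing paper maps to the set of papers it references.
references = {
    "citing1": {"A", "B", "C"},
    "citing2": {"A", "B"},
    "citing3": {"B", "D"},
}

# Co-citation: two papers appear together in the same reference list.
cocitation = Counter()
for refs in references.values():
    for pair in combinations(sorted(refs), 2):
        cocitation[pair] += 1

# Bibliographic coupling: two citing papers share references.
coupling = Counter()
for p, q in combinations(sorted(references), 2):
    shared = references[p] & references[q]
    if shared:
        coupling[(p, q)] = len(shared)

print(cocitation)  # e.g. ('A', 'B') are co-cited twice
print(coupling)    # e.g. ('citing1', 'citing2') share two references
```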

Other tools, like scite and Semantic Scholar, go one step further and employ technologies like Natural Language Processing (NLP), Machine Learning (ML) and Artificial Intelligence (AI) to start making judgements about the papers they index. In Semantic Scholar’s case it seeks to identify where a paper is ‘influential’ and in scite’s case, where citations are ‘supporting’ or ‘disputing’.

And this is where I start to twitch.

THE GOOD

I mean, there is an obvious need to understand the nuance of the citation network more fully. The main criticism of citation-based evaluation has always been that citations are wrongly treated as always a good thing. In fact, the Citation Typing Ontology lists 43 different types of citation (including my favourite, ‘is-ridiculed-by’). That said, the fact that the vast majority are positive (fewer than 0.6% of citations are negative, by scite’s calculations) may itself indicate a skewing of the scholarly record. Why cite work you don’t rate, knowing it will lead to additional glory for that paper? So if we can use new technologies to provide more insight into the nature of citation, this is a positive thing. If it’s reliable. And this is where I have questions. And although I’ve dug into this a bit, I freely admit that some of my questions might be borne of ignorance. So feel free to use the comments box liberally to supplement my thinking.

A bit about the technologies

All search engines use algorithms (sets of human-encoded instructions) to return the results that match our search terms. Some, like Google Scholar, will use the citedness of papers as one element of their algorithm to sort the results in an order that may give you a better chance of finding the paper you’re looking for. And we already know that this is problematic in that it compounds the Matthew Effect: the more cited a paper is, the more likely it is to surface in your search results, thereby increasing its chances of getting read and further cited. And of course, the use of more complex citation network analysis for information discovery can contribute to the same problem: by definition the less cited works are going to be less well-connected and thus returned less often by the algorithm.
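As a toy illustration of how that compounding can work – emphatically not Google Scholar’s actual algorithm, which is not public – imagine a ranking function that blends text relevance with a citation-based boost:

```python
import math

# Hypothetical search results: (title, text-relevance score in [0, 1], citation count).
results = [
    ("Well-cited classic", 0.70, 900),
    ("Recent, barely cited", 0.85, 3),
]

def rank_score(relevance, citations, citation_weight=0.1):
    # Log damping stops mega-cited papers dominating entirely,
    # but more citations still means a higher rank...
    return relevance + citation_weight * math.log1p(citations)

for title, relevance, citations in sorted(results, key=lambda r: -rank_score(r[1], r[2])):
    print(f"{rank_score(relevance, citations):.2f}  {title}")
# ...so the already well-cited paper surfaces first, gets read more, and gets
# cited more: the Matthew Effect in miniature.
```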

Even their developers might not ever really understand what characteristics the AI is identifying in the data as ultimately contributing to the desired outcome.


But it’s the use of natural language processing (NLP) to ‘read’ the full text of papers, and artificial intelligence or machine learning to find patterns in the data, that concerns me more. Whereas historically humans might provide a long list of instructions to tell computers how to identify an influential paper, ML works by providing a shed load of labelled examples of what an influential paper might look like, and leaving the model to learn for itself. When the model gets it right it gets ‘rewarded’ (reinforcement learning, in some setups), and so it goes on to achieve greater levels of accuracy and sophistication. So much so that even their developers might not ever really understand what characteristics the AI is identifying in the data as ultimately contributing to the desired outcome.
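By way of illustration only – this is the generic supervised-learning recipe, not scite’s or Semantic Scholar’s actual pipeline, and the example sentences and labels below are invented – the sketch trains a tiny citation-sentiment classifier from labelled examples rather than hand-written rules:

```python
# Minimal sketch of learning a citation-sentiment classifier from labelled examples.
# An illustration of the general supervised-learning recipe only; sentences and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

citation_sentences = [
    "Our results confirm the findings of Smith et al. (2018).",
    "These data strongly support the model proposed in [12].",
    "We were unable to replicate the effect reported by Jones (2019).",
    "This conclusion is contradicted by subsequent experiments [7].",
]
labels = ["supporting", "supporting", "disputing", "disputing"]

# The model is never given explicit rules; it infers word patterns from the examples,
# which is why its learned criteria can be hard even for its developers to inspect.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(citation_sentences, labels)

print(model.predict(["Later work disputes this claim."]))
```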

Can you see why I am twitching?

THE (POTENTIALLY) BAD

Shaky foundations

The obvious problem is that the assumptions we draw from these data are inherently limited by the quality of the data themselves. So we know that the literature is already hugely biased towards positive studies over null and negative results and towards journal-based STEM over monograph-based AHSS. So the literature is, in this way, already a biased sample of the scholarship it seeks to represent.

We also know that within the scholarship the literature does represent, all scholars are not represented equally. We know that women are less well cited than men, that they self-cite less and are less well-connected. We know the scholarship of the Global South is under-represented, as is scholarship in languages other than English. And whilst a tool may be able to accurately identify positive and negative citations, it can’t (of course) assess whether those positive and negative citations were justified in the first place.

But of course these tools aren’t just indexing the metadata but the full text. So the question I have here is whether Natural Language Processing works equally well on language that isn’t ’natural’ – i.e., where it’s the second language of the author. And what about cultural differences in the language of scholarship, where religious or cultural beliefs make expressions of confidence in the results less certain, less self-aggrandising? And I’ll bet you a pound that there are disciplinary differences in the way that papers are described when being cited.

So we know that scholarship isn’t fully represented by the literature. The literature isn’t fully representative of the scholars. The scholars don’t all write in the same way. And of course, some of these tools are only based on a subset of the literature anyway.

At best, this seems unreliable, at worst, discriminatory?

Who makes the rules?

Of course, you may well argue that this is a problem we already face with bibliometrics, as recently asserted by Robyn Price.  I guess my particular challenge with some of these tools is that they go beyond simply making data and their inter-relationships available for human interpretation, to actually making explicit value judgements about those data themselves. And that’s where I think things start getting sticky because someone has to decide what that value (known as the target variable) looks like. And it’s not always clear who is doing it, and how.

If you think about it, being the one who gets to declare what an influential paper looks like, or what a disruptive citation looks like, is quite a powerful position. Oh not right now maybe, when these services are in start-up and some products are in Beta. But eventually, if they get to be used for evaluative purposes, you might end up with the power over someone’s career trajectory. And what qualifies them to make these decisions? Who appointed them? Who do they answer to? Are they representative of the communities they evaluate? And what leverage do the community have over their decisions?

If you think about it, being the one who gets to declare what an influential paper looks like, or what a disruptive citation looks like, is quite a powerful position.

When I queried scite’s CEO, Josh Nicholson, about all this, he confirmed that a) folks were already challenging their definitions of supporting and disputing citations; b) these challenges were currently being arbitrated by just two individuals; and c) they currently had no independent body (e.g. an ethics committee) overseeing their decision-making – although they were open to this.

And this is where I find myself unexpectedly getting anxious about the birth of free/mium type services based on open citations/text that we’ve all been calling for. Because at least if a commercial product is bad, no-one need buy it, and if you do, as a paying customer you have some* leverage. But I’m not sure if the community will have the same leverage over open products, because, well, they’re free aren’t they? You take them or leave them. And because they’re free, someone, somewhere, will take them.  (Think Google Scholar).

*Admittedly not a lot in my experience.

Are the rules right?

Of course, it’s not just who defines our target variable but how they do it, that matters. What exactly are these algorithms being trained to look for when they seek out ’influential’, ‘supporting’ or ’disputing’ citations? And does the end user know that? More pertinently, does the developer know that? Because, by definition, AI is trained by examples of what is being sought, rather than by human-written rules around how to find it. (There are some alarming stories about early AI-based cancer detection algorithms getting near 100% hit rates on identifying cancerous cells, before the developers realised that the models were taking the presence of a ruler in the training images – used by doctors to measure the size of tumours – as an indicator of cancer.)

I find myself asking: if someone else developed an algorithm to make the same judgement, would it make the same judgement? And when companies like scite talk about their precision statistics (0.8, 0.85, and 0.97 for supporting, contradicting, and mentioning, respectively, if you’re interested), to what are they comparing their success rates? Because if it’s the human judgement of the developer, I’m not sure we’re any further forward.
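For what the headline number actually means: precision for a class is simply the share of the tool’s calls for that class that a set of human ‘gold’ labels agrees with, as in this minimal sketch (with invented labels):

```python
# Minimal sketch of computing per-class precision against human labels.
# The predicted and human labels below are invented for illustration.
human =     ["supporting", "mentioning", "supporting", "contradicting", "mentioning"]
predicted = ["supporting", "supporting", "supporting", "contradicting", "mentioning"]

def precision(predicted, human, target_class):
    # Of everything the tool called `target_class`, what fraction did the humans agree with?
    flagged = [h for p, h in zip(predicted, human) if p == target_class]
    return sum(1 for h in flagged if h == target_class) / len(flagged)

for cls in ("supporting", "contradicting", "mentioning"):
    print(cls, round(precision(predicted, human, cls), 2))
# The headline figure is only as good as the human judgements it is measured against.
```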

I also wonder whether these products are in danger of obscuring the fact that papers can be ‘influential’ in ways that are not documented by the citation network, or whether these indicators will become the sole proxy for influence – just as the Journal Impact Factor became the sole proxy for impact? And what role should developers play in highlighting this important point – especially when it’s not really in their interests to do so?

THE UGLY

Who do the rules discriminate against?

The reason these algorithms need to be right, as I say, is that researcher careers are at stake. If you’ve only published one paper, and its citing papers are wrongly classified as disputing that paper, this could have a significant impact on your reputation. The reverse is true of course – if you’re lauded as a highly cited academic but all your citations dispute your work, surfacing this would be seen as a service to scholarship.

What I’m not clear on is how much of a risk the former is, and whether the risk falls disproportionately on members of particular groups. We’ve established that the scientific system is biased against participation by some groups, and that the literature is biased against representation of some groups. So, if those groups (women, AHSS, Global South, EASL-authors) are under-represented in the training data that identifies what an ‘influential’ paper looks like, or what a ‘supporting’ citation looks like, it seems to me that there’s a pretty strong chance they are going to be further disenfranchised by these systems. This really matters.

Masking

I’m pretty confident that any such biases would not be deliberately introduced into these systems, but the fear of course, is that systems which inadvertently discriminate against certain groups might be used to legitimise their deliberate discrimination. One group that are feeling particularly nervous at the moment, with the apparent lack of value placed on their work, are the Arts and Humanities. Citation counting tools already discriminate against these disciplines due to the lack of coverage of their outputs and the relative scarcity of citations in their fields. However, we also know that citations are more likely to be used to dispute than to support a cited work in these fields. I can imagine a scenario where an ignorant third-party seeking evidence to support financial cuts to these disciplines could use the apparently high levels of disputing papers to justify their actions.

But it doesn’t stop here. In their excellent paper, Big Data’s Disparate Impact, Barocas and Selbst discuss the phenomenon of masking, where features used to define a target group (say less influential articles) also define another group with protected characteristics (e.g., sex). And of course, the scenario I envisage is a good example of this, as the Arts & Humanities are dominated by women. Discriminate against one and you discriminate against the other.

The thin end of the wedge

All this may sound a bit melodramatic at the moment. After all these are pretty fledgling services, and what harm can they possibly do if no-one’s even heard of them?  I guess my point is that the Journal Impact Factor and the h-index were also fledgling once. And if we’d taken the time as a community to think through the possible implications of these developments at the outset, then we might not be in the position we are in now, trying to extract each mention of the JIF and the h-index from the policies, practices and psyches of every living academic.

I guess my point is that the Journal Impact Factor and the h-index were also fledgling once.

Indeed, the misuse of the JIF is particularly pertinent to these cases. Because this was a ‘technology’ designed with good intentions – to help identify journals for inclusion in the Science Citation Index – just as scite and Semantic Scholar are designed to aid discovery and gauge citation sentiment. But it was a very small step between the development of that technology and its ultimate use for evaluation purposes. We just can’t help ourselves. And we are naïve to think that just because a tool was designed for one purpose, it won’t be used for another.

This is why the INORMS SCOPE model insists that evaluation approaches ‘Probe deeply’ for unintended consequences, gaming possibilities and discriminatory effects. It’s critical. And it’s so easy to gloss over when we as evaluation ‘designers’ know that our intentions are good. I’ve heard that scite are now moving on to provide supporting and disputing citation counts for journals, which we’ll no doubt see on journal marketing materials soon. How long before these citations start getting aggregated at the level of the individual?

Of course, the other thing that AI is frequently used for, once it has been trained to accurately identify a target variable, is to then go on to predict where that variable might occur in future. Indeed, we are already starting to see this with AI-driven tools like Meta Bibliometric Intelligence and UNSILO Evaluate, where they are using the citation graph to predict which papers may go on to be highly cited and therefore be a good choice for a particular journal. To me, this is hugely problematic and a further example of the Matthew Effect: seeking to reward science that looks like existing science rather than ground-breaking new topics written by previously unknowns. Do AI-based discovery and evaluation tools have the potential to go the same way, predicting, based on past performance, the more influential scholars of the future?

Summary

I don’t want to be a hand-wringing nay-sayer, like an old horse-and-cart driver declaring the automobile the end of all that is holy. But I’m not alone in my handwringing. Big AI developer DeepMind are taking this all very seriously. A key element of their work is around Ethics & Society, including a pledge to use their technologies for good. They were one of the co-founders of the Partnership on AI initiative, where those involved in developing AI have an open discussion forum, including members of the public, around the potential impacts of AI and how to ensure they have positive effects. The Edinburgh Futures Institute have identified Data & AI Ethics as a key concern and are running free short courses in Data Ethics, AI & Responsible Research & Innovation. There are also initiatives such as Explainable AI, which recognise the need for humans to understand the process and outcomes of AI developments.

I’ve no doubt that AI can do enormous good in the world, and equally in the world of information discovery and evaluation. I feel we just need to have conversations now about how we want this to pan out, fully cognisant of how it might pan out if left unsupervised. It strikes me that we might do well to develop a community agreed voluntary Code of Practice for working with AI and citation data. This would ensure that we get to extract all the benefits from these new technologies without finding them being over-relied upon for inappropriate purposes. And whilst such services are still in their infancy I think it might be a good time to have this conversation. What do you think?

Acknowledgements

I am grateful to Rachel Miles, Josh Nicholson and David Pride for conversations and input to this piece, and especially thankful to Aaron Tay, who indulged in a long and helpful exchange that made this a much better offering.

Elizabeth Gadd is the Research Policy Manager (Publications) at Loughborough University. She is the chair of the Lis-Bibliometrics Forum and co-Champions the ARMA Research Evaluation Special Interest Group. She also chairs the INORMS International Research Evaluation Working Group. 

Unless it states otherwise, the content of The Bibliomagician is licensed under a Creative Commons Attribution 4.0 International License.

Dear REF, please may we have a SEP?

This blog post by Lizzie Gadd was first published on the WonkHE Blog on 2 July 2020.

Among all the recent research-related news, we now know that UK universities will be making their submissions to the Research Excellence Framework on 31 March 2021.

And a series of proposals are in place to mitigate against the worst effects of COVID-19 on research productivity. This has led to lots of huffing and puffing from research administrators about the additional burden and another round of ‘What’s the point?’ Tweets from exasperated academics. And it has led me to reflect dreamily again about alternatives to the REF and whether there could be a better way. Something that UKRI are already starting to think about.

Going Dutch

One of the research evaluation approaches I’ve often admired is that of the Dutch Standard Evaluation Protocol (SEP). So when I saw that the Dutch had published the next iteration of their national research evaluation guidance, I was eager to take a look. Are there lessons here for the UK research community?

I think so.

The first thing to say, of course, is that unlike REF, the Dutch system is not linked to funding. This makes a huge difference. And the resulting freedom from feeling like one false move could plunge your institution into financial and reputational ruin is devoutly to be wished. There have been many claims – particularly at the advent of COVID-19 – that the REF should be abandoned and some kind of FTE-based or citation-based alternative used to distribute funds. Of course the argument was quickly made that REF is not just about gold, it’s about glory, and many other things besides. Now I’m no expert on research funding, and this piece is not primarily about that. But I can’t help thinking, what if REF WAS just about gold? What if it was just a functional mechanism for distributing research funds and the other purposes of REF (of which there are five) were dealt with in another way? It seems to me that this might be to everybody’s advantage.

And the immediate way the advantage would be felt, perhaps, would be through a reduction in the volume and weight of guidance. The SEP is only 46 pages long (including appendices) and, perhaps with a nod to their general levity about the whole thing, is decorated with flowers and watering cans. The REF guidance, on the other hand, runs to 260 pages. (124 pages for the Guidance on Submissions plus a further 108 pages for the Panel Criteria and Working Methods and 28 pages for the Code of Practice – much of which cross-refers and overlaps).

And if that’s not enough to send research administrators into raptures, the SEP was published one year prior to the start of the assessment period. Compare this to the REF, where the first iteration of the Guidance on Submissions was published five years into the assessment period, where fortnightly guidance in the form of FAQs continues to be published, and where we are yet to receive some of it just months before the deadline.

Of course, I understand why the production of REF guidance is such an industry: it’s because they are enormously consultative, and they are enormously consultative because they want to get it right, and they want to get it right because there is a cash prize. And that, I guess, is my point.

But it’s not just the length of course, it’s the content. If you want to read more about the SEP, you can check out their guidance here. It won’t take you long – did I say it’s only 46 pages? But in a nutshell: SEP runs on a six-yearly cycle and seeks to evaluate research units in light of their own aims to show they are worthy of public funding and to help them do research better. It asks them to complete a self-evaluation that reflects on past performance as well as future strategy, supported by evidence of their choosing. An independent assessment committee then performs a site visit and has a conversation with the unit about their performance and plans, and provides recommendations. That’s it.

Measure by mission

The thing I love most about the new SEP is that whilst the ‘S’ used to stand for ‘Standard’, it now stands for ‘Strategy’. So unlike REF where everyone is held to the same standard (we are all expected to care 60% about our outputs, 15% about our research environment and 25% about real-world impact), the SEP seeks to assess units in accordance with their own research priorities and goals. It recognises that universities are unique and accepts that whilst we all love to benchmark, no two HEIs are truly comparable. All good research evaluation guidance begs evaluators to start with the mission and values of the entity under assessment. The SEP makes good on this.

And of course the benefit of mission-led evaluation is that it takes all the competition out of it. There are no university-level SEP League tables, for example, because they seem to have grasped that you can’t rank apples and pears. If we really prize a diverse ecosystem of higher education institutions, why on earth are we measuring them all with the same template?

Realistic units of assessment

In fact, I’m using the term ‘institutions’ but unlike the REF, the SEP at no time seeks to assess at institutional level. They seek only to assess research at the level that it is performed: the research unit. And the SEP rules are very clear that “the research unit should be known as an entity in its own right both within and outside of the institution, with its own clearly defined aims and strategy.”

So no more shoe-horning folks from across the university into units with other folks they’ve probably never even met, and attempting to create a good narrative about their joined-up contribution, simply because you want to avoid tipping an existing unit into the next Impact Case Study threshold. (You know what I’m talking about). These are meaningful units of assessment and the outcomes can be usefully applied to, and owned by, those units.

Evaluate with the evaluated

And ownership is so important when it comes to assessment. One of the big issues with the REF is that academics feel like the evaluation is done to them, rather than with them. They feel like the rules are made up a long way from their door, and then taken and wielded sledge-hammer-like by “the University”, AKA the poor sods in some professional service whose job it is to make the submission in order to keep the research lights on for the unsurprisingly ungrateful academic cohort. It doesn’t make for an easy relationship between research administrators and research practitioners.

Imagine then if we could say to academic staff: we’re not going to evaluate you any more, you’re going to evaluate yourselves. Here’s the guidance (only 46 pages – did I say?); off you go. Imagine the ownership you’d engender. Imagine the deep wells of intrinsic motivation you’d be drawing on. Indeed, motivational theory tells us that intrinsic motivation eats extrinsic motivation for breakfast. And that humans are only ever really motivated by three things: autonomy, belonging and competence. To my mind, the SEP taps into them all:

  • Autonomy: you set your own goals, you choose your own indicators, and you self-assess. Yes, there’s some guidance, but it’s a framework and not a straitjacket, and if you want to go off-piste, go right ahead. Yes, you’ll need to answer for your choices, but they are still your choices.
  • Belonging: the research unit being assessed is the one to which you truly belong. You want it to do well because you are a part of this group. Its success and its future is your success and your future.
  • Competence: You are the expert on you and we trust that you’re competent enough to assess your own performance, to choose your own reviewers, and to act on the outcomes.

The truth will set you free

One of the great benefits of being able to discuss your progress and plans in private, face-to-face, with a group of independent experts that you have a hand in choosing, is that you can be honest. Indeed, Sweden’s Sigridur Beck from Gothenburg University confirmed this when talking about their institution-led research assessment at a recent E-ARMA webinar. She accepted that getting buy-in from academics was a challenge when there was nothing to win, but that they were far more likely to be honest about their weaknesses when there was nothing to lose. And of course, with the SEP you have to come literally face-to-face with your assessors (and they can choose to interview whoever they like) so there really is nowhere to hide.

The problem with REF is that so much is at stake that it forces institutions to put their best face on, to create environment and impact narratives that may or may not reflect reality. It doesn’t engender the cold, hard, critical self-assessment which is the basis for all growth. With REF you have to spin it to win it. And it’s not just institutions that feel this way. I’ve lost count of the number of times I’ve heard it said that REF UoA panels are unlikely to score too harshly as it will ultimately reflect badly on the state of their discipline. This concerns me. Papering over the cracks is surely never a good building technique?

Formative not summative

Of course the biggest win from a SEP-style process rather than a REF-style one is that you end up with a forward-looking report and not a backward-looking score. It’s often struck me as ironic that the REF prides itself on being “a process of expert review” but actually leaves institutions with nothing more than a spreadsheet full of numbers and about three lines of written commentary. Peer review in, scores out. And whilst scores might motivate improvement, they give the assessed absolutely zero guidance as to how to make that improvement. It’s summative, not formative.

The SEP feels truer to itself: expert peer review in, expert peer review out. And not only that but “The result of the assessment must be a text that outlines in clear language and in a robust manner the reflections of the committee both on positive issues and – very distinctly, yet constructively – on weaknesses” with “sharp, discerning texts and clear arguments”. Bliss.

Proof of the pudding

I could go on about the way the SEP insists on having ECRs and PhD students on the assessment committee; and about the way units have to state how they’re addressing important policy areas like academic culture and open research; and the fact that viability is one of the three main pillars of their approach. But you’ll just have to read the 46-page guidance.

The proof of the pudding, of course, is in the eating. So how is this loosey-goosey, touchy-feely approach to research evaluation actually serving our laid-back low-country neighbours?

Pretty well actually.

The efficiency of research funding in the Netherlands is top drawer. And whichever way you cut the citation data, the Netherlands significantly outperforms the UK. According to SciVal, research authored by those in the Netherlands (2017-2019) achieved a Field Weighted Citation Impact of 1.76 (where 1 is world average). The UK comes in at 1.55. And as far as I can see, the only countries that can hold a candle to them are Denmark, Sweden and Switzerland – none of which have a national research assessment system.

It seems to me that we have so much to gain from adopting a SEP-style approach to research evaluation. In a post-COVID-19 world there is going to be little point looking back at this time in our research lives and expecting it to compare in any way with what’s gone before. It’s time to pay a lot less attention to judging our historical performance, and start thinking creatively about how we position ourselves for future performance.

We need to stop locking our experts up in dimly lit rooms scoring documentation. We need to get them out into our universities to meet with our people, to engage with our challenges, to breathe our research air, and to collectively help us all to be the best that we can be – whatever ’best’ may look like for us. I believe that this sort of approach would not only dramatically reduce the burden (I’m not sure if I said, but the SEP is only 46 pages long), but it would significantly increase buy-in and result in properly context-sensitive evaluations and clear road-maps for ever-stronger research-led institutions in the future.

Frankly, I don’t want to come out of REF 2027 with another bloody spreadsheet, I want us to come out energised having engaged with the best in our fields, and positioned for the next six years of world-changing research activity.

The purpose of publications in a pandemic – and beyond

This blog post by Lizzie Gadd first appeared on the WonkHE blog on 22 April 2020.

There’s nothing like a crisis to make you realise what’s important and this couldn’t be truer than in the world of scholarly communication.

As researchers have rushed to investigate our way out of the current pandemic we’ve seen journal content opened up, publication speeded up and systematic reviews ramped up. And we’ve seen research evaluation mothballed.

What’s going on?

UKRI is investing £20M in novel coronavirus research whilst REF is on hold. The Wellcome Trust are converting their offices into respite care for NHS staff, not making announcements about their new responsible metrics guidance.

The virus is reminding us that the purpose of scholarly communication is not to allocate credit for career advancement, and neither is it to keep publishers afloat. Scholarly communication is about, well, scholars communicating with each other, to share insights for the benefit of humanity. And whilst we’ve heard all this before, in a time of crisis we realise afresh that this isn’t just rhetoric, this is reality.

I recently attended an excellent OASPA webinar and heard SPARC’s Heather Joseph describe how they had to negotiate permission to put COVID-19 articles into CORD-19 (an OA database of COVID-19 content), and even then, only the COVID papers they’d got permission for were available, not the network of references that those papers cited.

I must confess that I had a little sob.

What publications are for

I’ve been working in open access for over 20 years. My first job involved seeking copyright permission to digitise journal articles for academics to use – often their own papers – in their own teaching. We advocated open access as a solution to this problem. Twenty years on, we’re still advocating it, and I’m reserving the right to feel a little bit guilty, and more than a little angry and frustrated.

It seems to me that for twenty years any efforts to advocate for open access to research have been stifled by what I call the two big “buts”:

  1. But what about publishers and scholarly societies? How do we ensure they survive and that the economy isn’t damaged? (Subtext: publications are for profit)
  2. But what about academic careers? A good publication list is critical for promotion and tenure. (Subtext: publications are for credit).

When I explained the first “but” to my partner, and how many open access policies sought to shore up the publishing industry, he said it sounded like something straight out of the eighteenth-century slave trade debates. The fact that profits will be affected by doing the right thing doesn’t mean you shouldn’t do the right thing. Right?

And I confess to becoming increasingly less sympathetic to those touting the second argument. It reminds me of that video where an Italian mayor screams at his constituents for using mobile hairdressers during the lockdown, “What are you doing?!”, he yells. “Do you want to look good in your coffin? Don’t you know the coffin will be closed?!” If we’ve created a generation of scholars who are just in it for the glory of papers in glamorous journals, and not to do good research that changes the world a little bit, then we really are in trouble.

Because the pursuit of glamour could be killing us.

The cost of publication

UKRI have been throwing money (about £150 million by my estimate) at funding eye-watering Article Processing Charges with haute couture journals for seven years. This is money that could have been used for actual life-saving research. However, by trying to balance a preference for immediate OA to the version of record with a desire not to impinge on academics’ freedom-to-publish-where-they-want (not to be confused with actual academic freedom) they found themselves paying increasingly hefty APCs for publications in journals that were more about distinction than dissemination. And when it comes to a global emergency, we’re still having to beg publishers for access to our own research so that we might save large swathes of the human race from an unnecessary death.

This is why I see the UKRI OA policy consultation as such an important opportunity. And I’m hopeful that world events, whilst tragic and terrible, may bring into sharp relief the true value and purpose of scholarly communication. Because I fear that despite all its talk of transformation, the only thing the proposed UKRI OA policy is currently set to transform is publisher profit margins. Yes, inspired by Plan S, it seeks immediate OA to all research output – which is great. But the lack of journals with zero embargo Green OA policies may mean that publication in pricey APC-based Gold OA journals is the only option for researchers. Thus publisher profit margins continue to be maintained whilst enabling self-destructive credit-seeking publication behaviours.

I say profit and credit can no longer be policy drivers. They just can’t.

The next platform

And it strikes me that when you take income generation and evaluation out of the equation, there is an obvious solution to this problem: UKRI need to set up a funder-based publishing platform and say to recipients, if you want our money, publish your findings here. End of.

This is not a new idea, but a proven technology. Gates Open Research is a great example. Despite not being the only publication option for Gates Foundation recipients, the papers published in Gates Open Research are achieving a cites-per-paper rate on a par with the world average for medicine. Similarly, Wellcome Open Research is on a par with top-quartile journals in both biochemistry and medicine in terms of its SCImago Journal Rank.

We can’t just keep on writing open access policies in the hope that publishers will adapt their policies to accommodate them. No. If you want something doing, do it yourself. Or at least, develop the specification yourself and invite publishers to bid to provide their services in accordance with it.

Seriously, what’s not to like? It’s not my purpose to expand on all the features of a publishing platform, but I can’t resist the following highlights:

  1. Preprints are available for immediate review and consumption by all (we know that in many disciplines peer review doesn’t materially change the output so this is a must – and it shaves years off publication times).
  2. Post-publication peer review reports give reviewers credit.
  3. Approved outputs can be indexed in bibliographic databases and made just as discoverable as journal-based outputs.
  4. All outputs can be made available under a CC-BY licence and in accordance with all the necessary technical requirements for truly findable, accessible, interoperable and reproducible research.
  5. Links to datasets and other open research outputs can be added.
  6. UKRI gets to trace, cradle-to-grave, the impacts of their funded research as it’s all published in one place.
  7. Publishing on the platform doesn’t depend on the wealth of the organisation and the size of their Gold OA Fund.
  8. It reduces researcher anxiety about ‘getting stuff published’. If it’s funded by UKRI it is welcome on the platform.
  9. Quality is assessed through peer review reports, and impact through subsequent usage (citations if that floats your boat), and not by journal brand.

So please UKRI, when you come to make your difficult policy decisions about open access, please put front and centre at every stage a very simple question: “Will this help scholars communicate more effectively and do better research?”. Everything else is a distraction. Progress has been impeded by two buts for twenty years. It’s time to focus.

No buts.

Goodbye journal metrics, hello openness? Investigating Plan-S readiness.

This post first appeared on The Bibliomagician blog on 17 April 2020.

Plan-S based funder Open Access (OA) policies claim that they are process-agnostic, with Green and Gold OA both meeting their requirements, but what proportion of your University’s current publishing outlets are Plan-S compliant via the Green OA route and how easy might the transition to immediate open access be? Lizzie Gadd reports on an investigation at Loughborough University. 

At Loughborough University we encourage academics to follow a ‘readership, rigour and reach’ approach to choosing where to publish. And to help colleagues assess the reach of an outlet, we may suggest the use of field-normalised journal citation metrics as an indicator of its visibility. But, as an institution with an excellent track record in engaging with Open Access, and a newly minted Open Research Position Statement, we know that openness increases visibility too.  We know that highly-cited journals are only highly-cited because academics have historically submitted their best work there and we are keen to encourage colleagues to think more broadly about routes to visibility.   

Of course, we’re also aware that the external environment is changing and soon the UKRI may be adopting a Plan S-based Open Access (OA) policy which requires the researchers they fund to ensure that the work they produce is made available immediately on publication. This could be through a pure Gold OA journal, a hybrid journal that is ‘transitioning’ to pure, or via Green OA.  At Loughborough, like many medium-sized, less wealthy but research intensive institutions, we have historically embraced the Green route to OA.  Indeed, recent work by the Curtin Open Knowledge Institute using Unpaywall discovered that Loughborough University is 4th in the world in terms of the proportion of our outputs that are available as Green OA.  So, to help us not only guide our academics towards a broader interpretation of visibility, but also to assess our readiness for Plan S, we thought we’d take a look at what proportion of the outlets we currently publish in are not only ’highly cited’ in terms of journal citation metrics[1], but ‘highly open’ in terms of having a zero embargo Green OA policy. 

One thing we didn’t check as part of this analysis was whether those journals offering zero embargo Green OA policies also allowed papers to be made available under a CC-BY licence as preferred by Plan S, and as required by the proposed UKRI OA policy.  This is simply because it is so blooming difficult to get hold of this information.  The obvious place to store it is SHERPA/RoMEO and in some cases that’s where you’ll find it, but coverage is currently very patchy. 

“Currently, just over one-third of our most frequently used sources (35%) would be Plan S compliant, assuming their licences were acceptable.”

So, we downloaded from SciVal the top 100 sources[2] published in by each of our Schools (or disciplines where a School is multidisciplinary) between 2016 and 2018.  We then identified the ten sources in which our authors published most frequently. In some cases, due to differing disciplinary approaches to publishing there were fewer than ten sources in which more than one Loughborough output appeared.  In total 146 sources were identified, and these were checked for citedness (whether they appeared in the top 10% of sources by SNIP or SJR[3]) and for openness (involving a SHERPA/RoMEO search for the length of their embargo period or ‘pure’ Gold status)[4].  

In total, we found that 44% (64) of our frequently used sources were in the top 10% citation percentiles by SJR or SNIP. We also found that 30% (44) had a zero embargo green OA policy as listed on SHERPA/RoMEO and a further 5% (seven) were Gold OA journals.  This would mean that, currently, just over one-third of our most frequently used sources (35%) would be Plan S compliant, assuming their licences were acceptable.  

FIGURE 1: Outlet visibility at Loughborough University

So that was kind of interesting. But, of course, whilst we’re transitioning to new measures of journal visibility, academics will ideally want to focus on sources that are both highly cited and highly open/Plan S-compliant.  So what were their options for hitting both of these indicators? Unfortunately, not so great. Only 22 outlets – just 15% of our most-published-in sources – were both highly cited and highly open, with one additional outlet hitting the ‘highly open’ target by virtue of us paying for the privilege (APC-based Gold OA).  

When we shared this with academics their perhaps inevitable next question was, well what highly cited and highly open options do we have across the wider list of 100 sources (i.e. not just those we’re publishing in the most)?  Surely if we widened the net we’d find much greater opportunity to grasp ‘mega-visible’ publishing opportunities? Alas, it was not to be. Having extended the (very time-consuming) exercise to check all their highly cited sources for open access options we found that a much smaller proportion overall, a mere 7%, hit both indicators.  This varied of course from discipline to discipline with the greatest opportunity being afforded to communications (16%) and the lowest to education (0)[5].  And of course, this is before we factor in whether those zero embargo Green OA titles actually allow manuscripts to be posted under CC-BY licences.

Oh dear.

Now I’m aware we have a sample of one university here. And our publication practices may or may not be representative of the wider population.  Indeed, it would be great if others could run this analysis at their institutions to see how widespread this phenomenon is. But whether or not it’s replicated elsewhere, this is the reality for us.  

The low-hanging fruit of course, is to draw attention to those 54 titles (37%) that hit neither visibility indicator.  And by broadening our ‘definition’ of visibility, we can highlight a wider range of titles that can serve this important end.  However, if we were hoping to find a good list of titles in which academics were currently publishing that were both highly cited and highly open, we were pretty disappointed.  On average, each of our disciplines had five sources to choose from that were both highly open and highly cited.  In reality, some had none at all. So what do we do with that?

“Having extended the exercise to check all their highly cited sources for open access options we found that a mere 7% hit both indicators.”

The truth is that although we use citedness and openness as visibility indicators, they do both indicate different aspects of visibility. Openness speaks of potential reach and, if other open research practices have been engaged with, perhaps increased rigour.  Citedness speaks of actual reach, of journals that have a track record of finding and influencing their target audience and, because they attract so many papers and have built up stringent peer review processes to weed out the poorer ones, they may also claim increased rigour. So, again, what to do? 

I think that all too often we research support folk can hide behind our general principles and our generic advice: “We support openness”. “Consider open access options in your publication choices.”  But if an academic collars you and asks explicitly whether they should choose Journal A that is highly cited and closed, or Journal B that is poorly cited but open, and assuming the readership and rigour of both are comparable, we find ourselves in an extremely tricky spot, caught between conscience and convention.

And that, my friends, is why to meet the demands of Plan S (and the UKRI OA policy) I fear we are going to have to abandon Green OA in favour of pricey “publish & read” or “read & publish” big deals with the publishers of existing highly cited journals. There simply aren’t the zero embargo Green OA deals around for the sources in which we publish the most. And again I reiterate, this is before we’ve factored in the potential CC-BY requirement. I think it’s unlikely that publishers, given the choice between making their Green OA policy zero-embargo-with-CC-BY or receiving additional ‘gold-for-Gold’, are going to opt for the former.

It might be helpful to policymakers such as the UKRI to understand how widespread this experience is, so if anyone fancies running this analysis on their own HEI, I’ve provided my method below. Similarly, if you felt able to share your data when you’re done, I’d love to hear from you.

Huge thanks to Dr Karen Rowlett for her comments on an early draft of this piece. 

[1] As I say in the opening paragraph, there is really no such thing as a highly-cited journal, only journals to which academics submit their best work, that ends up lending its citedness to that journal. However, I use the term ‘highly cited’ as a short cut. Don’t judge me.

[2] SciVal only allows you to extract 100 sources per entity.

[3] Each School and discipline at Loughborough gets to select (or not to select) their own field-normalised journal metric and threshold.

[4] If a source was not listed on SHERPA/RoMEO it was recorded as being non-compliant, as we didn’t have the resource to chase down every title individually. We also did not factor in any existing publisher deals that allow Loughborough academics to publish ‘APC-free’. Thus, in reality, the percentage of Plan S-compliant titles might be a bit higher than this.

[5] Excluding politics, history and social work where SciVal’s coverage of our titles is too low to be meaningful.

Method 

(I used SciVal but you could also use a bibliographic database such as Dimensions, Scopus or Web of Science. A rough sketch of how the final tallying step might be scripted follows the list.)

  • In SciVal – Overview – 2016-18 – Published – By Scopus Source – Export the data one School/department/discipline (hereafter, research unit) at a time into Excel
  • For each research unit, highlight the ten outlets in which they publish the most
  • Highlight (using number filters) which of those ten titles appear in the top 10% by the journal citation metric of their choice. (We use data provided by SciVal to help us with this. We use 1.5 as the overall threshold for top 10% SNIP and 1.4 for top 10% SJR. You could use discipline-specific thresholds if you preferred.)
  • Search for each title on SHERPA/RoMEO and check whether it:
    • Allows self-archiving of the Accepted Version immediately on an Institutional Repository. 
    • Is a pure Gold OA Journal. On SHERPA/RoMEO this is indicated by ‘Listed in DOAJ? Yes’
  • You may also wish to note the number of titles not listed on SHERPA/RoMEO.
  • Record the total and percentage of outlets that:
    • Appear in the top 10% by journal citation metric
    • Have a zero embargo Green OA policy
    • Are pure Gold OA journals
    • Are both top 10% and zero embargo OR are both top 10% and Gold OA journals.
  • You may also do this for your whole institution if you are unable to disaggregate by field.  To do this:
    • Download a list of sources published in by your HEI between 2016-18
    • Extract their journal metrics using the Scopus source list download.
    • Filter on those in the top 10% (use thresholds provided above).
    • For those top 10% sources, check SHERPA/RoMEO for their Green/Gold OA status as above.
    • Record the total and percentage of outputs that:
      • Appear in the top 10% by journal citation metric
      • Have a zero embargo Green OA policy
      • Are pure Gold OA journals
      • Are both top 10% and zero embargo OR are both top 10% and Gold OA journals.
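
For anyone who would rather script that final tallying step, here is a rough sketch of how it might look in Python, assuming the SciVal export and the manual SHERPA/RoMEO checks have already been combined into one spreadsheet per research unit. The file name, column names and boolean flags below are my own invention, not SciVal or SHERPA/RoMEO field names.

```python
# Rough sketch of the tallying step, assuming a combined spreadsheet per research
# unit with illustrative (not SciVal/SHERPA) column names: 'snip', 'sjr', plus two
# boolean columns recorded from the manual SHERPA/RoMEO checks.
import pandas as pd

SNIP_TOP10 = 1.5   # overall top-10% SNIP threshold used above
SJR_TOP10 = 1.4    # overall top-10% SJR threshold used above

sources = pd.read_csv("unit_sources.csv")  # hypothetical file: one row per source

sources["highly_cited"] = (sources["snip"] >= SNIP_TOP10) | (sources["sjr"] >= SJR_TOP10)
sources["highly_open"] = sources["zero_embargo_green"] | sources["listed_in_doaj"]
sources["both"] = sources["highly_cited"] & sources["highly_open"]

n = len(sources)
for label in ["highly_cited", "highly_open", "both"]:
    count = int(sources[label].sum())
    print(f"{label}: {count} of {n} ({count / n:.0%})")
```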

CRediT Check – Should we welcome tools to differentiate the contributions made to academic papers?

This post by Lizzie Gadd was originally published on the LSE Impact Blog on 20 January 2020.

Elsevier is the latest in a lengthening list of publishers to announce the adoption of the CASRAI Contributor Role Taxonomy (CRediT) across 1,200 of its journals. Authors of papers in these journals will be required to define their contributions in relation to a predefined taxonomy of 14 roles. In this post, Elizabeth Gadd weighs the pros and cons of defining contributorship in a more prescriptive fashion and asks whether there is a risk of incentivising new kinds of competitive behaviour and forms of evaluation that don’t benefit researchers.

Getting named on a journal article is the ultimate prize for an aspiring academic. Not only do they get the paper on their CV (which can literally be money in the bank), but once named, all the subsequent citations accrue to each co-author equally, no matter what their contribution.

Original tweet by Ali Chamkha, retweeted with comment by Damien Debecker. 3 January 2020

However, as this tweet demonstrates, getting named on a journal article is not the same as having a) done the lion’s share of the research and/or b) actually written the journal article. And there is a lot of frustration about false credit claims. Gift authorship, ghost authorship, purchased authorship, and wrangles about author order abound. To address these problems there is some helpful guidance from organisations like the International Committee of Medical Journal Editors, the British Sociological Association and the Committee on Publication Ethics (COPE) about what constitutes an author. Perhaps most significantly, in 2014 we saw the launch of CASRAI’s Contributor Role Taxonomy, CRediT.

CRediT aims to ensure that everyone attributed on a paper gets recognised for their contribution. As such it goes one step further than guidance and provides a structured way for authors to declare their various contributions. It lists 14 contributor roles, some of which you might expect (writing, analysis) and some of which you might not (supplying study resources and project admin). And whilst it won’t stop someone being named who should not be named, nor will it ensure that everyone is named who should be named, it does make omissions a bit more difficult – and for this it has been highly praised.
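
For those who have not met the taxonomy, the 14 roles are short enough to list in full. The sketch below uses CRediT’s own role labels, but the little per-paper record around them is just my illustration of how a contribution statement might be captured, not any standard format.

```python
# The 14 CRediT contributor roles (the taxonomy's own labels). The simple
# per-paper record below is illustrative only, not a standard data format.
CREDIT_ROLES = [
    "Conceptualization", "Data curation", "Formal analysis", "Funding acquisition",
    "Investigation", "Methodology", "Project administration", "Resources",
    "Software", "Supervision", "Validation", "Visualization",
    "Writing - original draft", "Writing - review & editing",
]

# Hypothetical contribution statement for a two-author paper
contributions = {
    "Author A": ["Conceptualization", "Methodology", "Writing - original draft"],
    "Author B": ["Data curation", "Formal analysis", "Writing - review & editing"],
}

# Sanity-check that every declared role is one of the 14
for author, roles in contributions.items():
    unknown = [r for r in roles if r not in CREDIT_ROLES]
    assert not unknown, f"{author} declares non-CRediT roles: {unknown}"
```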

But, I still have some questions about CRediT. And whilst I might be overthinking this (bad habit), I’d welcome any thoughts the community might have on the following:

  1. Are there important differences between authors and contributors that we need to retain and how does CRediT support these?
  2. Is a focus on credit-seeking what the community needs, or will this end up embedding the status quo around problematic output-based evaluation?
  3. Are we going to end up with new forms of CRediT-based evaluation that might have negative systemic effects?

Authors vs contributors

So, the ‘C’ in CRediT stands for Contributor. It is a Contributor Role Taxonomy. But what is not too clear is whether CRediT seeks to capture contributions to the paper, or contributions to the research. It might sound like I’m being picky, but in legal terms there is a big difference between these two. Someone who writes the paper is technically an author and has rights as such, whereas someone who only contributes to the underlying research is not. So, whilst an author is always a contributor, a contributor is not always an author.

Why is this important? Well, because corresponding authors are usually responsible for the legal transfer or assignment of rights to the publisher prior to publication. And if that corresponding author is actually just a contributor (and CRediT starts making this explicit when it wasn’t previously), then technically they can’t transfer those rights to the publisher because they don’t own any. This is particularly important because corresponding authors are often senior researchers or principal investigators who are less likely to be paper writers.

But it’s not just for legal reasons that these labels matter. As the tweet shows, they matter to researchers too. Researchers have a sense that the term ‘author’ means something different, more significant, than ’contributor’. Disciplinary norms play a huge role here of course. In the medical sciences, the ICMJE have actually spelled out which roles constitute ‘authorship’ and which constitute ‘non-author contributorship’. They even specify that non-author contributors should only be ‘acknowledged’ and not listed in the author by-line.

In the Arts & Humanities, naming a single author a ‘contributor’ would seem entirely inappropriate as it suggests that others had a hand in their work. However, I wonder whether single-authors, if called upon to strictly adhere to CRediT, would find themselves obliged to list others as ‘contributors’ (Librarians maybe?) where historically in their disciplines they might not do so.

Assuming that CRediT are not seeking to abolish the role of author altogether and assuming they don’t believe non-author-contributors should be relegated to the acknowledgements, where presumably they’d get no formal credit at all, I’m not entirely sure where this leaves us. Are they creating a third category of research participant, slightly more than ‘acknowledgee’, but less than author?  And assuming such a status could easily be incorporated into the world’s bibliographies, can someone’s contribution be assessed merely on the role name (e.g., ‘Software’) or would it need to be assessed on the level of their contribution in that role?

My final question is whether all disciplines are happy that the 14 roles identified by CRediT are the right ones? Now I’m aware that CRediT was not initially designed with Arts & Humanities subjects in mind, so this might not be an entirely fair question. But I must say it surprised me recently, when a materials scientist, Dr Ben Britton, spoke of his frustration at having to adhere to CRediT because he felt the 14 roles weren’t pertinent to his discipline. I’m left wondering whether it was too ambitious to believe the many and varied contributions to the many and varied scholarly sub-disciplines could be distilled into 14 categories. And whether it is unrealistic to think that the same 14 categories are going to remain the same forever.

Credit-seeking

My second niggle with CRediT is its name. I am partial to a cheeky acronym myself, and I can only imagine the glee its creator must have felt when they came up with this one. After all, who doesn’t want credit for their contribution?!  But scholarship is not all about getting credit, believe it or not. There is something about taking responsibility too. And, as we’ve seen above, about taking copyright ownership of one’s work.

There are a lot of problems in the scholarly communications space caused by credit-seeking behaviours. For instance, publishing only headline-grabbing results, not publishing null results, publishing too hastily with subsequent retractions, and irreproducible science. We know that if more researchers had a stronger sense of the copyright ownership that authorship conferred, and felt less driven to relinquish their rights to publishers in exchange for reputational credit offered by publication, we wouldn’t find ourselves in a situation where the majority of our scholarly output is owned by commercial entities.

Indeed, one of the problems CRediT itself is seeking to address is unfair credit-seeking.  So ironically, I wonder whether CRediT is unwittingly contributing to the problem it seeks to solve.

Evaluation by CRediT

Of course, our interminable obsession with publication-based credit is inevitably going to lead some to make use of CRediT for ostensibly fairer research evaluations. We know that getting a citation to a paper on which you were the 1,000th author cannot mean the same thing as getting a citation to a paper on which you were a single-author. Clarivate have recently argued that with the increase in hyper-authored papers, the fractionalisation of citations should become the norm. Makes sense. How natural then, to start weighting citations based on the actual role you played on a paper?
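
To make that risk concrete, here is a toy sketch of what role-weighted fractional citation credit could look like. The weights are entirely invented, because no agreed scheme exists, which is rather the point.

```python
# Toy illustration of role-weighted fractional citation credit.
# The weights are invented for illustration only; no agreed scheme exists.
ROLE_WEIGHTS = {
    "Conceptualization": 1.0,
    "Formal analysis": 1.0,
    "Writing - original draft": 1.0,
    "Project administration": 0.25,   # would 'admin' roles literally count less?
    "Resources": 0.25,
}

def fractional_credit(citations, author_roles):
    """Split a paper's citations across authors in proportion to their role weights."""
    totals = {author: sum(ROLE_WEIGHTS.get(role, 0.5) for role in roles)
              for author, roles in author_roles.items()}
    grand_total = sum(totals.values())
    return {author: citations * weight / grand_total for author, weight in totals.items()}

# A 100-citation paper: the project administrator ends up with roughly 11 'citations'
print(fractional_credit(100, {
    "Principal investigator": ["Conceptualization", "Writing - original draft"],
    "Project administrator": ["Project administration"],
}))
```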

We are already seeing bibliometric analyses based on contributor roles. Whilst this is interesting at a ‘science of science’ level (e.g., are roles gender based?), it worries me on an individual researcher evaluation level.  Are we going to see some roles prized above others? Will some roles literally ‘count’ and some roles not?  And what impact will this have on those early career researchers in project administration and literature searching roles that CRediT seeks to give previously unacknowledged credit to?  Will they, in another terrible fit of irony, be excluded from some forms of credit altogether?

I’m not sure there is any way of mitigating against the worst effects of this. And I’m particularly concerned because of course the underlying CRediT data required to run such analyses will be collected and owned by publishers. I note from the CASRAI website that they are seeking to “ensure that CRediT is tied to ORCID and included in the Crossref metadata capture”.  But not all metadata ingested by Crossref is available openly. And the world’s largest journal publisher, which recently announced the adoption of CRediT by 1200 of their journals, infamously does not cooperate with open citation services.

To me this is a concern. I’m not sure if CASRAI has any power to ensure that CRediT-adopting journal publishers commit to making their resulting CRediT data available openly via Crossref, but if so I would urge them to do this.  At least this way publishers won’t end up with exclusive control over the community’s CRediT data.

Summary

I don’t want to rain on CRediT’s parade because the problem they seek to address is a real one. And the efforts they’ve made have had a considerable impact. However, I fear there are some challenges with CRediT’s current trajectory which may mean that those they hope to provide greater visibility for actually receive less credit rather than more. There are no easy answers here of course, but I worry that without an open conversation about some of these issues, CRediT might lose some of its very considerable potential. What do you think?

Acknowledgements

Huge thanks to Professor Charles Oppenheim and Dr Simon Kerridge for reviewing an early draft of this piece. In CRediT terms: “Writing – review & editing”. Definitely contributors not authors I’m thinking…

Wrong question?

This post was originally published on The Bibliomagician on 16 January 2020.

Lizzie Gadd argues that good research evaluation starts with good questions.

The 2019 Ig Nobel Prize winners were announced in September.  Among my favourites was some research into the pleasurability of scratching an itch. Contrary to what the name suggests, winning an Ig Nobel prize is not an indictment of your research design or methods. It’s an indictment of your research question. Is the question a good one? Is it an important one? Or are you even starting with a research question at all?

It strikes me that in research evaluation we’re also in danger of this. Often, it’s not our metrics that are at fault, or even our methods, it’s our questions.

Image credit: critical thinking asylum and jppi, CC BY-NC-ND

The INORMS Research Evaluation Working Group recently developed a process called SCOPE by which research evaluation can be done responsibly. The S of which stood for ‘START with what you value’.  In other words, start with the right questions, the things you want to know, and work forwards. Don’t start with what others value, or with your dataset, and work backwards.  

So, a discussion list member recently reported that they’d been asked to come up with a single research indicator for their university’s KPIs. Just the one. ‘Research indicator’.  What does that mean? What do they want to indicate about research? What are they trying to achieve? There are a whole lot of pre-questions that need answering before we can start to answer this one.

No-one questioned whether arts & humanities colleagues actually valued citations enough to want to be measured by them.

Again on another discussion list, a colleague was seeking a bibliographic data source which provided better coverage of arts and humanities content for bibliometric analysis. Lots of helpful folks pitched in with responses. But no-one questioned whether arts & humanities colleagues actually valued citations enough to want to be measured by them, or whether there were better ways of assessing the quality and visibility of their outputs. Wrong question.

But it’s not just asking the wrong question that we’re prone to, it’s not starting with a question at all and retrofitting one once you have your data.  In science this has become known as ‘HARKing’ – Hypothesising After the Results are Known.  And I’ve seen two cases recently where it feels like this is happening.  

So, Elsevier’s newly formed International Centre for the Study of Research recently produced a report demonstrating the increase in fractional authorships resulting from increased collaboration. Fine. Except it was advertised as a study purporting to answer the question “Are authors collaborating more in response to the pressure to publish?”.  Now this is a good question, but you can’t answer this question with the data they have. If you want the answer to THIS question, you’d have to ask authors what their motivations for collaborating were. The study doesn’t do that.  Wrong question.

In the interests of balance, a collaboration between Digital Science and CWTS Leiden recently produced a research landscape mapping tool.  It’s a very interesting visualisation of the research publications resulting from the big funding agencies. Great. However, it was promoted as a tool to “support research funders in setting priorities”.  Now, knowing the brilliance of CWTS Leiden, I struggle to believe that they started with the question ‘How can we better support funders to make funding decisions?”, and ended up with a tool that showed what publications had resulted from historical funding decisions. And the thought that this data should be used in any significant way to support funding decisions concerns me.  What seems more likely to have happened is that a fabulous dataset and a fabulous visualisation tool got together and had a fabulous research mapping tool baby, and then, to justify the accident of its birth, cursed it with a grand title it could never quite live up to.

Asking the right question is important. And we need to articulate our question before we decide how best to answer it. And when we answer it, we need to offer our findings in light of the question we asked, not one we think might give the better headline.

Ultimately we need to look beyond the questions we are asking to the systemic effects of asking them. Let’s take the h-index as an example. My h-index is currently 13. And if my h-index of 13 is the answer, what is the question exactly?

There’s a bigger question as to whether rewarding only publication-producing individuals is good for scholarship? Good for our universities? Good for humanity?

So, technically, I guess the question is, “How many publications do you have with at least that number of citations?” But the implied question is “How prolific are you at producing well-cited publications?”  However, you can’t answer that question without knowing my career stage and discipline and how many career breaks I’ve had and what role I played on those publications and how many co-authors they’ve had. So the h-index alone doesn’t provide an answer.  But even if it did, the systemic effects of asking this question are significant. Do we want to employ or promote individuals who are only prolific at producing well-cited publications? The answer might be yes, because that’s how we as universities are measured. But then there’s a bigger question as to whether rewarding only publication-producing individuals is good for scholarship? Good for our universities? Good for humanity? Is this actually fulfilling our mission as universities? Even if it’s making our universities look good in the eyes of the rankings or some other well-meaning but misguided party.
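
Part of the h-index’s seductive appeal is that the technical question, at least, can be answered by a few lines of code. A quick sketch of the calculation as I understand it:

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers with at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# e.g. an author with these per-paper citation counts has an h-index of 4
print(h_index([25, 8, 5, 4, 3, 1, 0]))  # -> 4
```

But as the paragraph above suggests, the easy part is the arithmetic; the hard part is everything the number silently ignores.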

Aaron Swartz once said, “what is the most important thing you could be working on right now? And if you’re not working on that, why aren’t you?”. It seems to me that the biggest problems in the research evaluation space could be solved not by better methods and metrics, but by better questions. Are they important? Are they honest? Are they mission-driven? Once we have our questions right – once we have put our values back into our evaluations – we will be well on the road to more responsible research assessment.

Open access to teaching material – how far have we come?

This post by Lizzie Gadd, Jane Secker and Chris Morrison was originally published on the LSE Impact Blog on 16 September 2019.

One of the foundational aims of the open access movement, set out in the Budapest Open Access Initiative, was to provide access to research not only to scholars, but to “teachers, students and other curious minds” and in so doing “enrich education”. However, almost two decades on from the declaration, access to the research literature for educational purposes remains limited. In this post Elizabeth Gadd, Jane Secker and Chris Morrison present their research into the volume of open access material available for educational purposes, finding that although much research is now available to read, a significant proportion is not licensed in a way that allows its use for teaching.

One of the hoped-for benefits of open access was not only to enable researchers to access a wider range of content, but to enable learners to access it too.  Many early electronic library projects to provide learning resources struggled, not for technical reasons, but for copyright ones: permission to re-use the required content had to be negotiated on a case-by-case basis. So approximately two decades on from the birth of open access, how far have we come along this road?  What proportion of content required to support university teaching is actually available open access?  And not only available, but properly licensed with explicit permission to support teaching efforts? Are Libraries and other support units exploiting open access opportunities to provide content to students? And if not, why not?  These were some of the questions asked by a study supported by SCONUL, Jisc, RLUK and UUK this year.

The research team took as its starting point the submissions that UK Universities are obliged to make annually to the Copyright Licensing Agency as to the digital copies they make under the CLA Higher Education Licence to support courses of study.  This data is typically collected by university libraries, many of whom have set up centralised scanning services to digitise the material and ensure adherence to the terms of the CLA Licence. This is not, of course, all the copying they do to support teaching – especially of journal articles which are often available under e-journal licences.  However, it is a good indication of the range of materials used to support teaching, which differ in many respects to the content used to support research.

Overall, we found that 18% of the items copied were journal articles and 82% books.  So, taking a random stratified sample of the journal content, and searching for openly accessible copies using Unpaywall, Open Access Button and Google Scholar, we found that 38% were available (either legally or illegally) in some form. An encouraging figure. And approximately 30% of those discovered by Scholar were available in more than one location – so under the ‘Lots Of Copies Keeps Stuff Safe’ (LOCKSS) principle this might give Librarians reassurance that should one copy disappear another should still be available. So far so good.
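
For anyone wanting to replicate the discovery step, the Unpaywall part at least can be scripted. Here is a minimal sketch against what I understand to be Unpaywall’s public v2 REST API (it expects an email address as a parameter); the DOI and email below are placeholders.

```python
# Minimal sketch of checking one DOI against Unpaywall's public v2 API.
# Endpoint and field names are as I understand them; treat as indicative.
import requests

def oa_status(doi, email="you@example.ac.uk"):
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}", params={"email": email})
    resp.raise_for_status()
    record = resp.json()
    best = record.get("best_oa_location") or {}
    return {
        "is_oa": record.get("is_oa", False),
        "url": best.get("url"),
        "licence": best.get("license"),  # often missing, even when a copy is readable
    }

# Example (substitute a real DOI and your own email address):
# print(oa_status("10.1234/example-doi"))
```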

However, of the sample, only 7% came with an easily locatable re-use licence, and just 3% explicitly allowed use in an electronic ‘course-pack’. Oh dear. This is where the important differences between gratis and libre Open Access, between available and re-usable, between CC BY and CC BY-ND, come to the fore. Most researchers don’t really care about these distinctions, as long as they can find what they need and read it on the day they find it.  However, for librarians wanting to ensure permanent, legitimate access to the final version of record for students, these things matter.  And recent EU case law finding certain types of unauthorised linking to be an infringement of copyright won’t help to reassure them.

It is perhaps for these reasons that none of the librarians we interviewed incorporated OA searches into their acquisitions processes.  Most put it down to a lack of evidence that it would be worthwhile, and that legal copies could be found. This is such a shame, especially when we found that 89% of content was written by academics, and 58% had been written since 2000 and thus could arguably have been made openly available under a suitable re-use licence if they had had the appropriate support to do so.

We need to do better at this. At both open access and appropriate licensing. We also need to do better, as librarians, at looking seriously at using in our teaching the OA content that many have sweated blood and tears to provide. Our study also found that Unpaywall and Open Access Button were pretty much on a par when it came to finding copies, with each locating about 30% (although Unpaywall had fewer false positives). But Google Scholar found the other 70%, many of which were ‘legal’ OA copies (either Gold copies or Green copies on Institutional Repositories) that neither of these services found. So unfortunately there is no one-stop-shop yet for locating legal OA copies.

This finding, in conjunction with the complexities of the OA landscape (preprint vs postprint, legal vs illegal, Gold vs Green, permanent vs impermanent), makes librarians’ reluctance to engage understandable. However, with increasing pressures on library budgets and our work showing that over one-third of journal material is now available openly, the time must be right to start exploring this more seriously. The obvious starting point would be the provision of guidance for librarians as to how they might go about locating, checking and making available such content to support teaching. This should be a priority for the library Copyright and Information Literacy Community.

The authors are grateful to Jisc, RLUK, SCONUL and UUK for funding this research. They are also grateful to Sharon Cocker, Ruth Mallalieu, Neil Sprunt, and Ralph Weedon for assisting with data collection and to the UUK Copyright Negotiating and Advisory Committee for their steer and guidance. The post draws on the authors’ co-authored article, The Impact of Open Access on Teaching – How far have we come?, published in Publications.

About the authors

Elizabeth Gadd is the Research Policy Manager (Publications) at Loughborough University. She has a background in libraries and action research with a PhD in copyright and scholarly communication. She co-founded both the Lis-Copyseek and Lis-Bibliometrics Fora and co-champions the ARMA Research Evaluation Special Interest Group. She also chairs the International Network of Research Management Societies (INORMS) Research Evaluation Working Group and was the recipient of the 2020 INORMS Award for Excellence in Research Management Leadership.

Chris Morrison is the Copyright, Licensing and Policy Manager at the University of Kent, responsible for copyright policy, licences, training and advice. He was previously the Copyright Assurance Manager at the British Library and before that worked for music collecting society PRS for Music. He is a member of the Universities UK Copyright Negotiation and Advisory Committee on whose behalf he also attends the UK Government’s Copyright Education Awareness Group (CEAG). He is co-author of the second edition of Copyright and E-Learning: a guide for practitioners which was published in July 2016, and is also the originator of Copyright the Card Game. Chris recently completed a masters in copyright law at King’s College London and his dissertation explored the understanding and interpretation of Section 32 of the Copyright, Designs and Patents Act ‘Illustration for Instruction’ by UK universities. He tweets as @cbowiemorrison

Jane Secker is Senior Lecturer in Educational Development at City, University of London, where she teaches on the MA in Academic Practice. She is the former Copyright and Digital Literacy Advisor at LSE, where she coordinated digital literacy programmes for staff and students including copyright training and advice. She is Chair of the CILIP Information Literacy Group, a member of the Libraries and Archives Copyright Alliance and the Universities UK Copyright Negotiation and Advisory Committee, which negotiates licences for the higher education sector. She is widely published and author of four books, including Copyright and E-learning: a Guide for Practitioners, the second edition of which was co-authored with Chris Morrison and published in 2016 by Facet. She tweets @jsecker

Both Chris and Jane tweet as @UKCopyrightLit and maintain the Copyright Literacy website: https://copyrightliteracy.org