Book review: Breaking Ranks by Colin Diver

This post was originally published on the WonkHE blog on 24 August 2022

Over the last four years I’ve spent a lot of time trying to expose the weaknesses of the global university rankings as a reliable way of evaluating the contribution of higher education institutions and enabling comparisons between them.

It’s not a difficult job: the inadequacies of the rankings are visible to the naked eye. It’s not even that difficult to persuade senior leaders of this. They’re not idiots – they can spot a shonky methodology at 100 paces.

What proves to be difficult is to persuade senior leaders to actually act on this knowledge. To accept that yes, the rankings are statistically invalid and deeply unhelpful to all actors in the HE system, but to then actually take some kind of publicly critical stand.

But back in 1995 that is exactly what the former President of Reed College, Steven Koblik, famously did. And this book, written by Koblik’s successor, Colin Diver, tells us all about it.

You can see why I was excited to read it.

What happens when you refuse to be ranked?

So, Breaking Ranks: How the rankings industry rules higher education and what to do about it is the fascinating first-hand experience of Reed’s immediate past President having inherited an institution that decided “it would no longer be complicit in an enterprise it viewed as antithetical to its core values” and began withholding data from the rankings.

I have to admit I was a smidge disappointed, given the expansive and promising title, to learn in the opening chapters of the book that the “rankings industry” it refers to is almost exclusively the US News & World Report (USNWR) Ranking – and the “Best US Colleges” edition at that. And the higher education it refers to is, unapologetically, US higher education. And consequently, the what to do about it means what to do about the USNWR. However, there is clearly common ground between the USNWR ranking portfolio and other ranking agencies’ offerings, so those with more broad-ranging interests will still find it informative – whilst also receiving a detailed grounding in the USNWR.

Rankings and how we relate to them

The book begins by describing the evolution of rankings and their increasingly negative influence on the US HE sector. As a former Dean of the University of Pennsylvania Law School prior to his role at Reed, Diver has enough experience to have seen this rise first-hand, yielding many interesting and humorous anecdotes. He also unpacks and systematically shoots down some of the approaches used by the USNWR and other ‘rankocrats’.

Diver asks whether university rankings measure and perpetuate wealth. (Spoiler alert: yes they do). HE has been reduced to “a competition for prestige”, he argues, with rankings becoming “the primary signifier”. He proceeds to critique three proxies for prestige commonly used by rankings, including an enjoyable rant about “unscientific straw polls”.

We then move on to investigate a particular feature of the USNWR College ranking: the way in which it “judges colleges by who gets in”. This may be of less interest to the international reader; however, the chapter detailing the games colleges play to win on these measures left me open-mouthed. And I’m not new to this lark.

The conclusion attempts to identify what HE institutions do offer the world which might actually be worth measuring and offers some advice for those seeking to loosen the rankings’ grip. More on that below.

As you might expect from a scholar of Diver’s standing, the book is well-structured, his arguments are well-built, and his writing style is very accessible. What you might not expect is his honesty. He is quite open about how in former roles he has himself engaged in less savoury ranking-climbing behaviours (e.g., hesitating to admit talented poor students for fear they would drag down the institution’s average LSAT score). He is also quite honest about the negative impacts of not engaging in such behaviours (watching others climb via the use of distasteful methods, whilst you fall back due to the lower scores the ranking gods bestow upon you if you don’t feed them fresh data).

For me, it is these lived experiences that form this book’s most original and valuable contribution. Many HE professional services staff are desperate to engage with their senior leaders on these issues; here is a senior leader who is desperate to do the same. Getting the opportunity to watch a university president think these matters through – to see the rationale that led to them taking a stand, and the impact that taking that stand had on their institution – is gold-dust. And, I cannot tell a lie, it gives me hope.

A four-step plan

His closing admonishments to both students and universities are simple: ignore the rankings if you can, but if you can’t, go to the opposite extreme and take them really seriously. That is, study them in so much detail that you truly understand the contribution they can make to answering any question you might have about the quality of an HE institution. Convinced this will dilute any confidence folk might have in the rankings, he even proposes an alternative decision-making approach based on allocating institutions points according to the decision-maker’s priorities. As a proponent of starting with what you value when seeking to evaluate, I found myself nodding vigorously.
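
Diver’s points-allocation idea amounts to a simple weighted scoring exercise: weight what *you* value, score each institution against it, and total the points. A minimal sketch – the criteria, weights, and scores below are entirely invented for illustration, not drawn from the book:

```python
# Illustrative priorities-based points approach to choosing between
# institutions. All names and numbers here are made up.

priorities = {  # weight = how much this criterion matters to *you* (0-10)
    "teaching quality": 9,
    "financial aid": 7,
    "distance from home": 4,
}

scores = {  # your own 0-5 rating of each institution per criterion
    "College A": {"teaching quality": 4, "financial aid": 3, "distance from home": 5},
    "College B": {"teaching quality": 5, "financial aid": 2, "distance from home": 2},
}

def total_points(institution_scores, priorities):
    """Weighted sum: each criterion's score times its personal weight."""
    return sum(priorities[c] * s for c, s in institution_scores.items())

for name, s in scores.items():
    print(name, total_points(s, priorities))
# College A: 9*4 + 7*3 + 4*5 = 77
# College B: 9*5 + 7*2 + 4*2 = 67
```

The point of the exercise is that the ordering depends on the decision-maker’s weights, not on a ranker’s – change the weights and a different institution comes out on top.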

For institutions inspired by Reed’s anti-ranking stance, Diver offers a sensible four-stage withdrawal process. This consists of:

  • not filling out peer reputation surveys,
  • not publicising rankings you consider illegitimate,
  • celebrating rankings that truly reflect your values, and
  • giving everyone equal access to your data (rather than giving it to rankers for their exclusive use).

Hope for the rankings industry?

The big surprise for me, given the gritty exposé of the rankings’ methods and “grimpacts” contained in this book, was that Diver doesn’t conclude by dismissing the concept of rankings altogether. Despite the limitations inherent in all the indicators he dismantles, he seems to remain hopeful that an institution’s contribution to preparing students for a “genuinely fulfilling life” might one day be rigorously measured and that someone “will actually try to rank colleges accordingly”. This seems to me neither feasible nor desirable.

As I’ve said before, rankings are only necessary where there is scarcity: one job on offer, or one prize. For every other form of evaluation there are far more nuanced methods, including profiles, data dashboards, and qualitative approaches. Indeed, as the Chair of the INORMS Research Evaluation Group, which is itself currently developing an initiative through which institutions can describe the many and varied ways in which they are More Than Our Rank, I couldn’t help but think that Reed College would be the perfect exemplar for this.

Back in 2007, Reed’s actions kick-started something of a resistance movement amongst 62 other US colleges. (We’re now seeing something similar amongst Chinese universities in response to the global rankings). Unfortunately this movement didn’t really gain momentum, but perhaps as other initiatives arise such as More Than Our Rank, and the EUA/Science Europe Reforming Research Assessment Agreement which forbids the use of rankings in researcher assessment, now is the time, and this book is the touch-paper?

Breaking Ranks is published by Johns Hopkins University Press and is priced at $27.95.

Tales from REF Central: Reflections from REF results week

This blog was originally posted on the Higher Education Policy Institute (HEPI) Blog on 26 May 2022

I’ve spent my whole career in higher education, but this has been my first REF at the coalface. And so this month was my first in ‘REF Central’, our institutional hub of spreadsheet-wrangling and story-spinning, fuelled by a bottomless teapot and an excessive amount of cake. I’m not short of an opinion on the REF: its shortcomings and its possibilities. I even had a thing or two to say about REF results reporting. But nothing really prepared me for the rollercoaster of REF results day.

  1. Every number (doesn’t) tell(s) a story

As your results data starts coming through, first in raw spreadsheet form, then processed by your institutional planners, your eyes are hungrily darting across the numbers looking for the big ones. What percentage 4* did we get? Overall? In outputs? Impact? Environment? And how does that compare with the sector? 

And then you might start generating indicators: Grade Point Averages or Research Power ratings (GPA × FTE of submitted staff). You may even start throwing the numbers at a modeller to try and predict what this will mean financially – or at least how your share of the overall 4* pool compares with last time.
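
Those two indicators are simple arithmetic on a unit’s quality profile. A minimal sketch – the profile percentages and FTE below are invented for illustration, not real REF results:

```python
# Two common REF-derived indicators, computed from a quality profile
# (the percentages of a submission rated 4* down to unclassified).
# Illustrative figures only.

def gpa(profile):
    """Grade Point Average: star rating weighted by its percentage share.

    profile maps star rating (4, 3, 2, 1, 0 = unclassified) to the
    percentage of the submission at that rating (summing to 100).
    """
    return sum(stars * pct for stars, pct in profile.items()) / 100

def research_power(profile, fte):
    """Research Power: GPA multiplied by the FTE of submitted staff."""
    return gpa(profile) * fte

# A hypothetical unit: 40% at 4*, 45% at 3*, 13% at 2*, 2% at 1*
unit = {4: 40, 3: 45, 2: 13, 1: 2, 0: 0}
print(round(gpa(unit), 2))                 # 3.23
print(round(research_power(unit, 52), 2))  # 167.96
```

Note how quickly the humans disappear: a whole unit’s seven years of work reduces to two numbers.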

But the more you churn the data, with every layer of aggregation and abstraction, the further you get from the humans beneath.

I found their faces appearing ghost-like in the spreadsheet cells. The new Head of Department who felt the weight of the REF so keenly: I feared it might break him. The Unit Lead who had to devise a robust way to fairly select 500 outputs from 5000. And the Associate Dean who spent the last few days prior to submission frantically seeking publisher confirmation that an output would have a 2020 publication date. 

Beyond our own staff lay all our impact case study beneficiaries: the lives immeasurably improved by research-informed social policies and technologies I could only Google.

And, of course, this was largely all delivered in pandemic-struck conditions to a backdrop of dog-barking, doorbells and Dora the Explorer. The laughter, the anger, the anxiety, the hope. The people. All reduced to hundreds of datapoints swimming on a page that frankly by the end of Day 2 I never wanted to see again. Unlike the people. Who I’d gladly do it all again for. And that’s saying something.

I’ve reflected before on the irony of the REF priding itself on being a peer-review process that ultimately results in a spreadsheet full of numbers. But the sheer reductiveness of the REF has never struck me so hard or left such a bitter taste in my mouth. ‘I hate the REF’, I said to my Pro Vice-Chancellor of Research as he strolled around REF Central. He thought I was joking.

  2. A healthy sector has a range of results

Of course one of the problems with digging only for the biggest bones in the REF results data mine is that we end up ignoring all the smaller ones. But (with apologies for the shaky archaeological metaphor), it’s in their combination that the full treasure is found.

With the change in REF rules this time around (decoupling individuals from outputs so you could submit your best n outputs overall) an increase in 4* volume was inevitable. This didn’t stop the joint funding bodies celebrating, without irony, the increase in 4*-rated submissions as an indicator of a healthy sector. 

But just as a healthy institution hosts those at the beginning of their career as well as those at the end, so a healthy sector also has a range of institutions across the research intensity spectrum. Similarly, some institutions branch out into new discipline areas where they are not yet established. Some disciplines take portfolio approaches to their outputs, with nascent ideas (two-star) discussed in chapters and articles that generate discussion which ultimately leads to a life’s (four-star) work. This is normal.

In our Day 1 REF comms meeting I found myself arguing for the value of a two-star output. It’s internationally recognised, I cried, this is a wonderful thing! Indeed, this is something REF Panel C Chair Jane Millar reiterated in Tuesday’s Institutional REF Debrief, stressing their value in underpinning impact. (Impact case studies have to be based on at least two-star outputs.)

The problem with the competitive culture that the REF engenders is that those with inter/nationally recognised research are seen as losers rather than beginners, or portfolio-builders. This view is reinforced when, for ease of viewing, REF data are RAG-rated (red-amber-green) to highlight our various ‘strengths’. To my mind this is literally de-grading. I refused to RAG-rate my spreadsheets, choosing instead shades of green. The Vice-Chancellor gave me a quizzical smile. There’s nothing poor in our submission, I clarified. It’s all good, and some is very good.

  3. The right of reply

Frankly speaking, I don’t believe all our results. That’s not sour grapes. Or me thinking that poor Professor X deserved better. I just think in some units the scores don’t smell right. And given what we know about the assessment process – that it was very few people (100 per cent human) assessing a very high volume of submissions in very little time – that’s entirely possible, right?

We also seem to have one mysteriously unclassified output. And I know it’s noise in the light of the overall result, but I still want to know what that was and why it was deemed unclassified. Even some indication as to whether it was an Open Access issue (shouldn’t have been), a date mishap (it fell outside the time-window), or a grading matter (it’s not even nationally recognised) would at least give us a steer. But unfortunately, with REF results there are no queries and no appeals. And unless the panel chooses to make reference to it in their written feedback (due in June), we’ll just never know.

Given the REF made sure our colleagues had the right to appeal our REF-related decision-making, and that other higher education institutions had the right to complain that we didn’t stick to our Codes of Practice, it feels a bit rich that we have no right of reply at all. I know we’d be here forever if everyone could appeal everything, but maybe giving each institution three disclosures and one appeal would go some way to demystifying some of our odder results?

  4. Unpaid and emotional labour

Finally, at the end of Day 2 when all our results were known, we held a private celebration with all those who’d supported the REF most closely. Our Pro Vice-Chancellor for Research rightly congratulated everyone on their efforts, including highlighting the invisible work that goes on in support of a REF submission. ‘Unrecognised and unseen’, he called it. ‘And unpaid’, I muttered.

Because that is the reality of so much of what we do in academia, and research managers are no exception. Whilst our academic colleagues worked into the night writing and rewriting institutional environment statements until they ‘answered the exam question’ within the word count, those of us in REF Central were doing equally burdensome but significantly less meaningful tasks. Mine included sobbing in the toilets as I read through 130 individual staff circumstances submissions. And then sobbing again as I realised that their agonising combination of personal tragedy, physical and mental health issues only entitled their Unit to a reduction of half a fricking output. Yes, I’m still angry.

One of my colleagues was regularly up until one o’clock in the morning in the weeks before the submission, doing technical checks on impact case studies to ensure the rules were minutely followed, that fonts were the right size and margins didn’t run over the page. This is the reality of a REF submission. Not glamorous, not strategic, and not always paid.

I know the joint funding bodies are now gearing up to assess the cost of REF2021. I hope they factor in the free labour and the emotional costs. When we talk about reducing the burden, we don’t just mean the volume of work. 

Summary

The night before the REF results came out, I found myself musing on the curious breed of research manager who does the REF more than once. Unless the approach for REF2027 changes significantly, I hope I’m not one of them. 

Research Culture: where do you start?

This post was originally published on the MetisTalk blog on 7 April 2022.

Across the UK, a growing number of universities are starting to appoint dedicated research culture staff. At Glasgow, I’ve been lucky enough to take on their research culture portfolio when the fabulous Tanita Casci and Elizabeth Adams, co-founders of Glasgow’s Research Culture work with Miles Padgett, both left within a few weeks of each other. Glasgow has a clear research culture action plan, and much has been achieved. However, a recent research culture survey has, once again, highlighted challenges we’re keen to address. So like many others beginning research culture roles, I find myself asking, where do you start? How do you prioritise what feel like equally pressing needs?

Low-hanging fruit?

It’s tempting to go straight for the low-hanging fruit: the issues you can fix quickly with little resource; those things well within your jurisdiction. And why not? If they are genuine needs and you can address them swiftly, this might be a good use of your time. And you can show the community you’re dedicated to making progress. 

The challenge is that quick-fixes rarely solve the most deep-seated research culture problems, and when you run your annual research culture survey next time around, you might find that the lived experience for most researchers hasn’t changed that much. A lot of initiatives in the research culture space (think EDI and well-being initiatives) come under attack because they start with the low-hanging fruit (think celebrating International Women’s Day or running yoga classes), leading the community to believe their actions are just window-dressing.

Biggest problem first?

So, perhaps we should tackle the biggest issues first? How to give researchers more time to actually do research? How to tackle job precarity amongst early career researchers? How to eradicate toxic power imbalances from our labs? There’s a strong and urgent need to address these issues in our organisations. And if we’re squeamish about this we certainly shouldn’t be taking jobs in research culture. However, many of the bigger issues can’t be solved unilaterally by one institution, or they are a function of wider systemic problems such as the university funding model. So if you just start here, the chances are you might get no further in your whole research culture career.

Opportunism?

One approach, of course, is to see what crosses your path and go with the flow. An academic might propose a solution to a local problem that you can support. An external organisation might be offering a leadership training package you can buy into. You might get an opportunity to piggy-back on other internal or external developments that enhance your organisational culture.

However, whilst we don’t want to stick so rigidly to our plans that we miss out on the serendipitous, it feels a little reactive to just be tossed and blown by the winds of opportunity. Call me a control freak, but this isn’t the path for me.

Start with what you value?

As the Chair of the INORMS Research Evaluation Group I have been heavily involved in the development of their SCOPE framework for responsible research evaluation. As such, when thinking about ‘where to start’ with anything, I cannot help but return to its first tenet: ‘start with what you value’. Identifying where to start with our research culture must surely begin with a proper understanding of what we value about a positive research culture: what does good look like? And a sense of the gap between what we have and what we want.

Maybe it’s the researcher in me, but I’m convinced that surfacing the values of our research communities through workshops and surveys is a really good way to get to the heart of the matter. However, you won’t get unanimous agreement on a way forward. You’ll get views ranging from the jaded to the enthusiastic; from big-picture concerns to personal bugbears; from car-parking to career progression, and everything in between.

Where do you start?

I do think that the values surfaced through these exercises have to be our starting point though. We need a strong sense of the lived experience of our research communities: the good, the bad, and the ugly. And an understanding of the issues that mean the most to the most. But we need these values to translate into a portfolio of actions across the short, medium and long term. So, to return to my own question, ‘where do you start?’, I would answer, ‘all of the above’. We need some quick wins, some long-haul too-important-not-to-try ambitions, and an openness to opportunities that aren’t in the plan. What matters is that they are all in line with your institution’s ‘heart’ and that you can evidence your efforts, your progress, and even your failures, as you go.

Ultimately, whilst those of us who care deeply enough about research culture to make it our daily occupation are likely to care deeply about starting in the right place, perhaps it doesn’t really matter where you start, as long as you start?

Five steps to healthy research career building

This post was originally published on The Hidden Curriculum blog on 25 January 2022

Elizabeth (Lizzie) Gadd is the Head of Research Operations at the University of Glasgow and a Research Policy Manager at Loughborough University. She leads the International Network of Research Management Societies (INORMS) Research Evaluation Group, the Association of Research Managers & Administrators (ARMA) Research Evaluation SIG, and the LIS-Bibliometrics Forum. Lizzie’s previous writing highlights four fundamental critiques of the way in which journal metrics and university rankings have been deployed in higher education.

Photo by Diego PH on Unsplash

Historically ‘getting on’ in academia demanded either a list-full of publications or a pocket-full of grant winnings – or both. Thankfully, times are changing. This blog post highlights some of the ways the sector is moving towards fairer research and researcher assessment, and how these developments might help doctoral (and post-doctoral) researchers take a healthier approach to career building. Many of these concepts may still be part of the Hidden Curriculum for you, and for many researchers, yet they will become an increasingly important part of research life.

You do you 

At the end of 2021 the UKRI announced they would be introducing a new ‘Resume for Research’ for grant applicants. Developed in response to the over-use of unhelpful quantitative indicators in assessing researcher careers, such as the h-index and Journal Impact Factors, this is the latest in a series of ‘narrative CV’ formats being introduced by grant funding organisations across Europe. Narrative CVs seek to go beyond reductive metrics and to ascertain, in a qualitative way, researchers’ actual contribution to scholarship: to knowledge, to the development of others, to the research community and to society itself.

In a similar way, the Contributor Role Taxonomy (CRediT) seeks to recognise a wider range of ‘inputs’ to ‘outputs’, including software development and data curation, and the Hidden REF campaign sought to celebrate all those people and contributions not rewarded by the current REF process.

Given this move to recognise a much broader range of contributions to research, it would seem sensible to plan your career on these terms. Think about the discoveries you want to make, the real-world impacts you want to have, and the person you want to be. Because if you stay in academia this is likely the way that you will be judged. However, when you break down your ambitions in this way you might decide that academia is not the best place to fulfil them. And that’s a good outcome too.

You can’t be good at everything 

Whilst this move to value a broader range of contributions is definitely a good thing, it can inadvertently put even more pressure on researchers who now feel like they have to excel at everything. But it’s important to remember that this is not the point of a greater range of criteria. No-one can be good at everything, and no-one should be expected to be.

For this reason, my workplace, the University of Glasgow, takes a ‘preponderance’ approach to promotion criteria, where colleagues have to meet expectations in only four of seven different areas of scholarly activity. Early proponents of narrative CV approaches have also made it clear that early career researchers do not have to complete all the sections. Playing to your strengths is the way to reach fulfilment in so many areas of our lives, and these new assessment mechanisms should enable those with a wider range of strengths to thrive.

Do not compare yourself 

Given that the sector is seeking to value a broader range of research contributions – and that we can’t be good at everything – it follows that there is no real benefit to be gained by constantly comparing ourselves to others. I know it’s tempting. People keep telling you that you’re in a crowded marketplace and competition for jobs is fierce. You feel like you need to keep a check on your progress and how you measure up to others who might be competing with you for your next post. But constantly comparing yourself to others is a sure-fire way to misery. In fact, in Matt Haig’s book ‘Reasons to stay alive’ he has a chapter entitled ‘How to be happy’ which simply repeats the phrase ‘Do not compare yourself’ over and over again.

You are unique. Your doctorate is (by definition) unique. What you bring to it is unique. The circumstances you face are unique. And although we’re trying hard to eradicate structural inequalities in academia, they are still in evidence and may well affect you. If your CV looks different to someone else’s, there will be a good reason for that. Comparing yourself to others will not lead to anything good. Don’t do it. 

Serendipity plays its part 

One of the reasons it’s futile to over-compare ourselves is that serendipity plays such a significant role in our careers. Ask any established researcher to talk through their CV and they will share a number of stories about being in the right place at the right time, a sudden political interest in their area of expertise, or a chance encounter. In fact, as Rich Pancost, Head of the School of Earth Sciences at Bristol University, writes about academic careers: “If you get the job you dreamt of, you are brilliant and lucky; and if you do not, it is because you are brilliant and unlucky.”

We are all constrained by opportunities and affected by privilege and luck – or lack of it. So, whilst we can plan all we like – and do plan! – we are not entirely in control of our own destiny. We can use this to our advantage by taking up networking opportunities which may connect us to new people and places. But remembering this when we learn of someone else’s success can be helpful too.

Take (& give) all the help you can get

My final thought on healthy career-building is to take – and give – all the help you can get. No one is an island and research is not a solo activity. Make the most of every conversation with your supervisor, your networks, your research office and doctoral college/graduate school. Get yourself a mentor and take every training opportunity. Find your people on social media and subscribe to some helpful blogs. But don’t just be a taker. To get the most out of these opportunities be a giver too. Share your own hints and tips on Twitter. Become a mentor yourself. Contribute to doctoral researcher networks. It is in giving that we receive, and a significant component of mental well-being is looking out for others where we can.

Being a doctoral or post-doctoral researcher is not an easy task, and uncertainties around your next step can be unsettling. This has been made more difficult by the excessive use of metrics in the assessment of researchers. It is early days, but an increased focus on research culture and drives towards fairer assessment do look set to recognise a broader range of skills and contributions to the research endeavour. Planning your career along these lines should lead to healthier and more fulfilling outcomes either within or outwith academia. 

How (not) to incentivise open research

This post first appeared on The Bibliomagician blog on 29 November 2021

Lizzie Gadd makes the case for open research being required not rewarded.

I recently attended two events: the first was a workshop run by the ON-MERRIT team, a Horizon 2020 project seeking to understand how open research practices might actually worsen existing inequalities. And the second was the UKRI Enhancing Research Culture event at which I was invited to sit on a panel discussing how to foster an open research culture. At both events the inevitable question arose: ‘how do we incentivise open research?’. 

And given the existing incentives system is largely based around evaluating and rewarding a researcher’s publications, citations, and journal choices, our instinct is to look to alternative evaluation mechanisms to entice them into the brave new world of open. It seems logical, right? In order to incentivise open research we simply need to measure and reward open research. If we just displace the Impact Factor with the TOP Factor, the h-index with the r-index, and citation-based rankings with openness rankings, all will be well.

But to my mind this logic is flawed.

Firstly, because openness is not a direct replacement for citedness. Although both arguably have a link with ‘quality’ (openness may lead to it and citedness may indicate it), they are not quite the same thing. And it would be dangerous to assume that all open things are high quality things.

So we can add open research requirements to our promotion criteria, but we are still left with the conundrum as to how to assess research quality. And until an alternative for citations is found, folks are liable to keep relying on them as an easy (false) proxy. So we can think we’ve fixed the incentivisation problem by focusing on open research indicators, but we haven’t dealt with the much bigger and much more powerful disincentivisation problem of citation indicators.

Secondly, as I’ve argued before, open research practices are still unheard of by some and the processes by which to achieve them are not always clear. Open research practices need to be enabled before we can incentivise them. Of course related to this is the fact that some open research practices are completely irrelevant to some disciplinary communities (you’ll have a hard job pre-registering your sculpture). And undoubtedly those from wealthy institutions are likely to get much more support with open research practices than those from poorer ones. In this way, we’re in danger of embedding existing inequalities in our pursuit of open practices – as the ON-MERRIT team are exploring.

But in addition to these pragmatic reasons as to why we can’t easily incentivise open research by measuring it, there is a darned good reason why we shouldn’t turn to measurement to do this job for us. And that is that HE is already significantly over-evaluated.

Researchers are assessed from dawn til dusk: for recruitment, probation, appraisal, promotion, grant applications, and journal peer review. There is no dimension of their work that goes unscrutinised: where they work, who they collaborate with, how much they have written, the grants they have won, the citations they’ve accrued, the impact of their work, the PGRs they’ve supervised – it’s endless. And this in combination with a highly competitive working environment makes academia a hotbed for toxic behaviours, mental health difficulties, and all the poor practices we blame on “the incentives”. (Although Tal Yarkoni recently did an excellent job of calling out those who rely on blaming the incentives to excuse poor behaviours).

We don’t want to switch from narrow definitions of exceptionalism to broader ‘open’ definitions of exceptionalism; we want to move away from exceptionalism altogether. Adding open to a broken thing just leaves us with an open broken thing.

So how do we incentivise open?

Well, this is where I think we can learn from other aspects of our research environment. Because at the end of the day, open research practices are simply a set of processes, protocols and standards that we want all researchers to adhere to as relevant to their discipline.  And we put plenty of these expectations on our researchers already, such as gaining ethical approvals, adhering to reporting guidelines, and following health & safety standards.

There’s no glory associated with running due diligence on your research partners and following GDPR legislation won’t give you an advantage in a promotion case. These are basic professional expectations placed on every self-respecting researcher. And whilst there are no prizes for those who adhere to them, there are serious consequences for those that don’t.  Surely this is what we want for open research? Not that it should be treated as an above-and-beyond option for the savvy few, but that it should be a bread-and-butter expectation on everyone.

Now I appreciate there is probably an interim period where institutions want to raise awareness of open research practices (as I said before, they need to be enabled before they can be incentivised). And during this period, running some ‘Open Research Culture Awards’ or adding ‘open research hero’ badges to web pages might have their place. But we can’t dwell here for long. We need to move quite rapidly to this being a basic expectation on researchers. We have to define what open research expectations are relevant to each discipline. Add these expectations to our Codes of Good Research Practice. Train researchers in their obligations. Monitor (at discipline/HEI level) engagement with these expectations. And hold research leads accountable for the practices of their research groups.

To my mind, the same applies to measuring open research at institutional level, for example in REF exercises. We should require HEIs to expect and enable discipline-appropriate open research practices from their researchers and to evidence that they a) communicate those expectations, b) support researchers to meet those expectations, and c) are improving on meeting those expectations. That’s all. No tricky counting mechanisms. No arbitrary thresholds. No extra points for services that are just the product of wealth.

Of course, if we are going to monitor take-up of open research at discipline and university level, we do need services that indicate institutional engagement with open research practices. But again I see this as being an interim measure, and more to highlight where work needs to be done than to give anyone boasting rights. When open research becomes the modus operandi for everybody, monitoring just becomes a quality assurance process. There’s no point ranking institutions on the percentage of their outputs that are open access when everybody hits 100%.

I know this doesn’t tackle the disincentivisation problem of journal impact factors, but open never did.  We have moved from a serials crisis (where the costs were high, the speeds were slow, and only a few could read them) to an open serials crisis (where the costs are high, the speeds are slow, and only a few can publish in them). To me this is a separate problem that could be fixed quite easily if funders placed far bolder expectations on their researchers to only publish on their own platforms – but that’s another blog post.

We all want open research and we all want to fix the incentives problem as we see this as slowing our progress towards open research. But I think offering up one as the solution to the other is not going to get us where we want to go. Indeed, I think it’s potentially in danger of exacerbating unhelpful tendencies towards exceptionalism when what we really want is boring old consistent, standards-compliant, rigorous research.

Campbell’s law rightly tells us that we get what we measure, but the inverse – that we need to measure something in order to get it – is not always true. In our rightful pursuit of all things open, I think it’s important that we remember this.

Elizabeth Gadd is Head of Research Operations at the University of Glasgow. She is the chair of the Lis-Bibliometrics Forum and co-Champions the ARMA Research Evaluation Special Interest Group. She also chairs the INORMS International Research Evaluation Working Group.

Love DORA, Hate Rankings?

This piece first appeared on the LSE Impact Blog on 10 May 2021.

Lizzie Gadd argues that any commitment to responsible research assessment as outlined in DORA (Declaration on Research Assessment) and other such manifestos needs to include action on global university rankings. Highlighting four fundamental critiques of the way in which journal metrics and university rankings have been deployed in higher education, she proposes universities could unite around the principle of being ‘much more than their rank’.


More and more institutions are signing up to responsible metrics manifestos such as DORA – which is great. This is no doubt influenced by funder demands that they do so – which is also great. And these manifestos are having a positive impact on researcher-level evaluation – which is triply great. But, as we all know, researcher-level evaluation issues, such as avoiding Journal Impact Factors, are only one element of the sector’s research evaluation problems.

UKRI Chief Executive Ottoline Leyser recently pointed out that any evaluation further up the food-chain in the form of university- or country-level evaluations ultimately has an impact on individual researchers. And of course the most influential of these, at the top of the research evaluation food-chain, are the global university rankings.

So why, I often ask myself, do we laud universities for taking a responsible approach to journal metrics and turn a blind eye to their participation in, and celebration of, the global rankings?

Indeed, when you look at the characteristics of Journal Impact Factors (JIFs) and the characteristics of global university rankings, they both fall foul of exactly the same four critiques.

1. The construction problem

As DORA states, there are significant issues with the calculation of the JIF: the average cites per paper for a journal over two years. Firstly, providing the mean cites-per-paper of a skewed dataset is not statistically sensible. Secondly, whilst the numerator includes all citations to the journal, the denominator excludes ‘non-citable items’ such as editorials and letters – even if they have been cited. Thirdly, the time window of two years is arguably not long enough to capture citation activity in less citation-dense fields; as a result, you can’t compare a JIF in one field with one from another.
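
The first two of these construction problems are easy to see with toy numbers. The figures below are invented purely for illustration (they are not drawn from any real journal), but they sketch how a JIF-style mean is inflated both by a single highly cited outlier and by citations to ‘non-citable’ items that never enter the denominator:

```python
import statistics

# Hypothetical citation counts for one journal's papers over two years.
# Citation distributions are typically highly skewed: a few papers
# collect most of the citations.
research_paper_cites = [0, 0, 1, 1, 2, 2, 3, 4, 8, 120]  # 'citable items'
editorial_cites = [5, 7]  # editorials: counted in the numerator only

# JIF-style calculation: ALL citations in the numerator,
# but only 'citable items' in the denominator.
jif_style = (sum(research_paper_cites) + sum(editorial_cites)) / len(research_paper_cites)

# A statistically safer summary of a skewed distribution is the median.
median_cites = statistics.median(research_paper_cites)

print(f"JIF-style mean: {jif_style:.1f}")        # 15.3 - driven by one outlier
print(f"Median cites per paper: {median_cites}")  # 2.0 - the typical paper
```

The ‘typical’ paper in this toy journal attracts two citations, yet the JIF-style figure suggests fifteen: the mean describes the outlier, not the journal.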

However, global university rankings are subject to even harsher criticisms about their construction. The indicators they use are a poor proxy for the concept they seek to evaluate (the use of staff:student ratios as a proxy for teaching quality for example). The concepts they seek to evaluate are not representative of the work of all universities (societal impacts are not captured at all). The data sources they use are heavily biased towards the global north. They often use sloppy reputation-based opinion polls. And worst of all, they combine indicators together using arbitrary weightings, a slight change in which can have a significant impact on a university’s rank.
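
The arbitrary-weighting problem can also be sketched with a toy example. The two universities and their indicator scores below are entirely hypothetical; the point is simply that a modest shift between two equally defensible weightings reverses the ranking even though not a single underlying score has changed:

```python
# Toy composite ranking: two hypothetical universities scored 0-100 on
# three indicators. All names and numbers are invented for illustration.
scores = {
    "Univ A": {"research": 90, "teaching": 60, "reputation": 70},
    "Univ B": {"research": 70, "teaching": 85, "reputation": 72},
}

def composite(weights):
    """Rank universities by a weighted sum of their indicator scores."""
    return sorted(
        scores,
        key=lambda u: sum(scores[u][k] * w for k, w in weights.items()),
        reverse=True,
    )

# One arbitrary weighting puts Univ A on top...
print(composite({"research": 0.5, "teaching": 0.3, "reputation": 0.2}))

# ...a slightly different, equally arbitrary one puts Univ B on top.
print(composite({"research": 0.35, "teaching": 0.45, "reputation": 0.2}))
```

Neither ordering is more ‘correct’ than the other; the rank is a property of the weighting choice as much as of the institutions.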

2. The validity problem

Construction issues aside, problems with the JIF really began when it was repurposed: from an indicator to decide which journals should appear in Garfield’s citation index, to one used by libraries to inform collection development, then by researchers to choose where to publish, and finally by readers (and others) to judge research as the best by virtue of being published there. It had become an invalid proxy for quality, rather than a means of ensuring the most citations were captured by a citation index.

Whilst the JIF may have inadvertently found itself in this position, some of the global rankings quite deliberately over-state their meaning. Indeed, each of the ‘big three’ global rankings (ARWU, QS and THE WUR) claim to reveal which are the ‘top’ universities (despite using different methods for reaching their different conclusions). However, given the many and varied forms of higher education institutions on the planet, none of these high-profile rankings articulates exactly what their ‘top’ universities are supposed to be top at. The truth is that the ‘top’ universities are mainly top at being old, large, wealthy, English-speaking, research-focussed and based in the global north.

3. The application problem

Of course, once we have indicators that are an invalid proxy for the thing they claim to measure (JIFs signifying ‘quality’ and rankings signifying ‘excellence’) third parties will make poor use of them for decision-making. Thus, funders and institutions started to judge researchers based on the number of outputs they had in high-JIF journals, as though that somehow reflected on the quality of their research and of them as a researcher.

In a similar way, we know that some of the biggest users of the global university rankings are students seeking to choose where to study (even though no global ranking provides any reliable indication of teaching quality) because who doesn’t want to study at a ‘top’ university? But it’s not just students; institutions and employers are also known to judge applicants based on the rank of their alma mater. Government-funded studentship schemes will also often only support attendance at top 200 institutions.

4. The impact problem

Ultimately, these issues have huge impacts on both individual careers and the scholarly enterprise. The problems associated with the pursuit of publication in high-JIF journals have been well-documented and include higher APC costs, publication delays, publication of only positive findings on hot topics, high retraction rates, and negative impacts on the transition to open research practices.

The problems associated with the pursuit of a high university ranking are less well-documented but are equally, if not more, concerning. At individual level, students can be denied the opportunity to study at their institution of choice and career prospects can be hampered through conscious or unconscious ranking-based bias. At institution level, ranking obsession can lead to draconian hiring, firing and reward practices based on publication indicators. At system level we see increasing numbers of countries investing in ‘world-class university’ initiatives that concentrate resource in a few institutions whilst starving the rest. There is a growing inequity both within and between countries’ higher education offerings that should seriously concern us all.

What to do?

If we agree that global university rankings are an equally problematic form of irresponsible research evaluation as the Journal Impact Factor, we have to ask ourselves why their usage and promotion does not form an explicit requirement of responsible metrics manifestos. An easy answer is that universities are the ‘victim’ not the perpetrator of the rankings. However, universities are equally complicit in providing data to, and promoting the outcomes of, global rankings. The real answer is that the rankings are so heavily used by those outside of universities that not to participate would amount to financial and reputational suicide.

Despite this, universities do have both the power and the responsibility to take action on global university rankings that would be entirely in keeping with any claim to practice responsible metrics. This could involve:

  1. Avoiding setting KPIs based on the current composite global university rankings.
  2. Avoiding promoting a university’s ranking outcome.
  3. Avoiding legitimising global rankings by hosting, attending, or speaking at, ranking-promoting summits and conferences.
  4. Rescinding membership of ranking-based ‘clubs’ such as the World 100 Reputation Academy.
  5. Working together with other global universities to redefine university quality (or more accurately, qualities) and to develop better ways of evaluating these.

I recently argued that university associations might develop a ‘Much more than our rank’ campaign. This would serve all universities equally – from those yet to get a foothold on the current rankings, to those at the top. Every university has more to offer than is currently measured by the global university rankings – something that I’m sure even the ranking agencies would admit. Such declarations would move universities from judged to judge, from competitor to collaborator. It would give them the opportunity to redefine and celebrate the diverse characteristics of a thriving university beyond the rankings’ narrow and substandard notions of ‘excellence’.

The time has come for us to extend our definition of responsible metrics to include action with regards to the global university rankings. I’m not oblivious to the challenges, and I am certainly open to dialogue about what this might look like. But we shouldn’t continue to turn a blind eye to the poor construction, validity, application and impact of global rankings whilst claiming to support and practice responsible metrics. We have to start somewhere, and we have to do it together, and we need to be brave enough to engage in this conversation.


The author is very grateful to Stephen Curry for feedback on the first draft of this blogpost.

The challenge of measuring open research data

This post was originally published on The Bibliomagician Blog on 24 March 2021

Lizzie Gadd & Gareth Cole discuss the practical challenges of monitoring progress towards institutional open research data ambitions.

Loughborough University has recently introduced a new Open Research Position Statement which sets out some clear ambitions for open access, open data and open methods. As part of this work we’re looking at how we can monitor our progress against those ambitions. Of course, for open access, we’re all over it. We have to count how many of our outputs are available openly in accordance with various funder policies anyway. But there are no equivalent demands for data. OK, all data resulting from some funded projects need to be made available openly, but no-one’s really counting – yet. And anyway, our ambitions go beyond that – we’d like to encourage all data to be made available openly and ‘FAIRly’ where possible.

So how do we measure that? Well, with difficulty it would seem. And here’s why: 

Equivalence

In the world of journal articles, although there are disciplinary differences as to the number of articles that are produced, every article is roughly equivalent in size and significance to another. Data are not like that. What qualifies as a single unit of data, thus receiving its own metadata record, might be a photograph or a five-terabyte dataset. So it would be a bit unfair to compare the volume of these. And there is currently no agreement as to ‘how much data’ (in size, effort, or complexity) there needs to be to qualify for a unique identifier.

But it’s not just what counts as a unit of data but what counts as a citable unit of data that differs. A deposit of twenty files could have one DOI/Identifier or twenty DOIs depending on how it is split up. This means that potentially there could be citation advantages or disadvantages for those that deposit their data in aggregate or individually – but this would entirely depend on how the citer chooses to cite it.

Source

For journal articles, full-text versions are duplicated all over the place. The same article might be available on multiple repositories, pre-print servers and the publisher’s site. In fact, whilst there are concerns about version control, there are many benefits to such duplicates in terms of discovery and archiving (Lots of Copies Keeps Stuff Safe [LOCKSS] and all that). But for data, it’s not good practice to duplicate in different repositories. This is both for versioning reasons (if you update in one place, you then have to update in others) and for DOI reasons (two instances usually means two DOIs which means that any data citations you get will be split across two sources). 

So if we wanted to identify all the data produced by Loughborough academics, we’d have a pretty difficult job doing it. Some will be on our repository, but other data will be spread across multiple different archives. Data citation and linking services such as DataCite and Scholix may ultimately offer a solution here of course, but as others have noted, these have a long way to go before they are truly useful for monitoring purposes. DataCite only indexes data with a DataCite DOI. And Scholix only surfaces links between articles and datasets, not the independent existence of data. Some services, such as Dimensions, only index records that have an item type of “Dataset”. This means that data taking other forms such as “media” or “figure” won’t appear in Dimensions, thus disenfranchising those researchers who use “data” but not “datasets”.

But the biggest problem these services face is that they rely on consistent high-quality metadata collection, curation and sharing by all the different data repositories. And we’re just not there yet. (Although all repositories which mint DataCite DOIs will need to comply with the minimum/mandatory requirements to mint the DOI in the first place). And a particular problem for institutions seeking to measure their own dispersed data output is that few repositories expose author affiliation data even where they do comprehensively collect it. And this leads us on to our third point.

Authorship

The authorship of journal articles is increasingly controlled and subject to much guidance. Learned societies provide guidance, journals provide guidance, institutions sometimes have their own guidance. The CRediT taxonomy (whilst not without problems) was introduced to make it absolutely explicit as to who did what on a journal article. The same is not usually true of data.

Of course, data is created rather than authored, as the DataCite schema makes clear. But there is no way of ensuring that all the data creators have been added to the metadata record, even if the depositors know who they all are. And whilst there is no real glory associated with data ownership, this problem isn’t going to be quickly resolved. As with journal articles, the list of contributors is often likely to be very long, so there needs to be some incentive to do this carefully and well.

This is where we butt up against Professor James Frew‘s two laws of metadata:

  1. Scientists don’t write metadata;
  2. Any scientist can be forced to write bad metadata.

There seems to be scope for some CRediT-type contributor guidance for data to ensure all the right people get the right credit. (Maybe a Research Data Alliance Working Group?) And then there needs to be some motivation for depositors to stick to it.

Quality assurance

Although the standard of journal peer review is variable and hotly contested as a mechanism to signify quality, at least all journal articles are subject to some form of it prior to publication. Data are not currently peer reviewed (unless submitted as a data paper or provided as supplementary information to a journal submission). And although data can be cited, this still appears to be comparatively rare. This may partly be due to the challenge of ‘counting’ data citations given huge variations in citation quality, whether data is cited at all (or merely added as part of a data availability statement), and disciplinary differences in the way this is done. And there is a big difference between a data citation which just states ‘this data is available and relevant to my study’ and one which signifies that ‘the data in question has actually been re-analysed or repurposed in my study’. But a data citation doesn’t currently differentiate between the two.

Of course, the accepted standard for data quality is the FAIR principles: data should be Findable, Accessible, Interoperable and Reusable. But despite many investigations (see: https://fairsharing.org/ , https://www.fairsfair.eu/ and https://github.com/FAIRMetrics/Metrics) into the best way of assessing how FAIR data is, the average institutional data repository has no easy way of quickly identifying this.

There is also the challenge that FAIR data may not be open data, and vice versa. Some data can never be open data for confidentiality reasons. So in our attempt to pursue open research practices, and given a choice, what do we count? Open data that may not be FAIR? FAIR data that may not be open? Or only data that are both? And if so, how fair is that?

Summary

So where does this leave us at Loughborough? Well, in a less than satisfactory situation to be honest. We could look at the number of data deposits (or depositors) in our Research Repository per School over time to give us an idea of growth. But this will only give us a very partial picture. We could do a similar count of research projects on ResearchFish with associated data, or e-theses with related data records, but again, this will only give us a small window onto our research data activity. Going forward we might look at engagement with the data management planning tool, DMP Online, over time, but again this is likely to shine more light on disciplines that have to provide DMPs as part of funding applications and PhD studies. 

So, whilst we can encourage individuals to deposit data, and require narrative descriptions of their engagement with this important practice for annual appraisals, promotions, and recruitment, we have no meaningful way of monitoring this engagement at department or University-level. And as for benchmarking our engagement with that happening elsewhere, this currently feels like it’s a very long way off.

The big fear of course, is that this is where commercial players rock up and offer to do this all for us – for a price. (Already happening). And given that data is a research output not currently controlled by such outfits, it would be a very great shame to have to pay someone else to tell us about the activities of our own staff. In our view.

Really hoping that some clever, community-minded data management folk are able to help with this.

Elsevier have endorsed the Leiden Manifesto: so what?

This piece was originally posted to The Bibliomagician blog on 22 September 2020

Lizzie Gadd speculates as to why Elsevier endorsed the Leiden Manifesto rather than signing DORA, and what the implications might be.

If an organisation wants to make a public commitment to responsible research evaluation they have three main options: i) sign DORA, ii) endorse the Leiden Manifesto (LM), or iii) go bespoke – usually with a statement based on DORA, the LM, or the Metric Tide principles.

The LIS-Bibliometrics annual responsible metrics survey shows that research-performing organisations adopt a wide range of responses to this including sometimes signing DORA and adopting the LM. But when it comes to publishers and metric vendors, they tend to go for DORA. Signing DORA is a proactive, public statement and there is an open, independent record of your commitment. DORA also has an active Chair in Professor Stephen Curry, and a small staff in the form of a program director and community manager, all of whom will publicly endorse your signing which leads to good PR for the organisation.

A public endorsement of the LM leads to no such fanfare. Indeed, the LM feels rather abandoned by comparison. Despite a website and blog, there has been little active promotion of the Manifesto, nor any public recognition for anyone seeking to endorse it. One can’t help wondering how differently the LM would operate had it been born in a UK institution subject to the impact-driven strictures of the REF.

But despite this, Elsevier recently announced that they had chosen the Leiden Manifesto over DORA. Which leads us to ask i) why? And ii) what will this mean for their publishing and analytics business?

Why not DORA?

Obviously I wasn’t party to the conversations that led to this decision and can only speculate. But for what it’s worth, my speculation goes a bit like this:

So, unlike the LM which provides ten principles to which all adopters should adhere, DORA makes different demands of different stakeholders. So research institutions get off pretty lightly with just two requirements: i) don’t use journals as proxies for the quality of papers, and ii) be transparent about your reward criteria. Publishers and metrics suppliers, however, are subject to longer lists (see box) and of course, Elsevier are both.  And it is within these lists of requirements that I think we find our answers.

Box: Excerpt from DORA Principles
  1. Positioning CiteScore as the JIF’s responsible twin.

Firstly, DORA demands that publishers ‘greatly reduce emphasis on JIF as a promotional tool’. However, Elsevier have invested heavily in CiteScore (their alternative to the JIF) and are not likely to want to reduce emphasis on it. Indeed, the press release announcing their endorsement of the LM provided, as an example, the way they’d recently tweaked the calculation of CiteScore to ensure it met some of the LM principles, positioning it as a ‘responsible metric’, if you will. This is something they’d struggle to get away with under DORA.

  2. Open citations? Over my dead body

One of the less well-discussed requirements of DORA for publishers is to “remove all reuse limitations on reference lists in research articles and make them available under the Creative Commons Public Domain Dedication.” In other words, DORA expects publishers to engage with open citations. This is something Elsevier have infamously failed to do.

  3. Open data? You’ll have to catch me first

And finally, DORA expects metric suppliers to not only “be open and transparent by providing data and methods used to calculate all metrics” (which they partly do for subscribers) but to “Provide the data under a licence that allows unrestricted reuse, and provide computational access to data, where possible” (which they don’t).

So whereas DORA is a relatively easy sign for HEIs (only two requirements), for publishers it’s trickier than it might first appear (five requirements). And an organisation like Elsevier, which also supplies metrics, has to contend with a further four requirements, which would essentially eat away at their profits. And we all know that they’re only just scraping by, bless them.

The impact of endorsing the Leiden Manifesto

But isn’t it good enough that they’ve endorsed the Leiden Manifesto? After all, it’s a comprehensive set of ten principles for the responsible use of bibliometrics. Well, being a seasoned grumbler about some of the less savoury aspects of Elsevier’s SciVal, I decided to take to the discussion lists to see whether they saw this move as the beginning or the end of their responsible metrics journey. Was this the start of a huge housekeeping exercise which would sweep away the h-index from researcher profiles? Disinfect the unstable Field-Weighted Citation Impact from author rankings? And provide health-warnings around some of the other over-promising and under-delivering indicators?

Apparently not.

“There is nothing inherently wrong with the h-index,” said Holly Falk-Krzesinski, Elsevier’s Vice-President for Research Intelligence, pointing to three of the Leiden Manifesto’s principles where she felt it passed muster. (Despite Elsevier’s Andrew Plume questioning its validity on the same day.) And as part of a basket of metrics, she considers the FWCI a perfectly usable indicator for researchers. (Something Elsevier’s own SciVal Advisors disagree with.) And she believes the h-index is “not displayed in any special or prominent way” on Pure Researcher Profiles. Erm…

[Image: an example Pure researcher profile displaying the h-index]

And after several rounds of this, frankly, I gave up. And spent a weekend comfort-eating Kettle chips. Because I care deeply about this. And, honestly, it felt like to Elsevier it was just another game to be played. 

Responsible is as responsible does

Back in 2018 I made the point that if we weren’t careful, responsible metrics statements could, in an ironic turn, easily become ‘bad metrics’, falsely signifying a responsible approach to metrics that wasn’t there in practice. And the reason these statements are so vulnerable to this is that neither DORA nor the LM are formally policed. Anyone can claim to be a follower and the worst that can happen is that someone calls out your hypocrisy on Twitter. Which does happen. And is sometimes even effective.

It is for this reason that the Wellcome Trust have stated that adopting a set of responsible metrics principles is not enough. If you want to receive their research funding from 2021, you need to demonstrate that you are acting on your principles. Which is fair. After all, if you want Athena SWAN accreditation, or Race Equality Charter status, or a Stonewall Charter, you have to provide evidence and apply for it. It’s not self-service. You can’t just pronounce yourself a winner. And I can’t help wondering: yes, Elsevier has endorsed the Leiden Manifesto, but would the Leiden Manifesto (given the chance) endorse Elsevier?

Now I know that CWTS and DORA would run a mile from such a proposition, but that doesn’t mean it’s not needed.  Responsible-metrics-washing is rife. And whilst I‘d rather folks washed with responsible metrics than anything else – and I’m sure a few good things will come out of it – it does rather feel like yet another instance of a commercial organisation paying lip-service to a community agenda for their own ends (see also: open access and copyright retention).

Right on cue, Helen Lewis in The Atlantic recently described the ”self-preservation instinct [that] operates when private companies struggle to acclimatize to life in a world where many consumers vocally support social-justice causes”. “Progressive values are now a powerful branding tool” she writes, and “Brands will gravitate toward low-cost, high-noise signals as a substitute for genuine reform, to ensure their survival.” Correct me if I’m wrong but that sounds pretty apposite?

Of course, it’s early days for Elsevier’s Leiden Manifesto journey, and Andrew Plume did seek to reassure me in a video call that they were still working through all the implications. So let’s hope I’m worrying about nothing and we’ll be waving goodbye to the h-index in Elsevier products any day soon. But if nothing does transpire, I know, as the developer of a responsible metrics model myself, that I’d feel pretty sick about it being used as empty virtue-signalling. And it does occur to me that funders seeking to hold institutions to account for their responsible research evaluation practices might do well to direct their attention to the publishers they fund.

Otherwise I fear it really will be a case of, well, Elsevier have endorsed the Leiden Manifesto: so what?

Rethinking the Rankings

This piece was originally posted to the ARMA blog on 14 October 2020.

Lizzie Gadd and Richard Holmes share the initial findings of the INORMS Research Evaluation Working Group’s efforts to rate the World University Rankings.

When the INORMS Research Evaluation Working Group (REWG) was formed in 2016, Lizzie asked the representatives of twelve international research management societies where they felt we should focus our attention if we wanted to achieve our aim of making research evaluation more meaningful, responsible and effective. They were unanimous: the world university rankings. Although research managers are not always the ones in their institutions who deal with the world university rankers, they are one of the groups that feel their effects most keenly: exclusion from certain funding sources based on ranking position; requests to reverse-engineer various indicators to understand their scores; and calls to introduce policies that may lead to better ranking outcomes. And all whilst fully appreciating how problematic rankings are in terms of their methodology, their validity and their significance.

So what could be done? Well, it was clear that one of the key issues with the world ranking bodies is that they are unappointed and answer to nobody. In an earlier blog post where Lizzie describes the research evaluation environment as a food chain, she put them at the top: predators on which no-one predates. (Although some Scandinavian colleagues see them more as parasites that feed off healthy organisms: taking but not giving back.) And of course the way to topple an apex predator is to introduce a new one: to make them answerable to the communities they rank. So this is what the INORMS REWG set about doing, by seeking to introduce an evaluation mechanism of their own to rate the rankers.

In some parallel work, the REWG were developing SCOPE, a five-step process for evaluating effectively, so we were keen to follow our own guidance when designing our ranker ratings. And this is how we did so:

Start with what you value

Our first step was to identify what it was we wanted from any mechanism seeking to draw comparisons between universities. What did we value? To this end we sought out wisdom from all those who have gone before us in this space: the Berlin Principles on Ranking HEIs, the work of Ellen Hazelkorn, the CWTS principles for responsible use of rankings, the Leiden Manifesto, DORA, Yves Gingras, and many others. From their thoughts we synthesised a draft list of Criteria for Fair and Responsible University Rankings and put them out to the community for comment. We got feedback from a wide range of organisations: universities, academics, publishers and ranking organisations themselves. The feedback was then synthesised into our value document – what we valued about the entity (rankers) under evaluation. These fell into four categories: good governance, transparency, measure what matters, and rigour.

Context considerations

There are lots of reasons we evaluate things. What we’re trying to achieve here is a comparison of the various ranking organisations, with the ultimate purpose of incentivising them to do better. We want to expose where they differ from each other but also to highlight areas that the community cares about where they currently fall short.  What we didn’t want to do is create another ranking. It would have been very tempting to do so: “ranking the rankings” has a certain ring to it.  But not only would this mean that a ranking organisation got to shout about its league-table-topping status – something we didn’t want to endorse – but we wouldn’t be practising what we preached: a firm belief that it is not possible to place multi-faceted entities on a single scale labelled ‘Top’ and ‘Bottom’.

Options for evaluating

Once we had our list of values, we then set about translating them into measurable criteria – into indicators that were a good proxy for the quality being measured. As anyone who's ever developed an evaluation approach will know, this is hard. But again, we sought to adhere to our own best practice by providing a matrix by which evaluators could provide both quantitative and qualitative feedback. Quantitative feedback took the form of a simple three-point scale according to whether the ranker fully (2 marks), partially (1 mark) or failed (0 marks) to meet the set criteria. Qualitative feedback took the form of free-text comments. To ensure transparency and mitigate bias as best we could, we asked a variety of international experts to each assess one of six ranking organisations against the criteria. INORMS REWG members also undertook evaluations, and, in line with the SCOPE principle of 'evaluating with the evaluated', each ranker was also invited to self-assess. Only one ranking organisation, CWTS Leiden, accepted our invitation, and they provided free-text comments rather than scores. All this feedback was then forwarded to our senior expert reviewer, Dr Richard Holmes, author of the University Ranking Watch blog and certainly one of the most knowledgeable university rankings experts in the world. He was able to combine the feedback from our international experts with his own, often inside, knowledge of the rankings to enable a really robust, expert assessment.

Probe deeply

Of course all good evaluations should probe their approach, which is something we sought to do during the design stage, but also something we came back to post-evaluation. We observed some criteria where rankings might be disadvantaged for good practice – for example, where a ranking did not use surveys and so could not score. This led us to introduce 'Not Applicable' categories to ensure they would not be penalised. One or two questions were also multi-part, which made it difficult to assess fairly across the rankers. In any future iteration of the approach we would seek to correct this. We noted that the 'partially meets' category is also very broad, ranging from a touch short of perfect to a smidge better than fail. In future, a more granular five- or even ten-point grading system might provide a clearer picture as to where a ranking succeeds and where it needs to improve. In short, there were some learning points. But that's normal. And we think the results provide a really important proof-of-concept for evaluating the world rankings.
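For the curious, the arithmetic behind the scoring described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the REWG's actual tooling; the function name and example scores are hypothetical. It shows the 2/1/0 scale, the 'Not Applicable' category (excluded from the total possible marks so a ranker is not penalised for good practice), and the scores-as-a-proportion-of-possible figure plotted per category in the spidergram.

```python
# Hypothetical sketch of the rating arithmetic described above.
# Per-criterion scores: 2 = fully meets, 1 = partially meets,
# 0 = fails to meet, None = Not Applicable (excluded entirely).

def category_percentage(scores):
    """Actual marks as a percentage of total possible marks,
    ignoring any 'Not Applicable' criteria."""
    applicable = [s for s in scores if s is not None]
    if not applicable:
        return None  # every criterion in the category was N/A
    return 100 * sum(applicable) / (2 * len(applicable))

# Example: a ranker assessed on five 'Rigour' criteria, one of
# which is N/A because the ranker uses no opinion surveys.
rigour = [2, 1, None, 0, 2]
print(category_percentage(rigour))  # 62.5
```

Note the design choice: dropping N/A criteria from the denominator (rather than awarding them full marks) means a ranker's percentage reflects only the criteria it could meaningfully be judged against.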

Evaluate

So what did we find? Well, we applied our approach to six of the largest and most influential world university rankings: ARWU, THE WUR, QS, U-Multirank, CWTS Leiden and US News & World Report. A full report will be forthcoming and the data showing the expert assessments and senior expert calibrations are available. A spidergram of the quantitative element is given in Figure 1 and some headline findings are provided below.


Figure 1. Spidergram illustrating the actual scores/total possible score for each world ranker. The full data along with the important qualitative data is available.

Good governance

The five key expectations of rankers here were that they engaged with the ranked, were self-improving, declared conflicts of interest, were open to correction and dealt with gaming. In the main all ranking organisations made some efforts towards good governance, with clear weaknesses in terms of declaring conflicts of interest: no ranker really did so, even though selling access to their data and consultancy services was commonplace. 

Transparency

The five expectations of rankers here were that they had transparent aims, methods and data sources, provided open data, and were financially transparent. Once again there were some strengths when it came to the transparency of the rankers' aims and methods – even if arguably the methods didn't always meet the aims. The weaknesses here were around the ability of a third party to replicate the results (only ARWU achieved full marks here), data availability, and financial transparency (where only U-Multirank achieved full marks).

Measure what matters

The five expectations of rankers here were that they drove good behaviour, measured against mission, measured one thing at a time (no composite indicators), tailored results to different audiences and gave no unfair advantage to universities with particular characteristics. Not surprisingly, this is where most rankings fell down. CWTS Leiden and U-Multirank scored top marks in terms of efforts to drive appropriate use of rankings and measuring only one thing at a time; the others barely scored. Similarly, Leiden and U-Multirank fared quite well on measuring against mission, unlike the others. But no ranking truly tailored its offer to different audiences, assuming that all users – students, funders, universities – would value the different characteristics of universities in the same way. And neither could any whole-heartedly say that they offered no unfair advantage to certain groups.

Rigour

The one thing university rankings are most criticised for is their methodological invalidity, and so it may come as no surprise that this was another weak section for most world rankers. Here we were looking for rigorous methods, no ‘sloppy’ surveys, validity, sensitivity and honesty about uncertainty. The ranker that did the best here by a country mile was CWTS Leiden, with perfect scores for avoiding the use of opinion surveys (joined by ARWU), good indicator validity (joined by U-Multirank), indicator sensitivity, and the use of error bars to indicate uncertainty. All other rankers scored their lowest in this section.

Summary

So there is clearly work to be done here, and we hope that our rating clearly highlights what needs to be done and by whom. And in case any ranking organisation seeks to celebrate their relative ‘success’ here, it’s worth pointing out that a score of 100% on each indicator is what the community would deem to be acceptable. Anything less leaves something to be desired.

One of the criticisms we anticipate is that our expectations are too high. How can we expect rankings to offer no unfair advantage? And how can we expect commercial organisations to draw attention to their conflicts of interest? Our answer would be that just because something is difficult to achieve, doesn’t mean we shouldn’t aspire to it. Some of the sustainable development goals (no poverty, zero hunger) are highly ambitious, but also highly desirable. The beauty of taking a value-led approach, such as that promoted by SCOPE, is that we are driven by what we truly care about, rather than by the art of the possible, or the size of our dataset. If it’s not possible to rank fairly, in accordance with principles developed by the communities being ranked, we would argue that it is the rankings that need to change, not the principles. 

We hope this work initiates some reflection on the part of world university ranking organisations. But we also hope it leads to some reflection by those organisations that set so much store by the world rankings: the universities that seek uncritically to climb them; the students and academics that blindly rely on them to decide where to study or work; and the funding organisations that use them as short-cuts to identify quality applicants. This work provides qualitative and quantitative evidence that the world rankings cannot, currently, be relied on for these things. There is no fair, responsible and meaningful university ranking. Not really. Not yet. There are just pockets of good practice that we can perhaps build on if there is the will.  Let’s hope there is.

Gadding about…*

*Virtually of course.

Courtesy of the pestilence currently scourging our planet, I’ve been able to accept four opportunities to speak this Autumn, as I will be doing so from the comfort of my own home office. For anyone interested in tuning in, I’ve provided the details here and will update this with more intel as I have it.


22-Sep-20 08.30 BST: Finnish Ministry of Education & Culture

https://www.helsinki.fi/en/news/helsinki-university-library/research-evaluation-national-bibliometrics-seminar-2020

Bibliometrics: Diversity’s friend or foe? Assessing research performance using bibliometrics alone does not help create a diverse research ecosystem. But can bibliometrics ever be used to support diversity? And if not, how else can we evaluate what we value about research?


07-Oct-20 17.00 BST: NIH Bibliometrics & Research Evaluation Symposium

https://www.nihlibrary.nih.gov/services/bibliometrics/bibSymp20

The Five Habits of Highly-Effective Bibliometric Practitioners. Drawing on ten years' experience supporting bibliometric and research evaluation practitioner communities, this presentation will highlight five habits of highly effective practitioners, providing practical hints and tips for those seeking to support their own communities with robust research evaluation.


15-Oct-20 08.15 BST: 25th Nordic Workshop on Bibliometrics and Research Policy

Register

The Research Evaluation Food Chain and how to fix it. Poor research evaluation practices are the root of many problems in the research ecosystem and there is a need to introduce change across the whole of the ‘food chain’. This talk will consider the challenge of lobbying for change to research evaluation activities that are outside your jurisdiction – such as senior managers and rankings (introducing the work of INORMS REWG), vendors and ‘freemium’ citation-based services.


20-Oct-20 15.00 BST: Virginia Tech Open Access Week

https://virginiatech.zoom.us/webinar/register/WN_DbZMp_YcRKux9X3E9Hevig

Counting What Counts In Recruitment, Promotion & Tenure. What we reward through recruitment, promotion and tenure processes is not always what we actually value about research activity. This talk will explore how we can pursue value-led evaluations – and how we can persuade senior leaders of their benefits.