Provide citation guidance in the Downloads tab

Open larsyencken opened this issue 3 years ago • 11 comments

Migrated from Notion: https://www.notion.so/owid/Provide-citation-guidance-in-the-Downloads-tab-50edb8b91cc24a3f9bfd2dc164f659fe

Problem

People download and reuse datasets, then cite us instead of the original providers. This makes original providers less happy sharing data with us, and by extension the general public.

Quick fix solution

Put citation guidance into the download tab of every chart, under the download button. In that guidance, we should separate how to cite the data (which should match the Sources in the chart) from how to cite the chart.

Full discussion below.


Background

We have a general worry that data providers do not get enough credit in our work.

That’s bad obviously because they deserve lots of credit. But it’s also a strategic risk for us: In order for us to do our job, we need data providers to be happy and supportive of our work.

One aspect of this general worry is how people cite data when accessing it through us. It’s a common thing that people say ‘Source: OWID’ at the bottom of a chart they’ve made, so that the data provider gets no credit.

We should do what we can to avoid this happening.

A simple step is to give users clear guidance on how they should write the citation when using data from OWID but not produced by OWID.

Where should the citation guidance be given?

❗ We should show it prominently in the place where most people reusing our data get it – the download tab on charts.

[Image: how to cite in download tab-01]

  • We should also think about how we can make this clearer for people taking data from GitHub.

    For instance, in the COVID dataset we mention that “you should always check the license of any such third-party data”, but we don’t tell people how they should cite this data – only how they should cite ‘our’ testing and vaccinations data.

    (And the same would apply to a future API...)

Our general policy on this should maybe also be written up as an FAQ in our About section.

What should the citation guidance be? How should we implement it?

Quick fix proposal for now

Add something like the following to all charts:

How to cite this work

Data should be cited as: ‘source’. Chart should be cited as: ‘Data from source, Chart from Our World in Data’

Or to make it very explicit:

How to cite this work

If reusing this work, please provide a citation that makes clear the contribution of the data providers: Data should be cited as: ‘source’. Chart should be cited as: ‘Data from source, Chart from Our World in Data’

This message should be automatically generated, but with the possibility for manual override (i.e. it’s another field in the Grapher/Bulk-FASTT admin).
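As a rough illustration of the "auto-generated with manual override" idea, here is a minimal TypeScript sketch. The names `CitationGuidance`, `buildCitationGuidance` and `renderCitationGuidance` are hypothetical and not taken from the Grapher codebase; the wording follows the proposal above, using the semicolon form from the World Bank example further down.

```typescript
// Hypothetical types and helpers for illustration only; not the actual Grapher API.
interface CitationGuidance {
    dataCitation: string
    chartCitation: string
}

// Build the guidance from the chart's source line, unless an editor
// has supplied a manual override (e.g. via an admin field).
function buildCitationGuidance(
    source: string,
    override?: CitationGuidance
): CitationGuidance {
    if (override) return override
    const trimmed = source.trim()
    return {
        dataCitation: trimmed,
        chartCitation: `Data from ${trimmed}; Chart from Our World in Data`,
    }
}

// Render the guidance as the plain-text message shown under the download button.
function renderCitationGuidance(g: CitationGuidance): string {
    return [
        "How to cite this work",
        `Data should be cited as: '${g.dataCitation}'.`,
        `Chart should be cited as: '${g.chartCitation}'`,
    ].join("\n")
}
```

Calling `renderCitationGuidance(buildCitationGuidance("World Bank, Povcal"))` would produce the three-line message for the typical case. Passing a second argument replaces the generated text entirely.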

Longer-term solution

There are different cases for which we need to think through what the guidance should be. In the short term we could override less typical cases manually, but in the longer term these different cases should mostly be handled automatically.

  • The typical scenario – data from the World Bank, FAO, WHO etc.

    The text given above would apply to most typical scenarios. For instance, for the [Share in extreme poverty](https://ourworldindata.org/grapher/share-of-population-in-extreme-poverty?country=BGD~BOL~MDG~IND~CHN~ETH~COD) from World Bank, Povcal:

    If reusing this work, please provide a citation that makes clear the contribution of the data providers: Data should be cited as: ‘World Bank, Povcal’. Chart should be cited as: ‘Data from World Bank, Povcal; Chart from Our World in Data’

  • A very long reference or an academic paper reference

    We’d perhaps need to provide both a short and a full reference? i.e.

    *If reusing this work, please provide a citation that makes clear the contribution of the data providers:

    • Data should be cited as: ‘Poore and Nemecek (2018)’.
    • Chart should be cited as: ‘Data from Poore and Nemecek (2018); Chart from Our World in Data’

    – Poore, J., & Nemecek, T. (2018). Reducing food’s environmental impacts through producers and consumers. Science, 360(6392), 987-992.
    – Hannah Ritchie and Max Roser (2020) - "Environmental Impacts of Food Production". Published at [OurWorldInData.org](http://ourworldindata.org/).*

  • Where OWID itself is clearly the data source

    e.g. Vaccinations, Testing, War Deaths project

    If there’s a separate publication (as for Vaccinations and testing) then we can mention that instead.

  • ‘OWID based on X and Y’

    A more common case is where we have made changes/transformations etc. such that the data includes observations that couldn’t be found in the original sources – but where it’s not right for us to claim to be source, at least not in isolation.

    There’s really a range of sub-cases here:

    • Where we have made very minimal changes, or just calculated simple transformations (per capita rates say).
    • Where we have extended series by linking a small number of sources (E.g. [Working hours](https://ourworldindata.org/grapher/annual-working-hours-per-worker))
    • Where we have done some more substantial tinkering with the methods used in the original source (E.g. Bastian’s work on the [age of democracies](https://ourworldindata.org/democracies-age).)

    We might want a different recommended citation in these different sub-cases? (We might also want to revisit how we ourselves refer to these different cases within our own charts...)
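One way to handle these cases automatically would be to model them explicitly. The sketch below is only an assumption about how that could look: the type `SourceCase`, its field names, and the wording returned for each case are all hypothetical, and the recommended wording for the "OWID based on X and Y" sub-cases is still an open question here.

```typescript
// Hypothetical model of the citation cases discussed above; not an existing Grapher type.
type SourceCase =
    // Typical external provider, optionally with a long-form academic reference.
    | { kind: "external"; shortCitation: string; fullReference?: string }
    // OWID itself is the data source, optionally with a separate publication to cite.
    | { kind: "owid"; publication?: string }
    // OWID has combined or transformed data from one or more sources.
    | { kind: "owidBasedOn"; sources: string[] }

// Return the short "Data should be cited as" text for each case.
function dataCitation(c: SourceCase): string {
    switch (c.kind) {
        case "external":
            return c.shortCitation
        case "owid":
            return c.publication ?? "Our World in Data"
        case "owidBasedOn":
            // Placeholder wording; the final phrasing is undecided.
            return `Our World in Data based on ${c.sources.join(" and ")}`
    }
}

// Return the full references to list underneath, if any were provided.
function fullReferences(c: SourceCase): string[] {
    return c.kind === "external" && c.fullReference ? [c.fullReference] : []
}
```

Keeping the case as an explicit `kind` means a new sub-case (for example, minimal transformations versus substantial method changes) can be added later without changing how existing charts are cited.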

@JoeHasell @maxroser

larsyencken avatar Feb 08 '22 08:02 larsyencken

@JoeHasell Just thinking that we do some small transformations to the data, such as throwing some out, renaming countries, sometimes calculating per-capita versions of metrics, etc. It could also be misleading to purely cite the upstream source, since it's possible for us to introduce errors that are not upstream.

The most accurate citation for data might actually be 'Poore and Nemecek (2018) via Our World In Data'.

I of course understand that it's sensitive for data providers. We can just leave it as 'Poore and Nemecek (2018)' and hope that our changes are minor enough that 95% of the time they do not diverge from the upstream data in any significant way.

Do you have an instinct here on which is better?

larsyencken avatar Feb 08 '22 09:02 larsyencken

My two cents: I agree this is very important, but I think many users could still miss the citation information in the Download tab because they never go to that tab—they just take a screenshot. Maybe there's a way to make it even more un-missable.

We see screenshots a fair amount, on social media but also from user interviews and user feedback messages—this includes academic researchers, people at nonprofits and in industry, etc. People who "should know better."

One idea: I think we could make the citation even more un-missable on the chart by putting a button right next to the source that provides the information on a click—see the blue "How to cite" in the source line of the chart.

[Image: how-to-cite]

CGiattino avatar Feb 08 '22 09:02 CGiattino

In reference to Charlie's point:

  1. It's a bit of a design question, but my personal view would be to not take up prime real estate within the main chart view for something that will only be relevant to a very small subset of people.
  2. But we can then make it VERY prominent in the place where most people are downloading: the download tab.
  3. I also think it's a bit redundant on the main chart view: On the assumption that we are writing the source well in our charts, then someone showing a screenshot will already be showing the data source in a way we are happy with anyhow. Those people who take a selective screenshot of only part of the chart, actively cutting out the source, I think are unlikely to then take the trouble of writing their own citation.

JoeHasell avatar Feb 08 '22 10:02 JoeHasell

@mathisonian Was just reviewing this with @danyx23 now, unclear whether it seems worth bolting on to the Downloads part of grapher, or whether we should try to address this problem some other way. Any thoughts?

larsyencken avatar Jul 26 '22 11:07 larsyencken

This seems worthwhile to me. It's relatively low cost, and doesn't interfere with the current functionality. If it's relatively straightforward to implement and helps keep the data providers happy, it makes sense to me. I would keep it in the download tab and not put it in the main chart view.

mathisonian avatar Jul 28 '22 16:07 mathisonian

Just to add here that (when we discussed this a long time ago now) there was a strong appetite for this from Max and the authors. We have heard on the grapevine that some data providers are not currently happy with what we do, and that (if true) is obviously a big issue. This was envisaged as a very quick and easy step so that at least we're going in the right direction.

JoeHasell avatar Jul 28 '22 18:07 JoeHasell

I agree that this is important – we should make clear who did the hard work of producing the data, and we should avoid giving our readers the impression that it is us at OWID who do this work.

Some comments:

  • On the download tab we should definitely do better. There we have space and we can be as clear as possible. Along the lines of what Lars outlined.

  • I'm not sure what the best solution for the regular chart tab is.

  • I agree with Joe's points above.

  • Is 'Source' maybe open to misunderstanding? Maybe users don't actually understand what we mean? Maybe better to say 'Data:'? Or slightly longer 'Data source:'?

  • We could ask data providers what they want us to say. But it has the risk that it possibly ties our hands – if they say x and we don't want to do x then we are in a bad situation.

  • @eoo-owid, based on feedback we receive and which you read, do you have a recommendation for how to do better?

maxroser avatar Aug 04 '22 09:08 maxroser

@marcelgerber should have capacity to do this specific fix this cycle. Let's take baby steps here, and Matt can keep this concern in mind.

Aside from design changes, we could also write more on this topic, explaining that we are a data republisher, and highlighting how much work goes into building these original datasets, both by institutions and individuals. We could also periodically highlight individual researchers or institutions who have done a mammoth job to fill an important data gap? (maybe we feel we do this a lot already)

larsyencken avatar Aug 04 '22 12:08 larsyencken

My view based on reading user feedback is that

  • We can do better by making citation explicit in more places. This includes this quick fix of adding citation guidance in download tab, which I agree is a good idea. It'll help get more eyes on the citation info, which is clearly currently a problem.

  • At the same time we can and should do better by writing the citation details more clearly and consistently where they are already written. This includes cleaning/improving the info we provide in the Source tab as well as improving how we write the labels in the source footer.

On the second point, we've gotten much better at this with the work from the Data Managers. But there's more we could do there. This actually came up recently, because there are some charts where the label in the source footer and the info on the source tab are not perfectly consistent, which creates ambiguity and confusion for users. More details here: https://github.com/owid/owid-issues/issues/513

eoo-owid avatar Aug 04 '22 22:08 eoo-owid

Some notes on this after talking to @danyx23 about this issue, which I'm gonna work on next cycle:

  • The solution in #1607 gets us 80% of the way there.
  • As mentioned in #1607, there are cases where the English sentence we build is nonsensical. We can try to:
    • either detect that using heuristics (e.g. if the source starts with `Official data`, do this)
    • or, offer an option to specify a custom phrase in the Admin
  • Citation guidance that is not great might make things worse: e.g. if we say `Data by UN` without specifying the UN dataset, that's unhelpful, and we'd be giving it as the "official" guidance.
  • Should we add a button like "Give feedback on this citation"?
  • I will join the Future of Publishing group after my holidays to chat about this a bit.
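The heuristic option above could be sketched roughly as follows. Everything here is an assumption for illustration: the function name `citationReviewReasons`, the specific patterns, and the list of "bare organisation" names are made up, and a real rule set would need to come from checking the actual source lines.

```typescript
// Hypothetical heuristics for flagging source lines whose auto-generated
// citation sentence may read badly. The rules below are illustrative only.
const BARE_ORGANISATIONS = new Set(["un", "who", "fao", "world bank"])

function citationReviewReasons(shortSource: string): string[] {
    const reasons: string[] = []
    const s = shortSource.trim()

    if (s.length === 0) {
        reasons.push("source line is empty")
        return reasons
    }
    // "Data from Official data ..." does not read as a sensible sentence.
    if (/^official data/i.test(s)) {
        reasons.push("starts with 'Official data'")
    }
    // An organisation name alone does not say which dataset was used.
    if (BARE_ORGANISATIONS.has(s.toLowerCase())) {
        reasons.push("names an organisation but no dataset")
    }
    return reasons
}
```

A source line that returns a non-empty list could be routed to the custom-phrase option in the Admin rather than shown with the generated sentence.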

marcelgerber avatar Sep 20 '22 15:09 marcelgerber

Discussed this with the Future of Publishing group today:

  • We will want a free-text field on the chart-level that can override the auto-generated citation message. One reason to have this is that some data publishers give some citation guidance themselves (e.g. World Bank. [2021]), which we will want to respect when we're giving guidance.
  • We may want a way to disable the citation guidance display for some charts.
  • I will check all the short source lines, to see how many cases we have where the auto-generated text doesn't line up well.
  • We should definitely make sure that the Covid Vaccinations citation works well, since that's our most widely-accessed dataset.
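The first two points suggest three chart-level states: use the generated text, use a custom text, or show nothing. A minimal sketch of that precedence, assuming a hypothetical `CitationSetting` type (not an existing chart config field):

```typescript
// Hypothetical chart-level setting; field names are assumptions for illustration.
type CitationSetting =
    | { mode: "auto" }
    // Free-text override, e.g. when a data publisher prescribes its own citation.
    | { mode: "custom"; text: string }
    // Citation guidance is not displayed for this chart.
    | { mode: "hidden" }

// Returns the text to display, or null when the guidance should not be shown.
function resolveCitationText(
    setting: CitationSetting,
    autoGenerated: string
): string | null {
    switch (setting.mode) {
        case "hidden":
            return null
        case "custom":
            return setting.text
        case "auto":
            return autoGenerated
    }
}
```

Modelling this as a single setting rather than two independent fields avoids ambiguous combinations such as a chart that has both a custom text and the display disabled.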

marcelgerber avatar Oct 10 '22 16:10 marcelgerber

@JoeHasell we just talked about this with @mathisonian and we think it makes sense to think about this as part of the data pages project (i.e. how do we want things to be cited and come up with a plan that allows us to surface this information to users at some point in this year)

danyx23 avatar Feb 01 '23 17:02 danyx23

Very much agree!

JoeHasell avatar Feb 01 '23 17:02 JoeHasell