Article citation data is often unreliable - catch these errors?

Open Dominic-DallOsto opened this issue 3 years ago • 6 comments

Using this article as an example, I added it to Zotero and then downloaded the reference list as a .ris file here. Most of the references are correctly formatted in the .ris file as journal articles, but whatever processing has been done on the list appears not to have "found" a number of the references, so they contain only the raw reference text from the article. An example is reference 2:

RAND Corporation. A Million Random Digits with 100,000 Normal Deviates (Free Press, 1955).

which is represented as follows in the .ris file

TY  - STD
TI  - RAND Corporation. A Million Random Digits with 100,000 Normal Deviates (Free Press, 1955).
ID  - ref2
ER  - 

which when imported into Cita gives the following:

image

For this particular article, 18/49 of the references suffer from this problem.

Could we detect when it looks like we have imported a raw reference string - perhaps by checking whether some of the author, title, or date fields are missing? And then once #113 is resolved we could alert the user to these errors in the data and attempt to fix them?
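A minimal sketch of the check proposed above, written in Python for illustration rather than as Cita's own code. It assumes a record counts as a "raw reference string" when it has a title but neither an author nor a date tag. The helper names `parse_ris` and `looks_like_raw_reference` are hypothetical, and the second record in the sample is invented so there is a complete entry to compare against.

```python
import re

# RIS tag line: two-character tag, two spaces, hyphen, then an optional value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a dict mapping tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":
            # End of record: store it and start a new one.
            if current:
                records.append(current)
            current = {}
        elif value:
            current.setdefault(tag, []).append(value)
    return records


def looks_like_raw_reference(record):
    """Heuristic (an assumption): title present, but no author and no date."""
    has_title = any(record.get(tag) for tag in ("TI", "T1"))
    has_author = any(record.get(tag) for tag in ("AU", "A1"))
    has_date = any(record.get(tag) for tag in ("PY", "Y1", "DA"))
    return has_title and not has_author and not has_date


sample = """TY  - STD
TI  - RAND Corporation. A Million Random Digits with 100,000 Normal Deviates (Free Press, 1955).
ID  - ref2
ER  - 
TY  - JOUR
AU  - Doe, J.
TI  - A hypothetical, fully parsed reference
PY  - 2020
ID  - ref3
ER  - 
"""

records = parse_ris(sample)
flagged = [r["ID"][0] for r in records if looks_like_raw_reference(r)]
print(flagged)
```

A heuristic like this only flags candidates; it cannot tell a raw reference string apart from a legitimately sparse record, so it would suit a warning dialog better than automatic removal.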

Dominic-DallOsto avatar Sep 23 '21 11:09 Dominic-DallOsto

Hi, Dominic. Sorry I'm taking so long to reply. I've been quite busy with other projects lately.

I'm not sure it should be on our side to check whether the metadata in imported records is complete. In fact, that could happen with any way of adding citations to Cita, right? It could also happen when syncing from Wikidata (although I understand that in this case it is unlikely that the title would be a reference string; BTW this reminds me of this "cites work string" proposal), or when manually adding citations (although I see that in these cases the user would already know that some metadata is missing).

Anyway, if we do support this, we could have a dialog warning the user that some metadata may be missing for some citations, and when #113 has been addressed give them the option to try and fix them, as you suggested. What do you think?

diegodlh avatar Oct 04 '21 12:10 diegodlh

No worries! Yeah, having it as an option is what I meant, but I wasn't that clear.

I think this shouldn't really be our job, but I found this common and annoying enough an occurrence (in every paper I tried) that I think a workaround could be useful. I guess data from Crossref should be more complete once it's supported though.

Dominic-DallOsto avatar Oct 04 '21 14:10 Dominic-DallOsto

> I guess data from Crossref should be more complete once it's supported though.

Just linking to #43 so we keep track of this over there.

diegodlh avatar Oct 12 '21 23:10 diegodlh

FYI

  1. The example in question is unlikely to appear in Crossref data.
  2. On a number of occasions I have found Crossref data to be erroneous, but DataCite data is worse.

HughP avatar Oct 13 '21 06:10 HughP

I just checked, and Crossref does actually have all the same references, but, a bit more helpfully, in a field called `unstructured`.

If this is common on Crossref, I guess we would need to handle it in some way in #43: discard unstructured citations, try to correct them, or import them directly.
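As a rough illustration of the first step any of those options would need, here is a Python sketch that separates reference entries carrying only an `unstructured` string from those with parsed fields. Apart from `unstructured`, the field names are as I recall them from Crossref's REST API reference objects and should be verified. The function name `split_references` and the sample entries are hypothetical.

```python
# Fields that, if present, suggest Crossref parsed the reference.
# Assumption: these names match Crossref's reference objects; verify before use.
STRUCTURED_KEYS = (
    "DOI",
    "article-title",
    "author",
    "year",
    "journal-title",
    "volume-title",
)


def split_references(references):
    """Separate reference entries into structured and unstructured-only lists."""
    structured, unstructured_only = [], []
    for ref in references:
        if any(ref.get(key) for key in STRUCTURED_KEYS):
            structured.append(ref)
        elif ref.get("unstructured"):
            unstructured_only.append(ref)
        # Entries with neither kind of data are skipped.
    return structured, unstructured_only


# Hypothetical sample shaped like a work's reference list.
references = [
    {
        "key": "ref1",
        "author": "Doe",
        "year": "2020",
        "article-title": "A hypothetical parsed reference",
    },
    {
        "key": "ref2",
        "unstructured": "RAND Corporation. A Million Random Digits with "
        "100,000 Normal Deviates (Free Press, 1955).",
    },
    {"key": "ref3"},
]

structured, unstructured_only = split_references(references)
print([ref["key"] for ref in structured])
print([ref["key"] for ref in unstructured_only])
```

The unstructured-only list could then be discarded, passed to a reference parser, or imported with the raw string kept as the title.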

Dominic-DallOsto avatar Oct 13 '21 16:10 Dominic-DallOsto

I'll add that we may parse unstructured Crossref references in #43. Thanks!

diegodlh avatar Oct 18 '21 12:10 diegodlh