zotero-cita icon indicating copy to clipboard operation
zotero-cita copied to clipboard

Fetching citations from Crossref using REST API

Open Dominic-DallOsto opened this issue 3 years ago • 26 comments

Fixes #43

Uses CrossRef's REST API to get citations for an item. Mostly this works well because the cited items have DOIs which we can resolve. For items without identifiers, I attempt to roughly parse as much information as I can from them. Currently I just discard unstructured citations, but often some data (title, extra authors) is present in here that isn't in the strucutured data, so parsing them would be nice once #113 is resolved. For some items like this the data is CrossRef is still quite messy.

todo:

  • [x] translate strings
  • [x] provide user agent / email for CrossRef API

Dominic-DallOsto avatar Oct 24 '21 10:10 Dominic-DallOsto

Seems to be working nicely now. A couple of questions though:

  • should we include an email address in the CrossRef API call explained here?
  • should there be more detailed progress message updates in crossref.js (eg. item not found on crossref, item found but no references available)?
  • anything else?

Dominic-DallOsto avatar Nov 03 '21 23:11 Dominic-DallOsto

Hi, incredible work on the extension. Thank you so much for it!

I was wondering if this particular feature has seen any progress? It would, in my opinion (or, tbh, for my work), be the most useful addition, since CrossRef is such a big part of the infrastructure and has so much citation data.

Again, thank you, I know I'm just clamoring for features on pro bono work 🙈.

urspx avatar Feb 06 '22 18:02 urspx

Hi, @urspx! @Dominic-DallOsto has probably done a great work to implement this, but unfortunately I haven't had the time to check it yet. I'm slowly starting to catch up on all his contributions. Hopefully I'll manage to review this one soon. In the meantime, maybe he has a beta XPI for you to give it a try?

diegodlh avatar Feb 07 '22 00:02 diegodlh

Amazing, thank you! Looking forward :)

I'm happy to test / report bugs where possible to at least contribute something useful (and, once my schedule clears up a bit, perhaps even help with documentation – I know this is off topic, though).

urspx avatar Feb 07 '22 12:02 urspx

Thank you, @urspx! All bug reports and enhancement requests are welcome. We'll try to address them as fast as we can.

Regarding testing, we have in mind opening a parallel beta channel to help us test our latest changes. We'll post news about this here: #161

And better documentation would definitely be AMAZING. You can contribute whatever (text, screenshots, video) here: https://www.wikidata.org/wiki/Wikidata:Zotero/Cita#Documentation

diegodlh avatar Feb 07 '22 15:02 diegodlh

I just checked out the branch, built an xpi and tested it and got "undefined" bild bild Also I tried following https://www.zotero.org/support/dev/client_coding/plugin_development#setting_up_a_plugin_development_environment but got no error nor a working plugin :/ I'm on windows and it was unclear about how to exactly format the filename and contents.

dpriskorn avatar Feb 10 '22 21:02 dpriskorn

bild bild

getFromCrossref: function(menuName) {
        // get items selected
        // filter items with doi
        // generate batch call to crossref
        // only add items not available locally yet
        let items = Zotero.getActiveZoteroPane().getSelectedItems();
        Zotero.debug("getFromCrossref: Got this many items: " + 
        items.length);
        for (let item in items) {
            Zotero.debug("getFromCrossref: started the loop");
            let wrapped_item = SourceItemWrapper(item);
            Services.prompt.alert(
                window,
                "DOI found",
                wrapped_item.doi
            );
            Zotero.debug("getFromCrossref: "+
            "Running item.getFromCrossref now with this.doi: " + 
            wrapped_item.doi);
            item.getFromCrossref(wrapped_item.doi);
        }
    },

I tried improving this pull request with the code above, but for some reason I neither get an error nor an alert with the DOI. 🤷‍♂️ My changes in this branch https://github.com/dpriskorn/zotero-cita/tree/priskorn-crossref

dpriskorn avatar Feb 11 '22 10:02 dpriskorn

Thanks, I found the issue there. I'm in the middle of fixing it at the moment.

But it raises the more general question of how to deal with batch actions in the overlay. I guess we should have a different progress message for the batch actions? And would it be best to just call the source item wrapper functions to avoid code duplication? Or to reimplement it for a batch at once?

Dominic-DallOsto avatar Feb 14 '22 16:02 Dominic-DallOsto

Thanks, I found the issue there. I'm in the middle of fixing it at the moment.

Nice! I look forward to seeing the solution.

dpriskorn avatar Feb 15 '22 06:02 dpriskorn

But it raises the more general question of how to deal with batch actions in the overlay. I guess we should have a different progress message for the batch actions? And would it be best to just call the source item wrapper functions to avoid code duplication? Or to reimplement it for a batch at once?

Yes, the other functions seem to take "items" as argument and process them. I think it would be a good idea to be consistent. I don't know how the notification system works currently, but we could limit the functionality for now to only support fetching 1 item from crossref at a time. That would work around the issue, but probably lead to user frustration.

dpriskorn avatar Feb 15 '22 06:02 dpriskorn

Hi, why the PR is not merged on master ? I've tested the code and apart from a pop-up "couldn't get item citations from CrossRef" even though it did get it, everything works fine.

Futur3r avatar Oct 01 '22 10:10 Futur3r

@Dominic-DallOsto Is there a way to easily debug with an IDE like vscode ? I would like to fix the problem

Futur3r avatar Oct 06 '22 19:10 Futur3r

Did you find a way to get the debugging in VSCode to work? I tried once before but didn't manage.

Hi, why the PR is not merged on master ? I've tested the code and apart from a pop-up "couldn't get item citations from CrossRef" even though it did get it, everything works fine.

The only issue is here https://github.com/diegodlh/zotero-cita/blob/128fa832c484c960432bac585f9846e3b7a36952/src/crossref.js#L44 - if an error occurs getting one of the items, all the others are aborted. It's some odd asynchronous Promise and error handling issue I haven't figured out yet.

Dominic-DallOsto avatar Oct 10 '22 22:10 Dominic-DallOsto

No, I didn't find a way. I can't contribute to cita because I find it too difficult to debug anything.

I debugged this part of your code (manually with the JS console of Zotero) and I found (but I might be wrong) that

https://github.com/Dominic-DallOsto/zotero-cita/blob/128fa832c484c960432bac585f9846e3b7a36952/src/crossref.js#L48

isn't defined anywhere.

Futur3r avatar Oct 13 '22 15:10 Futur3r

Ok - I finally got things to a more or less working level. The popups aren't yet properly translated, but I hope the functionality is error-free. You can add citations from Crossref by right clicking the item(s), or from the citations menu for an item. The performance isn't amazing for large numbers of citations, because it currently gets each individually by DOI from Crossref.

Any feedback is more than welcome, thanks! @urspx @dpriskorn @Futur3r

Here's a debug build (you'll have to unzip it - Github won't let me upload an xpi here). zotero-cita-v0.6.0-beta-crossref.xpi.zip

Dominic-DallOsto avatar Oct 27 '22 23:10 Dominic-DallOsto

Hello, great to see this is working! I've been following this PR (from a distance) for a while, so it's great to see some progress.

I had a question and some comments.

  1. Does the Zotero.HTTP.request function handle rate limits? (I haven't been able to find any API documentation, nor where this function is defined — though I haven't spent that long looking).

    The documentation has some etiquette and rate limits documentation, which aren't implemented in your code but might be implemented upstream (in Zotero.HTTP.request). From the other examples (see below) the User Agent has to be passed manually.

https://github.com/diegodlh/zotero-cita/blob/0508d8432bfd915364ac6923361b65acfd2d3231/src/wikidata.js#L166-L180

  1. I noticed some code quality issues, such as formatting issues (a mix of tabs and spaces, for example) which makes it tricky to review on github, with default settings, and inconsistencies (missing syntax).

  2. Some of the code could be cleaner, making it easier to understand what is going on, and review.


For one example of the latter two points, this method,

	static async getCitations(doi) {
        const JSONResponse = await Crossref.getDOI(doi);

        if (JSONResponse){
            const references = JSONResponse.message.reference
            if (references && references.length > 0){
                const parsedReferences = await Promise.all(references.map(async (reference) => {
                    const parsedReference = await Crossref.parseReferenceItem(reference);
                    return parsedReference;
                }));
                return parsedReferences.filter(Boolean);
            }
            else{
                debug("Item found in Crossref but doesn't contain any references");
            }
        }
        return [];
	}

could be something like

    static async getCitations(doi) {
        const JSONResponse = await Crossref.getDOI(doi);
        if (!JSONResponse) return [];

        const references = JSONResponse.message.reference;
        if (!references.length) {
            debug("Item found in Crossref but doesn't contain any references");
            return [];
        }

        const parsedReferences = await Promise.all(
            references.map((reference) => Crossref.parseReferenceItem(reference))
        );
        return parsedReferences.filter(Boolean);
    }

dbaynard avatar Oct 28 '22 01:10 dbaynard

Thanks for the detailed feedback!

  1. Does the Zotero.HTTP.request function handle rate limits? (I haven't been able to find any API documentation, nor where this function is defined — though I haven't spent that long looking).

It's here. From what I can tell it handles retrying requests if they fail, but not Crossref "you got rate limited" responses. But getting the reference list from Crossref occurs at most once per item. Most of the calls to Crossref actually happen when parsing the references, which is handled by Zotero's Crossref translator. If we start to experience rate limiting errors, I think the solution would have to be implemented there.

I finally got around to adding the user agent string, thanks!

  1. I noticed some code quality issues, such as formatting issues (a mix of tabs and spaces, for example) which makes it tricky to review on github, with default settings, and inconsistencies (missing syntax).

Sorry about the tabs/space mixture - not sure how that slipped through. The main logic is a bit longer / more verbose now (particularly with all the progress messages), but I hope at least clear enough to follow?

In particular, I added a progress message saying how many items have successfully had their references parsed, because this tends to be quite slow if items have 50+ references.

zotero-cita-v0.6.0-beta-crossref.xpi.zip

Dominic-DallOsto avatar Oct 30 '22 23:10 Dominic-DallOsto

Thanks again for the detailed feedback! Most of those changes make sense - otherwise I left a few comments.

Like you say, adding references despite an error would be a nicer user experience, particularly if running this on a large number of items.

Dominic-DallOsto avatar Nov 02 '22 23:11 Dominic-DallOsto

Glad you've found it helpful. Happy to keep up the pace of development on this to get this merged.

I'll figure out how to build the plugin and give it a spin, over the weekend.

(Edit: been delayed on this — I'll give this a go in the week.)

dbaynard avatar Nov 03 '22 18:11 dbaynard

Are you following dev:client coding:plugin development [Zotero Documentation] to test the plugin?

dbaynard avatar Nov 04 '22 01:11 dbaynard

Are you following dev:client coding:plugin development [Zotero Documentation] to test the plugin?

Yeah. The steps here are hopefully easy enough to follow to get it up and running

Dominic-DallOsto avatar Nov 07 '22 21:11 Dominic-DallOsto

I did a quick test using the bottleneck library to rate limit requests with a minimum of 20 ms between each request.

Actually this made the requests both slower and less reliable. I don't know if Zotero.HTTP has some internal rate limiting or how it works, but the requests seem to be sent out over a period of time instead of all at once, even when launching them all at once async.

Dominic-DallOsto avatar Nov 07 '22 22:11 Dominic-DallOsto

I did a quick test using the bottleneck library to rate limit requests with a minimum of 20 ms between each request.

Nice — good job.

the requests seem to be sent out over a period of time instead of all at once, even when launching them all at once async.

That's what I'd expect: it's how the library is designed, no? The docs for minTime say 'How long to wait after launching a job before launching another one.'

There's no rate limiting in Zotero.HTTP — this matches what you see (i.e. a change in behaviour when you add rate limiting). I also found no 429 error handling, which is how crossref requests that clients pause.


So let's not bother with rate limiting, but we should respond to 429 requests by backing off. Seems reasonable?

dbaynard avatar Nov 08 '22 21:11 dbaynard

That's what I'd expect: it's how the library is designed, no? The docs for minTime say 'How long to wait after launching a job before launching another one.'

Sorry, that wasn't super clear. With the requests being sent out over a period of time I was referring to even when using Zotero.HTTP, without any rate limiting. I'm not sure if this is some weird quirk with my network setup, but if I try to fetch crossref citations for 1400 items the requests don't all get fired off at once but spaced out over 10-15 seconds.

If I enable the rate limiting, the time for all the requests to be fulfilled increases (fair enough), but a number of requests also fail to get a response.

I guess in the ideal case Zotero would add some 429 handling upstream and we wouldn't have to worry about it? But I'm yet to be able to actually get a 429 response despite my best efforts...

Dominic-DallOsto avatar Nov 12 '22 17:11 Dominic-DallOsto

@Dominic-DallOsto Had to step away from this for a bit — if you can add some handling for the 429 case then this will be fine (for now) and better than it was, and you should ping the maintainer to check it over.

We want to be good citizens, and that means handling 429s in every version that is released. Our counterparty might turn on 429 responses in the future. We can't force upgrades.

Thanks for your work on this; I think it will be a really useful feature, when merged.

I'm unlikely to get back to this for at least a month — hopefully this will be merged by then!

dbaynard avatar Nov 29 '22 17:11 dbaynard

I dug into this a bit more and it's not really trivial to handle 429 responses where they're more likely to occur (when we use Zotero translators to parse the reference lists we get). I'm not sure whether we can even identify this case, or whether it will just appear to us as if the translator failed to find any item.

I added 429 handling for when we directly access the Crossref API, but it just throws and error that is logged and caught.

Dominic-DallOsto avatar Jan 06 '23 23:01 Dominic-DallOsto