jabref arXiv fetcher: import more information using DOI


Is your suggestion for improvement related to a problem? Please describe.

When importing the reference of a publication using its arXiv number (i.e., by copy-pasting the arXiv number into the entry table):

- the provided arXiv number is not stored;
- less information is imported than when using the DOI of the arXiv publication.

Describe the solution you'd like

Import the reference using the DOI. It is straightforward to get the DOI from the arXiv number: e.g., the publication with the arXiv ID 1811.10364 has the DOI 10.48550/arXiv.1811.10364. See https://arxiv.org/abs/1811.10364
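A minimal sketch of that mapping, assuming only new-style arXiv IDs (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by a version suffix such as `v2`): the DOI is built by prepending the `10.48550/arXiv.` prefix. The class and method names (`ArxivDoi`, `toDoi`) are hypothetical and not JabRef's API; dropping the version suffix and ignoring old-style IDs such as `hep-th/9901001` are assumptions of this sketch.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArxivDoi {

    // Shared prefix of arXiv-issued DOIs, e.g. 10.48550/arXiv.1811.10364
    private static final String ARXIV_DOI_PREFIX = "10.48550/arXiv.";

    // New-style arXiv ID (YYMM.NNNN or YYMM.NNNNN) with an optional version suffix
    private static final Pattern NEW_STYLE_ID =
            Pattern.compile("^(\\d{4}\\.\\d{4,5})(v\\d+)?$");

    /** Returns the DOI for a new-style arXiv ID, or empty if the input does not look like one. */
    public static Optional<String> toDoi(String arxivId) {
        Matcher m = NEW_STYLE_ID.matcher(arxivId.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // Assumption: the DOI identifies the article as a whole, so the version is omitted
        return Optional.of(ARXIV_DOI_PREFIX + m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(toDoi("1811.10364"));   // Optional[10.48550/arXiv.1811.10364]
        System.out.println(toDoi("1811.10364v2")); // Optional[10.48550/arXiv.1811.10364]
        System.out.println(toDoi("not-an-id"));    // Optional.empty
    }
}
```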

Aug 25 '22 12:08 mlep

From what I've seen in the arXiv importer and in typical API responses, extracting the DOI from the arXiv ID is only trivial when the author provides it.

The DOI displayed on their site may actually be one registered through DataCite (see image below), in which case this data does not seem to be included in the API response the code requests (for example, http://export.arxiv.org/api/query?id_list=1811.10364). According to the API manual, the DOI would appear either as the arxiv:doi element (present if the author included it, as mentioned here) or as a link element (see manual), but neither is present for some entries (e.g., arXiv ID 1811.10364).
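To check whether a given API response carries the DOI, the `arxiv:doi` element can be read from the Atom feed with the JDK's DOM parser, as in the sketch below. The namespace URI `http://arxiv.org/schemas/atom` is the one arXiv API feeds bind to the `arxiv:` prefix; the class name and the two sample feeds are hypothetical, and the `link` element variant mentioned in the manual is not handled here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ArxivApiDoi {

    // Namespace bound to the "arxiv:" prefix in arXiv API Atom feeds
    private static final String ARXIV_NS = "http://arxiv.org/schemas/atom";

    /** Returns the text of the first <arxiv:doi> element, or empty if the feed has none. */
    public static Optional<String> extractDoi(String atomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(atomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList dois = doc.getElementsByTagNameNS(ARXIV_NS, "doi");
            if (dois.getLength() == 0) {
                return Optional.empty(); // no DOI in this response
            }
            String doi = dois.item(0).getTextContent().trim();
            return doi.isEmpty() ? Optional.empty() : Optional.of(doi);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse arXiv API response", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, heavily trimmed feeds; a real one comes from export.arxiv.org/api/query
        String withDoi = "<feed xmlns='http://www.w3.org/2005/Atom'"
                + " xmlns:arxiv='http://arxiv.org/schemas/atom'>"
                + "<entry><arxiv:doi>10.1000/example</arxiv:doi></entry></feed>";
        String withoutDoi = "<feed xmlns='http://www.w3.org/2005/Atom'>"
                + "<entry><title>No DOI here</title></entry></feed>";

        System.out.println(extractDoi(withDoi));    // Optional[10.1000/example]
        System.out.println(extractDoi(withoutDoi)); // Optional.empty
    }
}
```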

Given that, this feature would either work only for entries whose DOI is included in the API response, or JabRef would have to obtain the missing information by other means (web scraping, other APIs such as DataCite's, matching against other archives, etc.).

Please correct me if this conclusion is wrong, as I am still not familiar with most of the codebase.

Sep 16 '22 04:09 thiagocferr

I am not sure that I got you right.

When the user provides the arXiv ID (e.g., 1811.10364), the user in effect also provides the DOI, because you just need to add the prefix 10.48550/arXiv. to the arXiv ID to get the DOI (10.48550/arXiv.1811.10364). So, when given an arXiv ID, JabRef could use not only the arXiv fetcher but also the DOI fetcher.
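One way to use both fetchers is to merge their results field by field. The sketch below models each fetcher's output as a plain field map (JabRef's real entry and fetcher types are not used); fields from the arXiv result win on conflicts, the DOI result only fills gaps, and the arXiv ID is written explicitly so it is no longer lost. The class name, the merge policy, and the biblatex-style field names `eprint`/`eprinttype` are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeFetchResults {

    /**
     * Merges two field maps: values from 'primary' win, and 'secondary' only
     * fills fields that 'primary' lacks. The arXiv ID is recorded explicitly.
     */
    public static Map<String, String> merge(String arxivId,
                                            Map<String, String> primary,
                                            Map<String, String> secondary) {
        Map<String, String> merged = new LinkedHashMap<>(primary);
        secondary.forEach(merged::putIfAbsent); // keep existing values, add missing ones
        // biblatex-style fields for the preprint identifier (assumed naming)
        merged.put("eprint", arxivId);
        merged.put("eprinttype", "arxiv");
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical fetcher outputs, hard-coded for illustration
        Map<String, String> fromArxiv = new LinkedHashMap<>();
        fromArxiv.put("title", "Some title");
        fromArxiv.put("author", "Doe, Jane");

        Map<String, String> fromDoi = new LinkedHashMap<>();
        fromDoi.put("title", "Some title (DOI record)");
        fromDoi.put("doi", "10.48550/arXiv.1811.10364");

        System.out.println(merge("1811.10364", fromArxiv, fromDoi));
        // {title=Some title, author=Doe, Jane, doi=10.48550/arXiv.1811.10364, eprint=1811.10364, eprinttype=arxiv}
    }
}
```

Which source should win on conflicts is a design choice; since the DOI record tends to be richer, the arguments could just as well be swapped.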

Sep 16 '22 07:09 mlep

What I don't get, using your example, is how you could know to add the prefix 10.48550/arXiv. (more specifically, the 10.48550 part) if the only information provided is the arXiv ID (1811.10364). As I understand it, this would only be known if the arXiv fetcher got the DOI in the same request, which it does not always do, as shown before.

Sep 19 '22 13:09 thiagocferr

You can add the prefix 10.48550/arXiv. because you have identified that 1811.10364 is an arXiv ID.

And JabRef can already identify whether a string is an arXiv ID: when you simply paste 1811.10364 into the entry table, JabRef determines that this string is an arXiv ID ("Found arxiv identifier in clipboard" is written to the log file).
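For illustration only, a clipboard check of this kind can be approximated with a regular expression. This is not JabRef's actual detection code; the accepted forms (a bare new-style ID, an `arXiv:` prefix, or an `arxiv.org/abs/` URL, each with an optional version suffix) and the class name are assumptions. The normalized ID it returns is what the prefix would then be added to.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArxivIdDetector {

    // Optional "arXiv:" prefix or abs URL, then a new-style ID with an optional version
    private static final Pattern ARXIV_ID = Pattern.compile(
            "^(?:arxiv:|https?://arxiv\\.org/abs/)?(\\d{4}\\.\\d{4,5})(v\\d+)?$",
            Pattern.CASE_INSENSITIVE);

    /** Returns the bare arXiv ID (without version) if the text looks like one. */
    public static Optional<String> detect(String clipboardText) {
        Matcher m = ARXIV_ID.matcher(clipboardText.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("1811.10364"));                          // Optional[1811.10364]
        System.out.println(detect("arXiv:1811.10364"));                    // Optional[1811.10364]
        System.out.println(detect("https://arxiv.org/abs/1811.10364v1")); // Optional[1811.10364]
        System.out.println(detect("10.48550/arXiv.1811.10364"));          // Optional.empty (a DOI)
    }
}
```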

Sep 19 '22 14:09 mlep

I think I get it now. After a quick search, I found in this article that all arXiv articles do indeed have a DOI with the same prefix, which I had never really paid attention to :sweat_smile:.

Sep 19 '22 14:09 thiagocferr

Well, I've looked into some of the code around this functionality and have an idea of how to implement it, so I'd like to take it on as my first issue in this repo.

Sep 22 '22 15:09 thiagocferr

@thiagocferr That sounds great! If you have any questions, you can ask them in your PR. Make sure to follow our contribution guide: https://devdocs.jabref.org/contributing.html#contribute-code

Sep 22 '22 17:09 Siedlerchr
