arXiv fetcher: import more information using DOI
Is your suggestion for improvement related to a problem? Please describe.
When importing the reference of a publication using its arXiv number (i.e. copy-paste of the arXiv number on the entry table):
- the provided arXiv number is not stored
- less information is imported, compared to using the DOI of the arXiv publication.
Describe the solution you'd like
Import the reference using the DOI.
It is straightforward to get the DOI from the arXiv number: e.g. the publication with the arXiv ID `1811.10364` has the DOI `10.48550/arXiv.1811.10364`. See https://arxiv.org/abs/1811.10364
From what I've seen from the arXiv importer and the usual API responses, the extraction of the DOI from the arXiv ID is only trivial when the author provides it.
The DOI displayed on their site may actually be one generated by DataCite (see image below), in which case it does not seem to be included in the response to the API call made in the code (for example, http://export.arxiv.org/api/query?id_list=1811.10364). According to the API manual, the DOI would appear as an `arxiv:doi` element (present if the author had included one, as mentioned here) or as a `link` element (see manual), but this does not seem to be the case for some entries (e.g. arXiv ID `1811.10364`).
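To make the check above concrete, here is a minimal sketch of how one might look for an `arxiv:doi` element in an Atom response body. It is not JabRef's actual fetcher code: the class name `ArxivDoiElement` and the method `findDoi` are hypothetical, and the sketch assumes the `arxiv:` prefix is bound to the namespace `http://arxiv.org/schemas/atom`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of JabRef.
class ArxivDoiElement {
    // Assumption: namespace URI bound to the "arxiv:" prefix in API responses.
    static final String ARXIV_NS = "http://arxiv.org/schemas/atom";

    // Returns the text of the first arxiv:doi element, or empty if there is none.
    static Optional<String> findDoi(String atomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(atomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(ARXIV_NS, "doi");
            if (nodes.getLength() == 0) {
                return Optional.empty();
            }
            String text = nodes.item(0).getTextContent().trim();
            return text.isEmpty() ? Optional.empty() : Optional.of(text);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse Atom response", e);
        }
    }
}
```

An empty result here corresponds to the case described above, where the author did not supply a DOI and the response carries no `arxiv:doi` element.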
Considering that, this feature would either only work on entries where the DOI was provided with the API response, or JabRef could try getting this missing info by other means (web scraping, use of other APIs like DataCite, matching against other archives, etc.)
Please correct me if this was a false conclusion, as I am still not very knowledgeable about most of the codebase.
I am not sure that I got you right.
When the user provides the arXiv ID (e.g. `1811.10364`), the user also provides, in fact, the DOI. This is because you just need to add the prefix `10.48550/arXiv.` to the arXiv ID to get the DOI (`10.48550/arXiv.1811.10364`). So JabRef, when provided with an arXiv ID, could use the arXiv fetcher, but also the DOI fetcher.
What I don't get, using your example, is how you could possibly know to add the prefix `10.48550/arXiv.` (more specifically, the `10.48550` part) if the only provided information is the arXiv ID (`1811.10364`). From my understanding, this would only be known if the arXiv fetcher could get the DOI in the same request, which it does not always do, as shown before...
You could possibly add the prefix `10.48550/arXiv` because you have identified that `1811.10364` is an arXiv ID.
And, currently, JabRef is already able to identify if a string is an arXiv ID: by simply pasting `1811.10364` in the entry table, JabRef is able to determine that this string is an arXiv ID ("Found arxiv identifier in clipboard" is written in the log file).
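As a rough illustration of this kind of detection, the sketch below checks whether a pasted string has the shape of an arXiv ID. It does not reproduce JabRef's own identifier parsing: `ArxivIdDetector` and `looksLikeArxivId` are hypothetical names, and the two patterns are simplified assumptions about the new-style (`1811.10364`) and old-style (`hep-th/9901001`) ID formats.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JabRef.
class ArxivIdDetector {
    // Simplified new-style ID: four digits, a dot, four or five digits, optional version.
    private static final Pattern NEW_STYLE =
            Pattern.compile("\\d{4}\\.\\d{4,5}(v\\d+)?");
    // Simplified old-style ID: archive name, optional subject class, slash, seven digits.
    private static final Pattern OLD_STYLE =
            Pattern.compile("[a-z-]+(\\.[A-Z]{2})?/\\d{7}(v\\d+)?");

    // True if the whole (trimmed) string matches one of the simplified patterns.
    static boolean looksLikeArxivId(String text) {
        String s = text.trim();
        // Assumption: tolerate a leading "arXiv:" label (case-insensitive).
        if (s.regionMatches(true, 0, "arxiv:", 0, 6)) {
            s = s.substring(6);
        }
        return NEW_STYLE.matcher(s).matches() || OLD_STYLE.matcher(s).matches();
    }
}
```

Once a string is recognized this way, nothing beyond the ID itself is needed to build the DOI, which is the point made in this comment.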
I think I get it now. After a quick search, I found from this article that, indeed, all arXiv articles have a DOI with the same prefix, which I never really paid attention to :sweat_smile:.
Well, I've tackled a bit of code around this functionality and I have an idea on how to implement it, so I'd like to contribute to it as my first issue on this repo.
@thiagocferr That sounds great! If you have any questions, you can ask them in your PR. Make sure to follow our contribution guide: https://devdocs.jabref.org/contributing.html#contribute-code