
Extended metadata parser APIs for extracting web page contents.

Open burtonator opened this issue 6 years ago • 3 comments

When we're on pages like arXiv's, which offer both a PDF and an abstract page for a whitepaper, the full metadata is often already on the page and we don't need to re-parse it from the PDF.

We ALSO (probably) have the PDF import URL.

When the user clicks 'save to polar' we should special-case these pages: find the PDF, pull out the FULL metadata, and then add it to Polar directly.

burtonator avatar Feb 11 '19 15:02 burtonator

Here is an arxiv example:

  • Page of one paper: https://arxiv.org/abs/1808.02874
  • Corresponding PDF: https://arxiv.org/pdf/1808.02874
  • On the right side there are two ways to get the bibtex file.

There is also the Arxiv API: https://arxiv.org/help/api/index
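The mapping between the two URLs in the example above is mechanical, so a rough sketch of the special case might look like the following. The names `ArxivLinks` and `arxivLinks` are hypothetical and not part of Polar; the `id_list` query on `export.arxiv.org` follows the arXiv API documentation, and the pattern only covers new-style identifiers such as `1808.02874`.

```typescript
// Hypothetical helper, not an existing Polar API.
interface ArxivLinks {
    id: string;
    absUrl: string;
    pdfUrl: string;
    apiUrl: string;
}

// New-style arXiv identifiers, e.g. 1808.02874, optionally with a version suffix such as v2.
const ARXIV_ABS = /^https?:\/\/arxiv\.org\/abs\/(\d{4}\.\d{4,5}(?:v\d+)?)\/?$/;

function arxivLinks(pageUrl: string): ArxivLinks | undefined {
    const match = ARXIV_ABS.exec(pageUrl);
    const id = match ? match[1] : undefined;
    if (!id) {
        // Not an arXiv abstract page (or an old-style identifier this sketch ignores).
        return undefined;
    }
    return {
        id,
        absUrl: `https://arxiv.org/abs/${id}`,
        // Same identifier, /pdf/ instead of /abs/, as in the example above.
        pdfUrl: `https://arxiv.org/pdf/${id}`,
        // Atom feed with the paper's metadata, per the arXiv API docs.
        apiUrl: `http://export.arxiv.org/api/query?id_list=${id}`,
    };
}
```

Calling `arxivLinks("https://arxiv.org/abs/1808.02874")` would give both the PDF import URL and an API URL to fetch metadata from, without touching the PDF itself.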

sotte avatar Feb 11 '19 15:02 sotte

Just to add to the bibtex discussion: please note that there is a newer format, biblatex, and that bibtex is the older one. The two are more or less the same, with biblatex offering more entry types and fields. It would be best if Polar could distinguish between the old bibtex format and the new biblatex format during export/import. I believe one of the most straightforward ways to tell them apart is that bibtex uses the journal field for articles, while biblatex uses journaltitle.
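As a minimal sketch of that heuristic, assuming the entry is available as a raw string: the function name `detectBibDialect` is hypothetical, and the check is only a hint, since biblatex also accepts `journal` as an alias and entries other than articles may carry neither field.

```typescript
type BibDialect = "bibtex" | "biblatex" | "unknown";

// Hypothetical helper: guess the dialect of a single entry from its journal field name.
function detectBibDialect(entry: string): BibDialect {
    // Field names are case-insensitive and are followed by "=".
    if (/(^|[\s,{])journaltitle\s*=/i.test(entry)) {
        return "biblatex";
    }
    if (/(^|[\s,{])journal\s*=/i.test(entry)) {
        return "bibtex";
    }
    // Neither field present, so this heuristic cannot decide.
    return "unknown";
}
```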

burtonator avatar Feb 11 '19 20:02 burtonator

The Zotero translation server exposes Zotero translators as a service. I think it is possible to use it to extract metadata from all the sites Zotero supports.
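For illustration, a request to a locally running translation server could be put together roughly like this. It assumes the server's `/web` endpoint, which takes the page URL as a plain-text POST body, and the default local address `http://127.0.0.1:1969`; both are taken from the translation-server README and should be checked against the version in use. `TranslationRequest` and `buildWebRequest` are hypothetical names.

```typescript
// Hypothetical description of the HTTP call; nothing is sent here.
interface TranslationRequest {
    endpoint: string;
    method: "POST";
    headers: Record<string, string>;
    body: string;
}

function buildWebRequest(
    pageUrl: string,
    serverBase: string = "http://127.0.0.1:1969" // assumed default address
): TranslationRequest {
    return {
        // Strip trailing slashes so we never produce "//web".
        endpoint: `${serverBase.replace(/\/+$/, "")}/web`,
        method: "POST",
        headers: { "Content-Type": "text/plain" },
        // The page to translate is sent as the raw request body.
        body: pageUrl,
    };
}
```

The returned object can be handed to any HTTP client; the response would be Zotero-format item metadata for the page, if a translator for that site exists.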

kskyten avatar Jan 01 '21 12:01 kskyten