polar-bookshelf Extended metadata parser APIs for extracting web page contents.

Open burtonator opened this issue 6 years ago • 3 comments

When we're on a page like an arXiv abstract page, which links to the PDF of a whitepaper, the full metadata is often already on the page, so we don't need to re-parse it from the PDF.

We ALSO (probably) have the PDF import URL.

When the user clicks 'save to polar' we should special-case these pages: find the PDF, pull out the FULL metadata, and then add it to Polar directly.
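
A minimal sketch of what "pull out the FULL metadata" could look like, assuming the page exposes Highwire-style `citation_*` meta tags (many scholarly sites do, but that is an assumption here, not something Polar relies on today). The function name `extract_page_metadata` and the sample HTML are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects <meta name="citation_*" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # e.g. {"citation_author": ["A", "B"]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        # Keep every occurrence, since tags such as citation_author repeat.
        if name.startswith("citation_") and content:
            self.fields.setdefault(name, []).append(content)


def extract_page_metadata(html):
    """Return title, authors, and PDF URL found in the page's meta tags."""
    parser = CitationMetaParser()
    parser.feed(html)
    f = parser.fields
    return {
        "title": (f.get("citation_title") or [None])[0],
        "authors": f.get("citation_author", []),
        "pdf_url": (f.get("citation_pdf_url") or [None])[0],
    }


# Hypothetical page fragment, not copied from any real site.
SAMPLE = """
<html><head>
<meta name="citation_title" content="An Example Paper">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_pdf_url" content="https://example.org/paper.pdf">
</head><body></body></html>
"""

print(extract_page_metadata(SAMPLE))
```

If `pdf_url` comes back as `None`, the page gives no import URL this way and the normal capture path would still apply.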

Feb 11 '19 15:02 burtonator

Here is an arxiv example:

Page of one paper: https://arxiv.org/abs/1808.02874
Corresponding PDF: https://arxiv.org/pdf/1808.02874
On the right side there are two ways to get the bibtex file.

There is also the Arxiv API: https://arxiv.org/help/api/index
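
As a rough sketch of how the abstract page, the PDF, and the API relate, the mapping can be done on the URL alone. The helper names below are hypothetical, the pattern only covers new-style identifiers like the one in the example, and the `export.arxiv.org/api/query` endpoint with its `id_list` parameter is taken from the arXiv API documentation as I understand it.

```python
import re
from urllib.parse import urlencode, urlparse

# Matches new-style identifiers such as 1808.02874 or 1808.02874v2.
_ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_id_from_url(url):
    """Return the arXiv identifier from an /abs/ or /pdf/ URL, or None."""
    parsed = urlparse(url)
    if parsed.netloc not in ("arxiv.org", "www.arxiv.org"):
        return None
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or parts[0] not in ("abs", "pdf"):
        return None
    candidate = parts[1]
    if candidate.endswith(".pdf"):
        candidate = candidate[:-4]
    return candidate if _ARXIV_ID.match(candidate) else None


def pdf_url_for(arxiv_id):
    """Build the PDF URL that corresponds to an abstract page."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


def api_query_url(arxiv_id):
    """Build an arXiv API query URL (Atom feed) for a single identifier."""
    return "http://export.arxiv.org/api/query?" + urlencode({"id_list": arxiv_id})


paper_id = arxiv_id_from_url("https://arxiv.org/abs/1808.02874")
print(paper_id, pdf_url_for(paper_id), api_query_url(paper_id))
```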

Feb 11 '19 15:02 sotte

Just to add to the BibTeX discussion: please note that there is a newer format, biblatex; BibTeX is the older one. The two are more or less the same, with biblatex offering more entry types and fields. It would be best if Polar could distinguish between the old BibTeX format and the new biblatex format during export/import. I believe one of the most straightforward ways to tell them apart is that BibTeX uses the journal field for articles, while biblatex uses journaltitle.
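
The field-name check described above could be sketched roughly like this. It is a heuristic only: `guess_bib_dialect` is a hypothetical helper, the sample entries are made up, and the check looks at field names with a regular expression rather than parsing the entry properly.

```python
import re

# Field names are matched case-insensitively, as in "Journal = {...}".
_JOURNALTITLE = re.compile(r"\bjournaltitle\s*=", re.IGNORECASE)
_JOURNAL = re.compile(r"\bjournal\s*=", re.IGNORECASE)


def guess_bib_dialect(entry):
    """Guess 'biblatex', 'bibtex', or 'unknown' from an entry's field names."""
    if _JOURNALTITLE.search(entry):
        return "biblatex"
    if _JOURNAL.search(entry):
        return "bibtex"
    return "unknown"


# Made-up entries, for illustration only.
OLD_STYLE = "@article{doe2019,\n  title = {An Example},\n  journal = {Some Journal}\n}"
NEW_STYLE = "@article{doe2019,\n  title = {An Example},\n  journaltitle = {Some Journal}\n}"

print(guess_bib_dialect(OLD_STYLE), guess_bib_dialect(NEW_STYLE))
```

One caveat: biblatex also accepts `journal` as a legacy alias for `journaltitle`, so a "bibtex" result here really means "uses the older field name", not proof that the file was written for BibTeX.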

Feb 11 '19 20:02 burtonator

The Zotero translation server exposes Zotero translators as a service. I think it is possible to use it to extract metadata from all the sites Zotero supports.
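
A rough sketch of calling it, assuming a translation server instance running locally on its default port 1969 and its `/web` endpoint, which as far as I know takes the page URL as a plain-text POST body and returns items as JSON. The helper names are hypothetical and nothing below is sent until `fetch_items` is called.

```python
import json
import urllib.request

# Assumed address of a locally running translation server.
DEFAULT_ENDPOINT = "http://127.0.0.1:1969/web"


def build_web_request(page_url, endpoint=DEFAULT_ENDPOINT):
    """Build (but do not send) the POST request for a page URL."""
    return urllib.request.Request(
        endpoint,
        data=page_url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def fetch_items(page_url, endpoint=DEFAULT_ENDPOINT, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_web_request(page_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server):
# items = fetch_items("https://arxiv.org/abs/1808.02874")
```

Pages that list several items need a follow-up selection step on the server side, which this sketch does not handle.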

Jan 01 '21 12:01 kskyten
