Extension: Extend Biblio-glutton to DBLP

Open cverluise opened this issue 4 years ago • 0 comments

Hello @here,

Thanks a lot for the great tool.

A few months ago, @ste210 and I started investigating the idea of extending Crossref with the DBLP dataset as part of the PatCit project.

After some back-and-forth, here are the main findings (see full discussion thread here):

  • the DBLP dataset (w/o theses) contains 4,777,622 docs
  • 3,900,859 of these docs have a DOI (81.6%)
  • 3,520,018 of these DOIs are also in the Crossref database (90.2%)
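
For reference, the coverage figures above can be re-derived from the three raw counts. This is only a small arithmetic sketch; the variable names are mine, and the last two quantities are simple differences of the counts reported above.

```python
# Raw counts reported in the findings above.
dblp_docs = 4_777_622          # DBLP docs, theses excluded
dblp_with_doi = 3_900_859      # DBLP docs carrying a DOI
doi_in_crossref = 3_520_018    # of those DOIs, the ones also found in Crossref

# Shares, in percent, rounded to one decimal.
share_with_doi = round(100 * dblp_with_doi / dblp_docs, 1)
share_in_crossref = round(100 * doi_in_crossref / dblp_with_doi, 1)

# Docs that a DOI lookup against Crossref cannot resolve.
doi_not_in_crossref = dblp_with_doi - doi_in_crossref
without_doi = dblp_docs - dblp_with_doi

print(share_with_doi, share_in_crossref, doi_not_in_crossref, without_doi)
```

Note that a DBLP record without a DOI is not necessarily absent from Crossref; it only means DBLP provides no DOI to check against.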

This leaves a good number of relevant publications (based on conference rank) that are not covered by Crossref but have high-quality bibliographical references in DBLP (see breakdown here).

At this point, my idea was to:

  1. take the subset of documents which are in DBLP but not in CrossRef
  2. map the DBLP XML objects to the Crossref JSONL format, restricted to the attributes used by biblio-glutton in the matching process
  3. append the (properly formatted) DBLP data to the Crossref database
  4. there we go
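
To make step 2 more concrete, here is a minimal sketch of what the mapping could look like. It assumes the usual DBLP record elements (`author`, `title`, `journal`/`booktitle`, `year`, `volume`, `pages`) and targets the Crossref work fields `title`, `author`, `container-title`, `issued`, `volume`, `page` and `type`. Which of these biblio-glutton actually uses for matching is an assumption on my side, and `TYPE_MAP`, `split_name` and `dblp_to_crossref` are hypothetical names, not part of any existing tool.

```python
import json
import xml.etree.ElementTree as ET

# Assumed mapping from DBLP record tags to Crossref "type" values.
TYPE_MAP = {"article": "journal-article", "inproceedings": "proceedings-article"}


def split_name(full_name):
    """Naively split a DBLP author string into given/family parts."""
    parts = full_name.split()
    # DBLP disambiguates homonyms with a numeric suffix such as "0001"; drop it.
    if len(parts) > 1 and parts[-1].isdigit():
        parts = parts[:-1]
    # Crude heuristic: the last token is treated as the family name.
    return {"given": " ".join(parts[:-1]), "family": parts[-1]}


def dblp_to_crossref(record):
    """Map one DBLP XML record to a Crossref-like dict (sketch only)."""
    title_el = record.find("title")
    # itertext() flattens inline markup such as <i> or <sub> inside titles.
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""
    venue = record.findtext("journal") or record.findtext("booktitle")
    doc = {
        "type": TYPE_MAP.get(record.tag, "other"),
        "title": [title] if title else [],
        "author": [
            split_name(a.text)
            for a in record.findall("author")
            if a.text and a.text.strip()
        ],
        "container-title": [venue] if venue else [],
    }
    year = record.findtext("year")
    if year and year.isdigit():
        doc["issued"] = {"date-parts": [[int(year)]]}
    for dblp_field, crossref_field in (("volume", "volume"), ("pages", "page")):
        value = record.findtext(dblp_field)
        if value:
            doc[crossref_field] = value
    return doc


# Made-up record, for illustration only.
SAMPLE = """<inproceedings key="conf/example/Doe20">
<author>Jane Doe 0001</author>
<author>John Smith</author>
<title>An Example Title.</title>
<booktitle>EXAMPLE</booktitle>
<year>2020</year>
<pages>1-10</pages>
</inproceedings>"""

# One JSON object per line, as in a JSONL dump.
print(json.dumps(dblp_to_crossref(ET.fromstring(SAMPLE))))
```

The full DBLP dump is large and relies on a DTD for character entities, so a real run would need a streaming, DTD-aware parser rather than `ET.fromstring` on a single string.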

I know that biblio-glutton was designed to be DOI-centric. That being said, the DOI is mainly used to harvest extra data from PubMed, Unpaywall, etc., right? So, for the bibliographical references in DBLP which have no DOI, we could replace the DOI value with the DBLP unique identifier. This is not very pretty, but it could do the job, right?
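
A rough sketch of that identifier fallback, assuming the DOI (when present) shows up as a doi.org URL in the record's `ee` elements. The `dblp:` prefix, `pick_identifier` and `is_surrogate` are hypothetical; the prefix is only there so that downstream steps (PubMed, Unpaywall lookups, etc.) can tell a surrogate from a real DOI and skip it.

```python
import re

# Matches DOI resolver URLs such as https://doi.org/10.xxxx/suffix.
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$", re.IGNORECASE)

# Hypothetical marker flagging identifiers that are not real DOIs.
SURROGATE_PREFIX = "dblp:"


def pick_identifier(dblp_key, ee_urls):
    """Return the DOI found in the ee URLs, else a surrogate built from the DBLP key."""
    for url in ee_urls:
        match = DOI_URL.match(url.strip())
        if match:
            # DOIs are case-insensitive; lowercase them for consistent lookups.
            return match.group(1).lower()
    return SURROGATE_PREFIX + dblp_key


def is_surrogate(identifier):
    """True when the identifier is a DBLP-based stand-in rather than a DOI."""
    return identifier.startswith(SURROGATE_PREFIX)


# Made-up key and URLs, for illustration only.
with_doi = pick_identifier("conf/example/Doe20", ["https://doi.org/10.1145/1234567.1234568"])
without_doi = pick_identifier("conf/example/Doe20", ["https://example.org/paper.pdf"])
print(with_doi, without_doi)
```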

I might be missing some complexity due to the internal functioning of biblio-glutton, so let me know if you think that this is unrealistic ;)

If it sounds reasonable, I'll be happy to share the code/feedback on the hack here and on PatCit.

Thanks in advance,

Cyril

cverluise, Jun 11 '20 09:06