Use more of DOI data returned by Crossref

Open strogonoff opened this issue 2 years ago • 8 comments

Introduction

  • We have functionality that obtains DOI data from the Crossref Work API and converts it to a Relaton bibliographic item: https://github.com/ietf-ribose/bibxml-service/blob/main/doi/crossref.py
    • This is used by the DOI lookup widget, as well as by /public/rfc/bibxml7/... paths.
  • doi2ietf-py is a Python library, based on the doilit Ruby library, that converts DOI data (from the obsolete Crossref DOI API) to xml2rfc format: https://github.com/ietf-ribose/doi2ietf-py

The Relaton bibliographic item specification supports a wider scope of bibliographic data than the xml2rfc format (ISBN, for example). As a result, the BibXML service overall is expected to use more DOI data than doi2ietf-py/doilit.

Challenge

Some DOI data is still ignored by the BibXML service, even though doi2ietf-py/doilit recognizes it.

We could investigate the doi2ietf-py (and optionally doilit) source and use that DOI data appropriately when constructing a Relaton bibliographic item. For example, I wonder if we could use volume, journal-issue and page, which we currently seem to ignore.
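As a rough illustration of the fields mentioned above, here is a minimal sketch of pulling volume, issue and page out of the `message` object of a Crossref work response. The function name `extract_extent` and the output dict keys are hypothetical, the sample input is made up, and mapping the result onto Relaton models is deliberately left out.

```python
from typing import Any, Dict, Optional


def extract_extent(message: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull volume, issue and page range out of a Crossref work ``message``."""
    # "issue" may appear at the top level or nested under "journal-issue".
    issue = message.get("issue") or (message.get("journal-issue") or {}).get("issue")

    # "page" is a free-form string such as "55" or "123-145".
    page = message.get("page")
    first_page, last_page = None, None
    if page:
        first, _, last = str(page).partition("-")
        first_page = first.strip() or None
        last_page = last.strip() or None

    return {
        "volume": message.get("volume"),
        "issue": issue,
        "first_page": first_page,
        "last_page": last_page,
    }


# Illustrative input only, not a real API response.
sample = {"volume": "11", "journal-issue": {"issue": "3"}, "page": "55"}
print(extract_extent(sample))
```

Returning a plain dict keeps the sketch independent of relaton-py; the real conversion would need to decide which existing model fields these values belong in.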

Key pointers

  • Crossref DOI API spec: http://api.crossref.org/swagger-ui/index.html
    • Spec may be incorrect, but one can try different DOIs (discoverable e.g. via bib.ietf.org) and inspect API responses directly, such as https://api.crossref.org/v1/works/10.3390/fi11030055
    • This is the DOI data that we get.
    • doi2ietf-py/doilit use a different API endpoint (https://data.crossref.org/10.3390/fi11030055, now inaccessible), so the data they get could be slightly different. See .json files in doi2ietf-py test fixtures for what they got.
  • Where doi2ietf-py converts DOI structure to xml2rfc: https://github.com/ietf-ribose/doi2ietf-py/blob/6ac3904972ceaeb110e7711a222e93a564aa0250/doi2ietf/utils.py
  • Where we construct Relaton’s bibliographic item from DOI data: https://github.com/ietf-ribose/bibxml-service/blob/main/doi/crossref.py
  • Relaton model spec: LutaML model (see sibling files for referenced class definitions), RNC grammar
    • We want to find a way to correctly use more DOI data when constructing relaton.models.bibdata.BibliographicItem, using doi2ietf-py/doilit as an example
    • Bibliographic data model that we have specified in Python so far: relaton.models.bibdata.BibliographicItem (docs)
      • Separate issues can be filed for expanding Relaton model coverage in relaton-py (as well as corresponding updates of BibXML service GUI), if needed to accommodate DOI data. The first step should be to try filling in the models already defined and handled by the GUI.
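
To make "inspect API responses directly" a bit more concrete, here is a small sketch that builds the works URL for a DOI, fetches the response, and lists the keys we do not yet handle. All function names are hypothetical, and the `HANDLED` set is a placeholder, NOT the actual list of keys read by doi/crossref.py. The demonstration at the end runs offline on a made-up message.

```python
import json
from typing import Any, Dict, Iterable, List
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint taken from the example URL in the list above.
CROSSREF_WORKS = "https://api.crossref.org/v1/works/"


def works_url(doi: str) -> str:
    """Build the works URL for a DOI, keeping the "/" separators intact."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def fetch_message(doi: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch a work and return the ``message`` object of the JSON response."""
    with urlopen(works_url(doi), timeout=timeout) as resp:
        return json.load(resp)["message"]


def unused_keys(message: Dict[str, Any], handled: Iterable[str]) -> List[str]:
    """List response keys that are not in the ``handled`` set, sorted."""
    return sorted(set(message) - set(handled))


# Hypothetical: NOT the actual set of keys read by doi/crossref.py.
HANDLED = {"DOI", "title", "author", "issued"}

# Offline demonstration on a made-up message; no network access needed.
demo = {"DOI": "10.0000/example", "title": ["Example"], "volume": "1", "page": "1-2"}
print(unused_keys(demo, HANDLED))
```

Against the live API this would be called as `unused_keys(fetch_message("10.3390/fi11030055"), HANDLED)`, once `HANDLED` reflects what the service really reads.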

Optional

  • doi2ietf-py is based on the doilit Ruby library

strogonoff, May 25 '22 06:05