Strong identifiers should be imported for authors when available
MARC records, Wikisource metadata, and perhaps other sources of metadata often include various strong identifiers for authors (LCCN, VIAF, Wikidata, ISNI, etc.), which should a) be imported and b) be used to match incoming authors against existing author records, disambiguating authors with similar names.
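To make point b) concrete, here is a minimal sketch of identifier-based author matching. It is not Open Library's matching code. The record shape (a dict with an `identifiers` mapping of scheme name to value) and the names `normalize` and `match_author` are hypothetical. The idea is that a candidate matches when it shares at least one strong identifier with the incoming author, and it is rejected when both records carry the same scheme with different values, which is what lets two authors with the same name be told apart.

```python
from typing import Optional


def normalize(scheme: str, value: str) -> str:
    """Normalize an identifier value so equivalent spellings compare equal."""
    v = value.strip()
    if scheme == "isni":
        # ISNIs are often printed in groups of four digits separated by spaces.
        v = v.replace(" ", "")
    return v.upper()


def _ids(author: dict) -> dict:
    """Return the author's (hypothetical) identifiers as {scheme: normalized value}."""
    return {s: normalize(s, v) for s, v in author.get("identifiers", {}).items()}


def match_author(incoming: dict, candidates: list) -> Optional[dict]:
    """Return the first candidate that shares a strong identifier with `incoming`
    and has no conflicting value for any scheme both records carry."""
    inc = _ids(incoming)
    if not inc:
        return None  # no strong IDs: a caller would fall back to name matching (not shown)
    for cand in candidates:
        cand_ids = _ids(cand)
        shared = inc.keys() & cand_ids.keys()
        if any(inc[s] != cand_ids[s] for s in shared):
            continue  # same scheme, different value: treat as a different person
        if shared:
            return cand
    return None


# Placeholder identifier values, for illustration only.
existing = [
    {"key": "/authors/OL1A", "name": "Jane Smith", "identifiers": {"viaf": "111"}},
    {"key": "/authors/OL2A", "name": "Jane Smith", "identifiers": {"viaf": "222"}},
]
incoming = {"name": "Jane Smith",
            "identifiers": {"viaf": "222", "isni": "0000 0001 2345 6789"}}
print(match_author(incoming, existing)["key"])  # /authors/OL2A
```

Both existing records have the same name, so name matching alone could not pick one; the shared VIAF value can.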
Per @cdrini:
Unfortunately, I believe this isn't supported by our import endpoint at this time :( So even if we specify them, they would not only not be used to match the author, they would not even be saved onto the author record.
Originally posted by @cdrini in https://github.com/internetarchive/openlibrary/issues/9674#issuecomment-2387015816
@hornc @mekarpeles
Please assign this one to me; it'll be a follow-up to #9674!
How does this relate to https://github.com/internetarchive/openlibrary/issues/9448 ?
I think they might be the same? Pinging @cdrini @scottbarnes for opinions!
@Freso I think they overlap significantly (now that the issue number has been corrected from what I received in the email notification).
The main difference is one of emphasis. This issue is focused on the major source of reliable bibliographic data, MARC records, and on strong identifiers backed by national libraries and other large, reliable institutions. The identifiers mentioned in #9448 are Amazon (which is certainly NOT reliable) and LibriVox, a niche site with a handful of authors (~15K listed, but many without any works). Adding a Wikisource import as proposed in #9674 will definitely NOT satisfy the requirements specified here.
MARC + VIAF would be an 80-90% solution. Wikisource + LibriVox would be a <5% solution.
@tfmorris My implementation accounts for every identifier in author/identifiers.yml and not just Wikisource or just the identifiers outlined in #9448, but it depends on future imports adhering to the schema changes I'm proposing. There's no MARC identifier for authors defined in authors/identifiers.yml, though.
This is also only a slight superset of #7724
There's no MARC identifier for authors defined in authors/identifiers.yml, though.
MARC is the standard for bibliographic metadata used by libraries around the world, not an identifier type. It will contain VIAF, LCCN, BNF, GND, ISNI, and other identifiers.
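As an illustration of where those identifiers live: in MARC 21 bibliographic records, author fields such as 100 and 700 can carry subfield $0 (authority record control number or standard number, often prefixed with a MARC organization code in parentheses) and subfield $1 (real world object URI). The sketch below pulls identifiers out of a list of `(code, value)` subfield pairs. The scheme names, the prefix table, and the URI patterns are assumptions and are deliberately incomplete (BNF, for example, is not handled), and the function name `identifiers_from_subfields` is hypothetical. A MARC library such as pymarc could presumably supply the subfield pairs.

```python
import re

# Hypothetical, incomplete mapping from URI fragments to identifier scheme names.
URI_PATTERNS = [
    (re.compile(r"id\.loc\.gov/authorities/names/([a-z]+\d+)"), "lccn"),
    (re.compile(r"viaf\.org/viaf/(\d+)"), "viaf"),
    (re.compile(r"wikidata\.org/entity/(Q\d+)"), "wikidata"),
    (re.compile(r"isni\.org/isni/(\d{15}[\dX])"), "isni"),
    (re.compile(r"d-nb\.info/gnd/([\w-]+)"), "gnd"),
]

# MARC organization code prefixes seen in $0, e.g. "(DLC)n  79021164".
ORG_PREFIXES = {"DLC": "lccn", "DE-588": "gnd"}


def identifiers_from_subfields(subfields):
    """Collect {scheme: value} from the $0 and $1 subfields of one author field."""
    found = {}
    for code, value in subfields:
        if code not in ("0", "1"):
            continue  # only $0 and $1 carry identifiers
        value = value.strip()
        prefixed = re.match(r"\(([^)]+)\)\s*(.+)", value)
        if prefixed and prefixed.group(1) in ORG_PREFIXES:
            scheme = ORG_PREFIXES[prefixed.group(1)]
            # LCCNs are padded with spaces in MARC; drop them.
            found.setdefault(scheme, prefixed.group(2).replace(" ", ""))
            continue
        for pattern, scheme in URI_PATTERNS:
            hit = pattern.search(value)
            if hit:
                found.setdefault(scheme, hit.group(1))
                break
    return found


# A 100 field with placeholder identifier values.
subfields = [
    ("a", "Smith, Jane,"),
    ("d", "1900-1980."),
    ("0", "(DLC)n  00000001"),
    ("0", "(DE-588)000000000"),
    ("1", "http://viaf.org/viaf/000000001"),
    ("1", "http://www.wikidata.org/entity/Q12345"),
]
print(identifiers_from_subfields(subfields))
# {'lccn': 'n00000001', 'gnd': '000000000', 'viaf': '000000001', 'wikidata': 'Q12345'}
```

Unrecognized prefixes, such as FAST numbers under `(OCoLC)`, are skipped rather than guessed at.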
I think #9448 is a pre-requisite for this issue. This issue involves extracting identifiers from MARC records, but currently there is nowhere in the import format to put them.
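Once there is a place for them, imports would also need to be checked against the agreed schema. The sketch below assumes a hypothetical `identifiers` object on each entry of an import record's `authors` list. It is not the current import format, and `ALLOWED_SCHEMES` and `validate_author_identifiers` are illustrative names. A real allow-list would presumably be derived from author/identifiers.yml rather than hard-coded.

```python
# Hypothetical scheme names; a real list would presumably come from
# author/identifiers.yml rather than being hard-coded here.
ALLOWED_SCHEMES = {"lccn", "viaf", "wikidata", "isni", "gnd"}


def validate_author_identifiers(record: dict) -> list:
    """Return a list of problems with the (proposed) per-author identifiers."""
    errors = []
    for i, author in enumerate(record.get("authors", [])):
        ids = author.get("identifiers", {})
        if not isinstance(ids, dict):
            errors.append(f"authors[{i}].identifiers must be an object")
            continue
        for scheme, value in ids.items():
            if scheme not in ALLOWED_SCHEMES:
                errors.append(f"authors[{i}]: unknown identifier scheme {scheme!r}")
            elif not isinstance(value, str) or not value.strip():
                errors.append(f"authors[{i}]: empty value for {scheme!r}")
    return errors


# Example import record with placeholder identifier values.
record = {
    "title": "An Example Title",
    "authors": [
        {"name": "Jane Smith",
         "identifiers": {"viaf": "000000001", "wikidata": "Q12345"}},
    ],
}
print(validate_author_identifiers(record))  # []
```

Rejecting unknown schemes up front keeps unreliable or misspelled identifier types out of author records.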