
meta name="dc.identifier" content="https://dx.doi.org/..." yields unexpected equivalence result

Open judell opened this issue 6 years ago • 0 comments

Here is a set of equivalences created using `<meta name="dc.identifier" content="10.1000/ee9">` and `<meta name="dc.identifier" content="doi:10.1000/ee9">`:

image

And here is the set created using `<meta name="dc.identifier" content="https://dx.doi.org/10.1000/ee9">`:

image

In both cases, the dc.identifier value matches this server-side pattern:

https://github.com/hypothesis/h/blob/682764d8bbf46c9d8045162493b777484069fe57/h/util/document_claims.py#L28.

But the DOI-style URI generated in the second case doesn't match the one generated in the first case, and we end up with two disjoint sets of annotations.

We currently have ~15K document_uri records like `doi:10.1000/...` and ~2K like `doi:http(s)://dx.doi.org/10.1000/...`.
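To make the mismatch concrete, here is a minimal Python sketch. It is not the code in `document_claims.py`: the function names, the resolver-prefix regex, and the "prepend `doi:` if missing" behavior are all assumptions, chosen only because they reproduce the two record shapes described above. The second function shows one possible normalization that would collapse the three `dc.identifier` forms into a single URI.

```python
import re

# Hypothetical pattern for DOI resolver URL prefixes (an assumption,
# not taken from h's source).
_RESOLVER_PREFIX = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def naive_doi_uri(value):
    """Prepend 'doi:' when it is missing, with no other normalization."""
    value = value.strip()
    return value if value.startswith("doi:") else "doi:" + value


def normalized_doi_uri(value):
    """Drop a leading 'doi:' and any resolver URL prefix, then re-add 'doi:'."""
    value = value.strip()
    if value.lower().startswith("doi:"):
        value = value[len("doi:"):]
    value = _RESOLVER_PREFIX.sub("", value)
    return "doi:" + value


identifiers = [
    "10.1000/ee9",
    "doi:10.1000/ee9",
    "https://dx.doi.org/10.1000/ee9",
]

# Two distinct URIs, so two disjoint sets of annotations.
print(sorted({naive_doi_uri(v) for v in identifiers}))

# One URI for all three forms.
print(sorted({normalized_doi_uri(v) for v in identifiers}))
```

If something like the second approach were adopted, the existing `doi:http(s)://dx.doi.org/...` records would presumably also need a one-time migration so they merge with their `doi:10.1000/...` counterparts.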

This likely isn't much of a problem because most publishers asserting DOIs use both the Highwire and DC syntaxes. But it's something to be aware of.

judell, Oct 10 '18 17:10