Parsing of the unique identifier's scheme

Open mickael-menu-mantano opened this issue 6 years ago • 5 comments

In this document, it's mentioned that we should parse the scheme of the unique identifier.

But there's no place to put it in the RWPM model. I guess we have to expand the scheme into the URI (e.g. <dc:identifier opf:scheme="ISBN">123456789X</dc:identifier> -> urn:isbn:123456789X).

But there's no explanation of how to do that in the doc. Any clues?
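
For the example above, one possible expansion could look like the sketch below. It assumes a hand-written table from `opf:scheme` values to URN prefixes; the function name `expand_identifier` and the table contents are illustrative assumptions, not something RWPM or the Readium docs define.

```python
import re

# Assumed mapping from opf:scheme values (lowercased) to URN prefixes.
# Illustrative only: RWPM does not define this table.
URN_PREFIXES = {
    "isbn": "urn:isbn:",
    "uuid": "urn:uuid:",
}

# Matches a leading URI scheme such as "urn:" or "https:".
_URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def expand_identifier(value, scheme=None):
    """Return `value` as a URI when the declared scheme is known."""
    value = value.strip()
    if _URI_SCHEME.match(value):
        # Already looks like a URI: leave it untouched.
        return value
    prefix = URN_PREFIXES.get((scheme or "").lower())
    # Unknown or missing scheme: fall back to the raw identifier.
    return prefix + value if prefix else value
```

With this table, `expand_identifier("123456789X", "ISBN")` yields `urn:isbn:123456789X`, while an identifier with an unrecognised scheme is returned unchanged.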

mickael-menu-mantano avatar Jun 05 '19 13:06 mickael-menu-mantano

It goes into https://readium.org/webpub-manifest/contexts/default/#identifier

Since it's a URI, you have to convert the ISBN/UUID/DOI into a URI.

HadrienGardeur avatar Jun 05 '19 14:06 HadrienGardeur

And just for the sake of completeness:

https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#sec-opf-dcidentifier

This specification imposes no additional restrictions or requirements on the identifier except that it MUST be at least one character in length after white space has been trimmed. It is strongly encouraged that the identifier be a fully qualified URI, however.

danielweck avatar Jun 05 '19 14:06 danielweck

Sorry, maybe my question was ambiguous.

I'm looking for instructions on how to convert an identifier into a URI, when it's not already one.

Taking this example from the specification:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:doi:10.1016/j.iheduc.2008.03.001</dc:identifier>
    <meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">06</meta>
</metadata>

and let's assume that the identifier is missing the scheme, so: 10.1016/j.iheduc.2008.03.001. How do I know how to convert it to a URI?

I could look at the scheme onix:codelist5 and hard-code the list of codes (https://ns.editeur.org/onix/en/5). But then:

  • How do I know the correct URI scheme for each ONIX code (e.g. "BNF Control number")?
  • What if another scheme type is declared? How do I know the actual scheme to use for the URI?
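
To make the hard-coding approach concrete, here is a rough sketch that reads the `identifier-type` refinement and applies a small lookup table. The table `ONIX5_PREFIXES` and the function `identifier_uri` are hypothetical, the table is deliberately partial, and the `urn:doi:` prefix is taken from the specification's example rather than from any registry. Codes without an entry are left as raw identifiers, which is exactly the gap the two questions above point at.

```python
import xml.etree.ElementTree as ET

DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"

# Assumed, partial mapping from ONIX code list 5 codes to URN prefixes.
ONIX5_PREFIXES = {
    "02": "urn:isbn:",  # ISBN-10
    "06": "urn:doi:",   # DOI, prefix as used in the specification's example
    "15": "urn:isbn:",  # ISBN-13
}

SAMPLE = """
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">10.1016/j.iheduc.2008.03.001</dc:identifier>
    <meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">06</meta>
</metadata>
"""


def identifier_uri(metadata_xml):
    """Return the first dc:identifier as a URI when its ONIX type is known."""
    root = ET.fromstring(metadata_xml)
    ident = root.find(DC_IDENTIFIER)
    value = (ident.text or "").strip()
    ref = "#" + ident.get("id", "")
    for el in root.iter():
        # Compare local names so the lookup works with or without an OPF namespace.
        if el.tag.rsplit("}", 1)[-1] != "meta":
            continue
        if (el.get("refines") == ref
                and el.get("property") == "identifier-type"
                and el.get("scheme") == "onix:codelist5"):
            prefix = ONIX5_PREFIXES.get((el.text or "").strip())
            if prefix:
                return prefix + value
    # No usable refinement: return the identifier as found.
    return value
```

Any other `scheme` value (or an ONIX code missing from the table) falls through to the raw identifier, so a reader would still need a documented fallback rule.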

mickael-menu-mantano avatar Jun 06 '19 09:06 mickael-menu-mantano

@mmenu-mantano’s questions kinda rang a bell to me as I had a quick convo on Twitter about identifiers a few months ago.

Looks like at some point epubcheck didn’t report faulty URIs, so if, say, you used InDesign’s panel export for metadata and replaced the UUID with an ISBN, InDesign would do the following:

urn:uuid:xxxxxxxxxxxxx

With xxxxxxxxxxxxx being an ISBN. So I guess we can’t even trust the identifier already being a URI.

And you would have that in EPUB files going unnoticed.
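
A parser could at least detect this particular case by checking that whatever follows `urn:uuid:` has the canonical UUID shape. This is only a sketch, and the helper name `is_valid_uuid_urn` is made up for illustration:

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID form, case-insensitive.
_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_valid_uuid_urn(identifier):
    """Return True only if `identifier` is urn:uuid: followed by a real UUID."""
    prefix = "urn:uuid:"
    if not identifier.lower().startswith(prefix):
        return False
    return _UUID.fullmatch(identifier[len(prefix):]) is not None
```

A 13-digit ISBN placed after `urn:uuid:` fails this check, so the identifier could be flagged or re-examined rather than trusted as is.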

¯\_(ツ)_/¯

Which also leads to this issue in EPUB revision: https://github.com/w3c/publ-epub-revision/issues/1216

JayPanoz avatar Jun 06 '19 10:06 JayPanoz

This feels like an issue that we should address in the architecture repo before we tackle it in various implementations.

We'll encounter this issue with at least a few formats where the identifier is not required or is not necessarily a URI:

  • EPUB
  • PDF
  • CBZ

HadrienGardeur avatar Jan 22 '20 10:01 HadrienGardeur