Parsing of the unique identifier's scheme
In this document, it's mentioned that we should parse the scheme of the unique identifier.
But there's no dedicated place for the scheme in the RWPM model, so I guess we have to expand it into the URI (e.g. <dc:identifier opf:scheme="ISBN">123456789X</dc:identifier> -> urn:isbn:123456789X).
The doc doesn't explain how to do that, though. Any clues?
It goes into https://readium.org/webpub-manifest/contexts/default/#identifier
Since it's a URI, you have to convert the ISBN/UUID/DOI into a URI.
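To make the idea concrete, here is a minimal sketch of such a conversion for an EPUB 2 style opf:scheme attribute. The function name identifier_to_uri and the URN_PREFIXES table are hypothetical. urn:isbn: and urn:uuid: are registered URN namespaces, while urn:doi: only mirrors the example in the EPUB specification and is an assumption, not an agreed rule.

```python
import re

# Hypothetical mapping from an opf:scheme value to a URN prefix.
# "urn:doi:" is an assumption borrowed from the EPUB specification example.
URN_PREFIXES = {
    "ISBN": "urn:isbn:",
    "UUID": "urn:uuid:",
    "DOI": "urn:doi:",
}

# Matches a leading URI scheme such as "urn:" or "https:".
_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def identifier_to_uri(value, scheme=None):
    """Return the identifier as a URI, or the trimmed value if no rule applies."""
    value = value.strip()
    if _HAS_SCHEME.match(value):
        return value  # already looks like a URI, keep it untouched
    prefix = URN_PREFIXES.get((scheme or "").strip().upper())
    if prefix is None:
        return value  # unknown or missing scheme: nothing safe to prepend
    return prefix + value


print(identifier_to_uri("123456789X", "ISBN"))
```

Anything the table does not cover is returned unchanged, which leaves open what to do with schemes that have no obvious URI form.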
And just for the sake of completeness:
https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#sec-opf-dcidentifier
This specification imposes no additional restrictions or the requirements of the identifier except that it MUST be at least one character in length after white space has been trimmed. It is strongly encouraged that the identifier be a fully qualified URI, however.
Sorry, maybe my question was ambiguous.
I'm looking for instructions on how to convert an identifier into a URI, when it's not already one.
Taking this example from the specification:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="pub-id">urn:doi:10.1016/j.iheduc.2008.03.001</dc:identifier>
<meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">06</meta>
</metadata>
and let's assume that the identifier is missing its URI scheme, so just: 10.1016/j.iheduc.2008.03.001. How do I know how to convert it to a URI?
I could look at the scheme onix:codelist5 and hard-code the list of codes (https://ns.editeur.org/onix/en/5). But then:
- How do I know the correct URI scheme for each ONIX code (e.g. "BNF Control number")?
- If another scheme type is declared, how do I know the actual scheme to use for the URI?
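For reference, the hard-coded approach described above could look like the sketch below. It only covers three ONIX code list 5 entries (02 ISBN-10, 06 DOI, 15 ISBN-13), the URN prefixes are assumptions for illustration, and the helper name identifier_uri is hypothetical. It returns None for any other code or scheme, which is exactly the gap the two questions point at.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Partial, hard-coded subset of ONIX code list 5.
# The URN prefixes are assumptions; no mapping has been agreed on.
ONIX_CODELIST5_PREFIXES = {
    "02": "urn:isbn:",  # ISBN-10
    "06": "urn:doi:",   # DOI, as in the specification example
    "15": "urn:isbn:",  # ISBN-13
}


def identifier_uri(metadata_xml):
    """Return a URI for the first dc:identifier, or None if no mapping is known."""
    root = ET.fromstring(metadata_xml)
    ident = root.find(DC + "identifier")
    if ident is None:
        return None
    value = (ident.text or "").strip()
    for meta in root.iter():
        # Compare local names so <meta> matches with or without the OPF namespace.
        if meta.tag.rsplit("}", 1)[-1] != "meta":
            continue
        if (
            meta.get("refines") == "#" + ident.get("id", "")
            and meta.get("property") == "identifier-type"
            and meta.get("scheme") == "onix:codelist5"
        ):
            prefix = ONIX_CODELIST5_PREFIXES.get((meta.text or "").strip())
            if prefix is None:
                return None  # ONIX code outside the hard-coded subset
            # Avoid prefixing a value that already carries the expected prefix.
            return value if value.lower().startswith(prefix) else prefix + value
    return None  # no refinement, or a scheme other than onix:codelist5


sample = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier id="pub-id">10.1016/j.iheduc.2008.03.001</dc:identifier>
  <meta refines="#pub-id" property="identifier-type" scheme="onix:codelist5">06</meta>
</metadata>"""
print(identifier_uri(sample))  # urn:doi:10.1016/j.iheduc.2008.03.001
```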
@mmenu-mantano’s questions kinda rang a bell for me, as I had a quick convo on Twitter about identifiers a few months ago.
It looks like at some point epubcheck didn’t report faulty URIs. So if, say, you used InDesign’s export panel for metadata and replaced the UUID with an ISBN, InDesign would output the following:
urn:uuid:xxxxxxxxxxxxx
With xxxxxxxxxxxxx being an ISBN. So I guess we can’t even trust an identifier that is already a URI.
And you would have that in EPUB files going unnoticed.
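A parser could at least detect this case. The sketch below is only an illustration, with hypothetical function names and return labels: it checks whether the payload of a urn:uuid: identifier parses as a UUID, and flags payloads that pass the ISBN-13 check digit instead.

```python
import uuid


def is_valid_isbn13(digits):
    """Check the ISBN-13 check digit (weights alternate 1 and 3)."""
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def check_declared_urn(identifier):
    """Classify an identifier that claims to be a urn:uuid."""
    prefix = "urn:uuid:"
    if not identifier.lower().startswith(prefix):
        return "not-urn-uuid"
    payload = identifier[len(prefix):].strip()
    if is_valid_isbn13(payload.replace("-", "")):
        return "isbn-in-disguise"
    try:
        uuid.UUID(payload)  # raises ValueError unless it has 32 hex digits
    except ValueError:
        return "invalid-uuid"
    return "uuid"


print(check_declared_urn("urn:uuid:9780306406157"))  # isbn-in-disguise
```

What to do once a mismatch is detected (rewrite to urn:isbn:, keep the value as-is, or drop it) is a separate decision that this sketch does not make.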
¯\(ツ)/¯
Which also leads to this issue in EPUB revision: https://github.com/w3c/publ-epub-revision/issues/1216
This feels like an issue that we should address in the architecture repo before we tackle it in various implementations.
We'll encounter this issue with at least a few formats where the identifier is not required or is not necessarily a URI:
- EPUB
- CBZ