
EPUBCheck not reporting the xmlns:epub error

Open JackieFei opened this issue 10 months ago • 2 comments

Hi,

Download html_xmlns_epub_error.zip and rename it to html_xmlns_epub_error.epub.

The content of Section0001.xhtml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://http://www.idpf.org/2007/ops">
<head>
  <title>title</title>
</head>

<body>
  <p>&#160;</p>
</body>
</html>

Opened in a browser: (screenshot)

But EPUBCheck 5.2.1 passes it without reporting anything.

JackieFei, Feb 25 '25 03:02

Thanks for the report @JackieFei.

TL;DR I do not believe EPUBCheck is wrong, but it could or should report that case as an informative message. See details below.

So, to summarize the conformance requirements:

  • the specification only says (indirectly) that XHTML content documents must conform to the Namespaces in XML specification (i.e. must be namespace-well-formed);
  • the Namespaces specification says the namespace declaration must be a URI conforming to URI Generic Syntax [RFC3986];
  • the EPUB specification also refers to the URL Standard, which obsoletes and replaces RFC3986;
  • the EPUB specification does not require that the epub prefix is attached to the well-known idpf.org EPUB URL.

Now, looking at the specific URL used in the namespace declaration:

http://http://www.idpf.org/2007/ops

Despite looking like an obvious typo to a human reader, this string is a valid URL as defined in the URL Standard. It parses without validation errors. Its components are:

  • href: http://http//www.idpf.org/2007/ops
  • protocol: http:
  • port: (empty string)
  • hostname: http
  • pathname: //www.idpf.org/2007/ops
  • origin: http://http

So the repetition of http: is parsed as the host name being http, with an empty port specification.
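
As a quick way to reproduce the breakdown above, here is a minimal sketch using the standard `URL` constructor (a global in modern browsers and in Node.js). The helper name `describeUrl` is only illustrative and is not part of EPUBCheck.

```javascript
// Parse the namespace string from the report with the WHATWG URL API.
const NAMESPACE = "http://http://www.idpf.org/2007/ops";

// Collect the components discussed above into a plain object.
// (describeUrl is a hypothetical helper name used for this sketch.)
function describeUrl(input) {
  const url = new URL(input); // throws a TypeError if the string cannot be parsed
  return {
    href: url.href,
    protocol: url.protocol,
    port: url.port,
    hostname: url.hostname,
    pathname: url.pathname,
    origin: url.origin,
  };
}

console.log(describeUrl(NAMESPACE));
```

The constructor does not throw for this input, and the logged object should match the component list above: the host is `http`, the port is the empty string, and the rest of the string ends up in the path.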

Now, there's a little ambiguity in that RFC3986 says in its 3.2 Authority section that:

URI producers and normalizers should omit the ":" delimiter that separates host from port if the port component is empty

but that is only a "should"; strictly speaking, it does not make the URL string non-conforming.
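
To see how the string decomposes under RFC3986 itself, the following sketch applies the generic URI-splitting regular expression given in Appendix B of that RFC. The host/port split on the last colon is a simplification made for this illustration (it ignores userinfo and IPv6 literals), and `splitUri` is a hypothetical helper name.

```javascript
// Generic URI-splitting regular expression from RFC 3986, Appendix B.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into scheme, authority and path, then split the
// authority into host and port on its last ":" (simplified on purpose).
function splitUri(uri) {
  const m = URI_PARTS.exec(uri); // every group is optional, so this always matches
  const authority = m[4] ?? "";
  const colon = authority.lastIndexOf(":");
  return {
    scheme: m[2],
    authority,
    host: colon === -1 ? authority : authority.slice(0, colon),
    port: colon === -1 ? undefined : authority.slice(colon + 1),
    path: m[5],
  };
}

console.log(splitUri("http://http://www.idpf.org/2007/ops"));
```

For the string in question, the authority is `http:`, which gives the host `http` followed by the ":" delimiter and an empty port. The RFC3986 grammar defines the port as zero or more digits, so the empty port is allowed syntactically; only the "should" recommendation quoted above argues against it.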

As for browsers, it appears not all of them report a parsing error. For instance, Firefox v135.0.1 (tested on macOS) does not, but the latest Chrome and Safari do. I assume those implementations might date back to RFC3986 (when the URL Standard did not exist) and might use regex matching or naive parsing that is a little too strict. All browsers do parse the URL correctly, with no errors, using the JavaScript URL API.

Anyways, all that said, it is quite obvious for a human reader that the URL string is not what the author intended. While I do not think EPUBCheck should report an error, it could definitely be worth reporting that as an informative message, so that authors can catch and fix it.

What I'm currently not sure about is when exactly to report such a message. I'm leaning toward reporting only when the prefix epub is not declaring the well-known idpf.org URL. That would cover this particular case, but not the case where a similar http: repetition would occur in a custom namespace.
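
For illustration only, the rule I'm leaning toward could be sketched as follows. This is not EPUBCheck code: the function names are hypothetical, and the regex-based attribute extraction is a deliberate simplification (a real check would work on the parsed XML).

```javascript
// The well-known EPUB OPS namespace that the epub prefix normally declares.
const OPS_NAMESPACE = "http://www.idpf.org/2007/ops";

// Naive lookup of the xmlns:epub declaration in the document source.
// Returns the declared namespace string, or null if the prefix is not declared.
function epubPrefixBinding(xhtml) {
  const m = /\bxmlns:epub\s*=\s*(["'])(.*?)\1/.exec(xhtml);
  return m ? m[2] : null;
}

// Report only when the epub prefix is declared with something other than
// the well-known namespace (hypothetical rule, as described above).
function shouldReportEpubPrefix(xhtml) {
  const ns = epubPrefixBinding(xhtml);
  return ns !== null && ns !== OPS_NAMESPACE;
}

const reported = '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://http://www.idpf.org/2007/ops">';
console.log(shouldReportEpubPrefix(reported));
```

With this rule, the document from the report would get the informative message, while a custom prefix declared with a similarly doubled http: would not.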

@mattgarrish thoughts?

rdeltour, Feb 25 '25 13:02

Ya, this is a bit of an oddball situation that validly skirts the requirements of the specification.

If it wasn't for custom attributes, it would be a clear error. But because we allow any attributes that aren't defined in the w3c.org or idpf.org domains, the fact that this declaration puts the attributes under the host "http" makes the use valid, at least as I understand it.

We could maybe make a rule in the spec that says that the epub prefix must only be used with the OPS namespace, but this feels like a really weird one-off situation, and it could have unintended consequences since the XML prefix name is only a common convention and could be anything at all.

I don't have any problem if you want to emit an informative message for this kind of case. There's no harm in it and informative messages don't have to correlate to spec requirements.

mattgarrish, Feb 25 '25 14:02