epubcheck
epubcheck copied to clipboard
Regression in v4.1.1: failed to detect absolute file path in manifest.item.href
Sample file: sample.epub.zip
Problem description: Since v4.1.1, for manifest items with an absolute file path as href in the ODF file, the tool fails to detect it as an error, while it does in v4.1.0 (and earlier versions, e.g. v3.0.1).
Example item description:
<item id="p-cov_h1" href="/xhtml/cov_h1.xhtml" media-type="application/xhtml+xml" />
Output from v4.1.1
Validating using EPUB version 3.0.1 rules.
No errors or warnings detected.
Messages: 0 fatal / 0 errors / 0 warnings / 0 info
EPUBCheck completed
Output from v4.1.0
Validating using EPUB version 3.0.1 rules.
ERROR(RSC-001): ./sample.epub(-1,-1): File 'OEBPS//xhtml/cov_h1.xhtml' could not be found.
ERROR(RSC-001): ./sample.epub(-1,-1): File 'OEBPS//xhtml/cov_h2.xhtml' could not be found.
WARNING(OPF-003): ./sample.epub(-1,-1): Item 'OEBPS/xhtml/cov_h1.xhtml' exists in the EPUB, but is not declared in the OPF manifest.
WARNING(OPF-003): ./sample.epub(-1,-1): Item 'OEBPS/xhtml/cov_h2.xhtml' exists in the EPUB, but is not declared in the OPF manifest.
Check finished with errors
Messages: 0 fatal / 2 errors / 2 warnings / 0 info
epubcheck completed
The directory structure in the above sample epub file is as follows:
├── META-INF
│ └── container.xml
├── OEBPS
│ ├── content.opf
│ ├── toc.xhtml
│ └── xhtml
│ ├── cov_h1.xhtml
│ └── cov_h2.xhtml
└── mimetype
The absolute file path in the problem href is /xhtml/cov_h1.xhtml which is clearly not correct.
Thanks for the report @unext-wendong, that regression is definitely something to look into!
I'm not quite sure how to deal with it however.
Technically, /xhtml/cov_h1.xhtml is a relative URL (it has no scheme part). It is allowed in EPUB, and must be resolved relative to the URL of the Package Document.
That resolution is under-specified however: it depends on the URL of the Package Document, which is defined by the user agent (the reading system, or in this case, EPUBCheck).
That said, both the passing result of v4.2.x and the error message in v4.1.0 look erroneous to me. If the author uses a URL starting with /, I would assume the intent is to resolve that relative to the root of the container.
@mattgarrish what do you think? After re-reading the specs (from 3.0.1 to the latest draft of 3.3, for the 100th time?), I believe this resolution is under-specified (results will vary between reading systems).
If the author uses a URL starting with
/, I would assume the intent is to resolve that relative to the root of the container.
Ya, a path starting with a slash is a root-relative path, but I'm not sure if the root is the root of the abstract container or the location of the package document.
It seems like it's possible it could change depending on whether the content gets unzipped or not (i.e., would the root shift if only the content directory gets served).
I seem to remember this being problematic for web publications, too, because the manifest didn't have to correspond to the root of a (sub)domain and so root-relative paths could reference outside the content of the publication.
My gut instinct is we should ban root-relative paths, but it's definitely something for the WG to weigh in on.
Ya, a path starting with a slash is a root-relative path, but I'm not sure if the root is the root of the abstract container or the location of the package document.
It seems like it's possible it could change depending on whether the content gets unzipped or not (i.e., would the root shift if only the content directory gets served).
In fact, my concern is not only for root-relative path, but any path really. I'm thinking we should reopen w3c/epub-specs#1374 . I'll follow up there.
Path absolute URL string are not recommended in 3.3, so I'm going to add the 3.3 label to this issue rather than start another.
See https://github.com/w3c/epub-specs/issues/1681