epubcheck icon indicating copy to clipboard operation
epubcheck copied to clipboard

Regression in v4.1.1: failed to detect absolute file path in manifest.item.href

Open unext-wendong opened this issue 4 years ago • 5 comments
trafficstars

Sample file: sample.epub.zip

Problem description: Since v4.1.1, for manifest items with an absolute file path as href in the ODF file, the tool fails to detect it as an error, while it does in v4.1.0 (and earlier versions, e.g. v3.0.1).

Example item description:

<item id="p-cov_h1" href="/xhtml/cov_h1.xhtml" media-type="application/xhtml+xml" />

Output from v4.1.1

Validating using EPUB version 3.0.1 rules.
No errors or warnings detected.
Messages: 0 fatal / 0 errors / 0 warnings / 0 info

EPUBCheck completed

Output from v4.1.0

Validating using EPUB version 3.0.1 rules.
ERROR(RSC-001): ./sample.epub(-1,-1): File 'OEBPS//xhtml/cov_h1.xhtml' could not be found.
ERROR(RSC-001): ./sample.epub(-1,-1): File 'OEBPS//xhtml/cov_h2.xhtml' could not be found.
WARNING(OPF-003): ./sample.epub(-1,-1): Item 'OEBPS/xhtml/cov_h1.xhtml' exists in the EPUB, but is not declared in the OPF manifest.
WARNING(OPF-003): ./sample.epub(-1,-1): Item 'OEBPS/xhtml/cov_h2.xhtml' exists in the EPUB, but is not declared in the OPF manifest.

Check finished with errors
Messages: 0 fatal / 2 errors / 2 warnings / 0 info

epubcheck completed

unext-wendong avatar May 25 '21 10:05 unext-wendong

The directory structure in the above sample epub file is as follows:

├── META-INF
│   └── container.xml
├── OEBPS
│   ├── content.opf
│   ├── toc.xhtml
│   └── xhtml
│       ├── cov_h1.xhtml
│       └── cov_h2.xhtml
└── mimetype

The absolute file path in the problem href is /xhtml/cov_h1.xhtml which is clearly not correct.

unext-wendong avatar May 25 '21 10:05 unext-wendong

Thanks for the report @unext-wendong, that regression is definitely something to look into!

I'm not quite sure how to deal with it however.

Technically, /xhtml/cov_h1.xhtml is a relative URL (it has no scheme part). It is allowed in EPUB, and must be resolved relative to the URL of the Package Document.

That resolution is under-specified however: it depends on the URL of the Package Document, which is defined by the user agent (the reading system, or in this case, EPUBCheck).

That said, both the passing result of v4.2.x and the error message in v4.1.0 look erroneous to me. If the author uses a URL starting with /, I would assume the intent is to resolve that relative to the root of the container.

@mattgarrish what do you think? After re-reading the specs (from 3.0.1 to the latest draft of 3.3, for the 100th time?), I believe this resolution is under-specified (results will vary between reading systems).

rdeltour avatar May 25 '21 14:05 rdeltour

If the author uses a URL starting with /, I would assume the intent is to resolve that relative to the root of the container.

Ya, a path starting with a slash is a root-relative path, but I'm not sure if the root is the root of the abstract container or the location of the package document.

It seems like it's possible it could change depending on whether the content gets unzipped or not (i.e., would the root shift if only the content directory gets served).

I seem to remember this being problematic for web publications, too, because the manifest didn't have to correspond to the root of a (sub)domain and so root-relative paths could reference outside the content of the publication.

My gut instinct is we should ban root-relative paths, but it's definitely something for the WG to weigh in on.

mattgarrish avatar May 25 '21 14:05 mattgarrish

Ya, a path starting with a slash is a root-relative path, but I'm not sure if the root is the root of the abstract container or the location of the package document.

It seems like it's possible it could change depending on whether the content gets unzipped or not (i.e., would the root shift if only the content directory gets served).

In fact, my concern is not only for root-relative path, but any path really. I'm thinking we should reopen w3c/epub-specs#1374 . I'll follow up there.

rdeltour avatar May 25 '21 15:05 rdeltour

Path absolute URL string are not recommended in 3.3, so I'm going to add the 3.3 label to this issue rather than start another.

See https://github.com/w3c/epub-specs/issues/1681

mattgarrish avatar Nov 15 '21 13:11 mattgarrish