epub-tests icon indicating copy to clipboard operation
epub-tests copied to clipboard

Errors when ingesting these files

Open rickj opened this issue 2 years ago • 11 comments

When ingesting all of the tests, and using EPUBCheck 5.0.0 beta, these errors were observed: Error File fatal epubcheck errors pkg-version-backward bad XML markup pub-foreign_bad-fallback bad XML markup pub-foreign_json-spine XML parsing aborted pub-xml-external-id fatal epubcheck errors pub-xml-names fatal epubcheck errors pub-xml-non-validating_unclosed

rickj avatar Nov 02 '22 19:11 rickj

Update... these errors are NOT from EPUBCheck 5.

Details: pub-xml-external-id Error parsing XHTML: parsing aborted line: 4 file: epub://paginated.epub/EPUB/content_001.xhtml

pub-foreign_bad-fallback Error parsing XHTML: not well-formed (invalid token) line: 1 file: epub://paginated.epub/EPUB/foo.dmg

pub-foreign_json-spine Error parsing XHTML: not well-formed (invalid token) line: 1 file: epub://paginated.epub/EPUB/moby.json

these are errors specific to the VST ingestion workflow, not EPUBCheck. Essentially if there's a spine item, we expect it to be HTML/XHTML, and if we are trying to parse files like DMG and JSONs in there, these files will never ingest into our system.

Is this expected behavior?

rickj avatar Nov 02 '22 19:11 rickj

Several of the tests involve deliberately-invalid EPUBs in order to test aspects of the specification. Some of the deliberate errors are quite extreme, for example to test what happens in fallback chains. I fully expect some of these files to not work in various reading systems--in fact I hope they don't.

dauwhe avatar Nov 02 '22 19:11 dauwhe

should those we expect to fail be identified in the contributing guide for testing? At least the tester could know to look for other options (like side loading) for testing

rickj avatar Nov 02 '22 20:11 rickj

should those we expect to fail be identified in the contributing guide for testing? At least the tester could know to look for other options (like side loading) for testing

I agree. I would think that a very explicit statement should be present in the 'description' field in the package file (which is then taken over on the test description).

@dauwhe, as far as I could see, these are your tests, could you add such descriptions to the metadata? I am afraid if I do those, I may misunderstand the exact goals of the tests...

iherman avatar Nov 03 '22 10:11 iherman

@dauwhe, if you can't do it I will do something... sometimes in the coming days.

iherman avatar Nov 03 '22 17:11 iherman

@rickj

pub-xml-external-id Error parsing XHTML: parsing aborted line: 4 file: epub://paginated.epub/EPUB/content_001.xhtml

pub-foreign_bad-fallback Error parsing XHTML: not well-formed (invalid token) line: 1 file: epub://paginated.epub/EPUB/foo.dmg

pub-foreign_json-spine Error parsing XHTML: not well-formed (invalid token) line: 1 file: epub://paginated.epub/EPUB/moby.json

I have looked at these. I have created PR #246 adding a remark in the package file of the first two cases, whereby an RS may reject the respective EPUB document altogether. In my view, the RS does pass the test in that case.

I do have a problem with the third case, though. What happens there is that the spine item is a JSON file but a fallback is also specified for it to a bona fide HTML file. In my view, this is a correct EPUB document per spec, but it is based on the assumption that the RS implements the fallback. Put it another way, the RS error message shows that the RS does not implement fallbacks, which means that the test should fail.

Cc-ing the other testers here, too, to check whether my view is correct...

cc @mattgarrish @bduga @davemanhall @danielweck @wareid

iherman avatar Nov 04 '22 05:11 iherman

I was also looking at the other three cases (pkg-version-backward, pub-xml-names, and pub-xml-non-validating_unclosed): the epubcheck errors are "intentional", too, in the sense that the EPUB documents are indeed meant to be invalid.

I think this falls under a situation we have already discussed elsewhere; these tests are aimed at RS-s that allow sideloading, and they should react in some proper way to those errors. On the other hand, it is understood that some RS-s are solely based on an ingestion flow that includes epubcheck; this is one of the situations for which we introduced lately the "n/a" (for "non-applicable") case in the test report. I guess this is exactly your case for those tests, @rickj.

iherman avatar Nov 04 '22 05:11 iherman

Finishing up our testing, and applying 'n/a' for files that fail epubcheck, however, for files that pass epubcheck but do not open in a reading system, is 'n/a' the appropriate test result also?

rickj avatar Nov 30 '22 15:11 rickj

Are these the cases when the test relies on fallback but they do not open because fallback is not implemented? If that is the case then, I am afraid, the tests fail due to missing functionalities...

iherman avatar Nov 30 '22 15:11 iherman

Are these the cases when the test relies on fallback but they do not open because fallback is not implemented? If that is the case then, I am afraid, the tests fail due to missing functionalities...

Possibly. We will do more testing for these as side loaded and update where needed. Right now not passing epubcheck prevents them from even getting to the reading system

rickj avatar Dec 03 '22 14:12 rickj

Are these the cases when the test relies on fallback but they do not open because fallback is not implemented? If that is the case then, I am afraid, the tests fail due to missing functionalities...

Possibly. We will do more testing for these as side loaded and update where needed. Right now not passing epubcheck prevents them from even getting to the reading system

Ah. That was not clear to me. Based on what you told before, that would mean a "n/a" then, because your implementation does not have side loading and everything is filtered through epubcheck...

iherman avatar Dec 04 '22 10:12 iherman