
Validation issue for ePub type

shashikumar1 opened this issue 5 years ago • 8 comments

If an HTML/XHTML file contains the tags below, or a wrong value for epub:type, the new version of EPUBCheck does not report an error, but Apple's iTunes Producer rejects the title. EPUBCheck needs to be updated to catch this.

If a file contains InDesign elements like the ones below, Apple rejects the file, but EPUBCheck does not catch this. This pattern is an artifact of the EPUB creation process, so it should be caught.

<span-C_ITALICpCITATIONpPARA>31</span-C_ITALICpCITATIONpPARA> <p-P_INDEXMSTRUCTpIndexDivpTITLECleanCSS_2>P</p-P_INDEXMSTRUCTpIndexDivpTITLECleanCSS_2>

shashikumar1, Sep 06 '20

Thanks for the report @shashikumar1.

An HTML element with a hyphen in its name is valid HTML, used for custom elements. We could possibly throw an informative message if these elements are found and no JavaScript is attached, but I don't think we can otherwise reject the construct.

If the output doesn't intend to use HTML custom elements, I would suggest filing a bug against the production tool.
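For reference, the HTML Living Standard restricts which hyphenated names count as a valid custom element name: the name must start with an ASCII lowercase letter, contain a hyphen, contain no ASCII uppercase letters, and not be one of a few reserved names. The Python sketch below approximates that rule; the function name is hypothetical (not part of EPUBCheck), and the character class is an ASCII-only simplification of the spec's PCENChar production, which also admits many non-ASCII ranges.

```python
import re

# Hyphenated names the HTML spec reserves (they come from SVG and MathML).
RESERVED_NAMES = {
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
}

# ASCII-only approximation of PCENChar: lowercase letters, digits, ".", "_", "-".
# The real production also allows many non-ASCII code points (not covered here).
_PCEN = r"[a-z0-9._-]"
_CUSTOM_NAME = re.compile(rf"[a-z]{_PCEN}*-{_PCEN}*")


def is_valid_custom_element_name(name: str) -> bool:
    """Hypothetical helper: approximate check for a valid custom element name."""
    if name in RESERVED_NAMES:
        return False
    return _CUSTOM_NAME.fullmatch(name) is not None


print(is_valid_custom_element_name("my-element"))  # True
print(is_valid_custom_element_name("span"))        # False: no hyphen
```

Under this approximation, `span-` passes, while `span-C_ITALICpCITATIONpPARA` does not, because it contains ASCII uppercase letters.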

I can have a further look if you can share a sample file 😊

rdeltour, Sep 07 '20

I have attached the sample EPUB; in CH02.xhtml you can see the two types of elements below. I also have another concern related to epub:type: in CH02 and toc.xhtml I used a wrong epub:type value, but EPUBCheck does not catch the error, while Apple flags it.

<span-C_ITALICpCITATIONpPARA>31</span-C_ITALICpCITATIONpPARA>
<span->In Situ</span->
<section epub:type="cshapter">
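A vendor-style check for misspelled values such as `cshapter` could compare each unprefixed epub:type term against the EPUB 3 Structural Semantics Vocabulary. The sketch below does this with Python's standard `xml.etree.ElementTree`. It is only an illustration under stated assumptions: `KNOWN_TERMS` is a hypothetical, deliberately incomplete subset of that vocabulary, the function name is made up, and this is not how EPUBCheck implements its checks.

```python
import xml.etree.ElementTree as ET

OPS_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = f"{{{OPS_NS}}}type"  # Clark notation for the epub:type attribute

# Hypothetical, deliberately small subset of the Structural Semantics Vocabulary;
# a real check would load the full vocabulary.
KNOWN_TERMS = {
    "chapter", "part", "toc", "landmarks", "footnote", "endnote",
    "frontmatter", "bodymatter", "backmatter", "index", "glossary",
    "bibliography",
}


def unknown_epub_types(xhtml: str) -> list[tuple[str, str]]:
    """Return (element local name, term) pairs for unrecognized epub:type terms."""
    root = ET.fromstring(xhtml)
    findings = []
    for el in root.iter():
        value = el.get(EPUB_TYPE)
        if value is None:
            continue
        tag = el.tag.rsplit("}", 1)[-1]  # strip the namespace part
        for term in value.split():
            # Prefixed terms (e.g. "z3998:poem") belong to other vocabularies.
            if ":" in term:
                continue
            if term not in KNOWN_TERMS:
                findings.append((tag, term))
    return findings


doc = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body><section epub:type="cshapter"><h1>Chapter 2</h1></section></body></html>"""
print(unknown_epub_types(doc))  # [('section', 'cshapter')]
```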

sample.zip

shashikumar1, Sep 07 '20

Apple could still be using an older version of epubcheck and/or they've added extra rules on their incoming content that don't exist in the standard. If you want to validate the way specific vendors do, you may want to look at a tool like flightdeck. It's not in epubcheck's mandate to match what vendors want.

mattgarrish, Sep 07 '20

Such custom elements seem to appear only in the HTML 5.3 working draft; do they apply to the XHTML syntax as well? As long as this is not a recommendation, there is nothing to check right now.

In XHTML it would be more useful to put such elements in their own namespace. For those, EPUB already has some accessibility features to avoid problems for the audience.

If almost arbitrary element names without a defined meaning can be used in EPUB in the future, it might be useful to add requirements to use the EPUB type attribute or the (X)HTML RDFa property attribute to indicate what they mean.

However, as far as I understand the 5.3 working draft, custom elements require a 'definition' via scripting in the DOM. This means a checker always has to verify that the custom element really is defined within the DOM. But because EPUB has the accessibility requirement that authors not rely on scripting, they ultimately should not use custom elements; at most, they could use elements from another namespace together with a switch.

The sample EPUB from shashikumar1 does not contain any script at all, and therefore no 'definition' of the custom element. So even when checking against the 5.3 working draft, it would be invalid.

Doktorchen, Sep 07 '20

> Such custom elements seem to appear only in the HTML 5.3 working draft; do they apply to the XHTML syntax as well? As long as this is not a recommendation, there is nothing to check right now.

EPUB 3.2 is based on the latest HTML version, defined by the HTML Living Standard, which does include custom elements. To the best of my knowledge, they are allowed in the XHTML syntax.

rdeltour, Sep 07 '20

> The sample EPUB from shashikumar1 does not contain any script at all, and therefore no 'definition' of the custom element. So even when checking against the 5.3 working draft, it would be invalid.

Yeah, that's why I suggested EPUBCheck could issue an informative message when a document contains custom elements and no JavaScript. But even then, it's not strictly invalid.
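The informative message suggested here could be sketched roughly as follows: collect hyphenated element names in the XHTML namespace, and report them only if the document contains no `script` element. Assumptions: the function name is hypothetical, only `script` elements in the document itself are considered, and "contains a hyphen" is used as a loose stand-in for the spec's stricter custom element name rule.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XHTML_PREFIX = f"{{{XHTML_NS}}}"


def custom_elements_without_script(xhtml: str) -> list[str]:
    """Hypothetical check: hyphenated XHTML element names, if no <script> is present."""
    root = ET.fromstring(xhtml)
    hyphenated = set()
    has_script = False
    for el in root.iter():
        # Only consider elements in the XHTML namespace (skips SVG, MathML, etc.).
        if not el.tag.startswith(XHTML_PREFIX):
            continue
        name = el.tag[len(XHTML_PREFIX):]
        if name == "script":
            has_script = True
        elif "-" in name:
            hyphenated.add(name)
    # With a script present, the elements might be defined at runtime: no message.
    return [] if has_script else sorted(hyphenated)


doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       "<p><span->In Situ</span-></p></body></html>")
print(custom_elements_without_script(doc))  # ['span-']
```

Returning an empty list when any script is present is a deliberately cautious choice: a static checker cannot tell whether the script actually calls `customElements.define` for those names.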

rdeltour, Sep 07 '20

> EPUB 3.2 is based on the latest HTML version, defined by the HTML Living Standard

Right, the W3C now has an MOU with the WHATWG that the Living Standard will be the official version going forward. All new W3C specifications now reference it.

mattgarrish, Sep 07 '20

The editors of HTML5 once promised not to break backwards compatibility, but meanwhile there are indeed some incompatibilities, for example the problem with the tfoot element (which can be relevant especially for EPUB reading systems using a paged display). Having a checker follow such a constantly changing (unstable) document means that valid books (for example EPUB 3.0 books) can suddenly become invalid in the checker's interpretation. To check correctly, authors would have to give, for each document, a dated reference to the specification they followed.

If one uses the official recommendations instead:
https://www.w3.org/TR/2014/REC-html5-20141028/
https://www.w3.org/TR/2017/REC-html51-20171003/
https://www.w3.org/TR/2017/REC-html52-20171214/
there is only a limited number of recommendations (currently three) to check against, and books do not unexpectedly change their validity status over time due to changes in specification drafts. And the WHATWG variant is not called a recommendation, as is mentioned in EPUB Content Documents 3.2. Which recommendation counts as the latest for a checker depends on when an EPUB author looks at the EPUB 3.2 specification, or rather on which latest recommendation such an author cares about. That is bad design if one really wants a meaningful validity check. ;o)

Finally, the WHATWG HTML5 is designed not to be checkable, so why try at all? Because EPUB requires the XML syntax, it should be sufficient to check that, if everything else in the WHATWG HTML5 is subject to change.

I have also observed that some booksellers no longer update their epubcheck versions; they stay with 4.0 or 4.1 and reject 4.2 (unfortunately some bugs are only fixed in 4.2). Maybe this is because they want stable test results for existing books, unaffected by today's WHATWG changes, which do not apply at all to the millions of EPUBs already produced in past years?

Does it really help to improve epubcheck 4.2 if authors cannot use it in practice because booksellers do not want to use it due to backwards-incompatible HTML issues?

The current (2020-09-02) WHATWG document states in 4.13.3 Core concepts: "A custom element is an element that is custom. Informally, this means that its constructor and prototype are defined by the author, instead of by the user agent. This author-supplied constructor function is called the custom element constructor." Therefore an author always has to use such a DOM constructor to define the element, which in practice requires a script to access the DOM. Without such a definition, there is no such custom element in a valid document. Moreover, using them only seems meaningful if their appearance in the document is not static but produced by a script within the DOM, not in the XML representation. Because the content, by definition, is always already present in the XML representation due to , all additions or changes due to scripting are only decoration or behaviour, not content. In the OPF file, this seems to require the scripted property value on the related item. It might also be useful for authors to provide a fallback in case accessibility issues could result from the use of scripting or custom elements; usually this indicates a badly designed document. Something like this should be checkable, and additionally the checker could issue a warning or informational message asking authors to add such aids for the audience, or better, to fix the borked document ;o)
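The OPF side of this idea can be sketched as a manifest check: given the set of content documents that some other scan has found to use scripting (a hypothetical input here), report manifest items that lack the `scripted` property. The function name and the input set are assumptions for illustration; only the `properties` attribute and the `scripted` value come from the EPUB package format.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def items_missing_scripted(opf: str, scripted_hrefs: set[str]) -> list[str]:
    """Hypothetical check: hrefs known to use scripting whose manifest item
    does not declare the "scripted" property."""
    root = ET.fromstring(opf)
    missing = []
    for item in root.iter(f"{{{OPF_NS}}}item"):
        href = item.get("href")
        # "properties" is a space-separated list and may be absent.
        props = (item.get("properties") or "").split()
        if href in scripted_hrefs and "scripted" not in props:
            missing.append(href)
    return missing


opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch02" href="CH02.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch03" href="CH03.xhtml" media-type="application/xhtml+xml" properties="scripted"/>
  </manifest>
</package>"""
print(items_missing_scripted(opf, {"CH02.xhtml", "CH03.xhtml"}))  # ['CH02.xhtml']
```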

Doktorchen, Sep 07 '20

Closing this as not an issue, since EPUBCheck is conforming to the spec here, as far as I can tell. Feel free to keep on commenting here, or reopen if you disagree.

rdeltour, Dec 08 '22