
Allow Internationalization Tag Set attribute in EPUB

Open iherman opened this issue 6 months ago • 8 comments

The Internationalization Tag Set (ITS) specification defines a number of attributes that are officially usable in HTML. (Essentially, the attributes are of the form its-*.) As they are valid in HTML, they should be valid in EPUB as well (without having to use namespaces for that purpose).

(Context: I am reviewing the code of my respec->EPUB3 converter tool, and I tested it on the clreq note. The generated epub displays properly on, say, Thorium, but epubcheck rejects it because it heavily uses the its-locale-filter-list attribute. I believe it should not.)
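For illustration, here is a minimal Python sketch of the kind of markup involved and of how the its-* attributes can be located in an XHTML content document. The sample document, the locale values, and the function name `find_its_attributes` are hypothetical; they are not taken from the clreq note or from the converter tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML content document using an ITS attribute without a namespace prefix.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p its-locale-filter-list="zh-hant">Traditional Chinese only</p>
    <p its-locale-filter-list="zh-hans" lang="zh-hans">Simplified Chinese only</p>
    <p>Shown for every locale</p>
  </body>
</html>"""


def find_its_attributes(xhtml: str) -> list[tuple[str, str, str]]:
    """Return (element name, attribute name, value) for every its-* attribute."""
    root = ET.fromstring(xhtml)
    found = []
    for element in root.iter():
        # ElementTree writes namespaced tags as "{namespace}local"; keep the local part.
        tag = element.tag.rsplit("}", 1)[-1]
        for name, value in element.attrib.items():
            if name.startswith("its-"):
                found.append((tag, name, value))
    return found


if __name__ == "__main__":
    for tag, name, value in find_its_attributes(SAMPLE):
        print(f"<{tag}> {name}={value!r}")
```

A schema that does not know the its-* attributes reports each of these attributes as an error, which is the behaviour described above.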

@rdeltour @mattgarrish

iherman avatar Jun 02 '25 09:06 iherman

Thanks for the report. I'm slightly surprised, since EPUBCheck should filter custom attributes at parsing time before validating with schemas (if I remember correctly). I'll have a deeper look.

rdeltour avatar Jun 02 '25 09:06 rdeltour

Isn't the problem that they're an extension of HTML that isn't formally recognized by HTML itself? (i.e., the validators have been modified to allow the attributes even though the standard itself doesn't mention them that I can find.)

ITS says it's allowed in HTML5 but refers to an old W3C draft.

They aren't prefixed attributes, which is the normal way to extend XHTML content documents with custom attributes. And even if they were, we'd still have to extend EPUB to allow them, because you can't use the custom attribute route to add other technologies defined by W3C. Custom attributes are supposed to be for reading system/vendor use.

In other words, to make them valid we really should account for them in the html extension section.

mattgarrish avatar Jun 02 '25 13:06 mattgarrish

Isn't the problem that they're an extension of HTML that isn't formally recognized by HTML itself? (i.e., the validators have been modified to allow the attributes even though the standard itself doesn't mention them that I can find.)

AFAIK, that is the way it goes: if there is a formally recognized standard, and it is put into the validator, then that is it. It is officially recognized, and the HTML WG does not do anything else. Remember when we discussed with @sideshowbarker on the HTML alternative of epub:type, and we got to epub-type as the way to go? It is the same.

@sideshowbarker is this correct?

In other words, to make them valid we really should account for them in the html extension section.

I have no problem listing it there explicitly, if this helps. I am actually surprised it did not come up earlier during the i18n review.

But it may not be a very scalable approach: new attributes may come up in other recommendations, and the expectation would soon be that EPUB allows them as well...


B.t.w. aria-* are not fully specified either. The HTML Standard refers to a number of them:

In addition, the following aria-* content attributes are defined in ARIA: [ARIA]

aria-checked aria-describedby aria-disabled aria-label

but, afaik, there are many more...

iherman avatar Jun 02 '25 14:06 iherman

Remember when we discussed with @sideshowbarker on the HTML alternative of epub:type, and we got to epub-type as the way to go?

Right, but that would be to make it valid for EPUB. I didn't take that to mean that it made it valid for HTML generally.

It's an extension, like RDFa attributes are not formally part of the HTML spec but will validate. Some people will recognize your extension, some won't. With EPUB, we've always formalized extensions by listing them in that section first.

mattgarrish avatar Jun 02 '25 14:06 mattgarrish

It's an extension, like RDFa attributes are not formally part of the HTML spec but will validate. Some people will recognize your extension, some won't. With EPUB, we've always formalized extensions by listing them in that section first.

Ok, I do not see any problem adding that to the spec. Would you raise an issue or should I?

And what about aria, b.t.w.? Shouldn't we list that as acceptable attributes as well?

iherman avatar Jun 02 '25 14:06 iherman

but, afaik, there are many more...

That's not the definitive list, though. Each element defines what it allows through its accessibility considerations section. Those lead you to the ARIA in HTML document. But the key for me is that the HTML spec recognizes the various ARIA documents within its own text; it's not relying on extension through validation.

mattgarrish avatar Jun 02 '25 14:06 mattgarrish

Would you raise an issue or should I?

You've raised it here so feel free to open the spec issue, too... 😄

Shouldn't we list that as acceptable attributes as well?

No. Like I mentioned in my last comment, I think authoring is already well covered through the ARIA in HTML references, and the HTML-AAM references cover user agent support.

mattgarrish avatar Jun 02 '25 14:06 mattgarrish

Would you raise an issue or should I?

You've raised it here so feel free to open the spec issue, too... 😄

Done: https://github.com/w3c/epub-specs/issues/2732

I do not know how epubcheck handles issues in general. I think it is worth keeping this one open: if the WG agrees to add the its-* attributes to EPUB 3.4, this issue can be used to track the implementation in epubcheck...

iherman avatar Jun 03 '25 10:06 iherman

I've updated EPUBCheck to allow its-* attributes (by removing them when pre-processing the XML before sending it to the validators).

It's a naïve temporary solution, but it proactively removes false positives.
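EPUBCheck itself is written in Java, so the following Python sketch only illustrates the general idea of such a pre-processing step: drop every its-* attribute from the parsed document and hand the cleaned XML to the validators. It is not EPUBCheck's actual code; the function name `strip_its_attributes` and the sample input are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strip_its_attributes(xhtml: str) -> str:
    """Return the document with all unprefixed its-* attributes removed."""
    # Serialize the XHTML namespace as the default namespace instead of "ns0:".
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    for element in root.iter():
        # Collect the names first so the attribute dict is not mutated while iterating.
        for name in [n for n in element.attrib if n.startswith("its-")]:
            del element.attrib[name]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical input; a real pipeline would read the content document from the EPUB.
    source = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<p its-locale-filter-list="zh-hant" class="note">Example</p>'
        "</body></html>"
    )
    print(strip_its_attributes(source))
```

Because the attributes are removed rather than checked, a sketch like this cannot flag an its-* attribute with an invalid value, which is one reason to treat it as temporary.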

I'm keeping this issue open as a reminder to better handle these attributes when properly implementing EPUB 3.4.

rdeltour avatar Sep 01 '25 15:09 rdeltour