genindex: use a numeric character reference instead of &ndash; for xhtml compatibility

Open methane opened this issue 1 year ago • 8 comments

Feature or Bugfix

  • Bugfix

Purpose

There is no &ndash; character entity in XHTML. epubcheck reports a fatal error, and EPUB readers like Books.app crash.

Detail

Relates

Fix #12359

methane avatar May 19 '24 05:05 methane

I am a bit confused here. AFAICT, ndash is XHTML compliant according to https://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Special_characters. While I do not mind this change, could it be possible that 1) there is an issue with epubcheck? 2) there is an issue with our template where we do not declare the DTD correctly?

picnixz avatar May 21 '24 11:05 picnixz

I am a bit confused here. AFAICT, ndash is XHTML compliant according to https://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Special_characters.

I am not an XHTML expert either. See this article:

https://en.wikipedia.org/wiki/HTML5#XHTML5_(XML-serialized_HTML5)

XHTML5 is simply XML-serialized HTML5 data (that is, HTML5 constrained to XHTML's strict requirements, e.g., not having any unclosed tags), [snip] There is no DTD for XHTML5.[126]

So XHTML5 is not compatible with XHTML 1.0/1.1. Since there is no DTD, we cannot use entities defined in DTDs.
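
The point about missing DTDs can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` as a stand-in for a strict XML parser; it is not the parser used by epubcheck or Books.app, and the helper name `parses_as_xml` is made up for this example. Without a DTD that defines it, `&ndash;` is an undefined entity, while a numeric character reference for the same character parses fine.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Named entity: only meaningful if a DTD declares it.
named = "<p>A &ndash; B</p>"
# Numeric character reference for U+2013 (en dash): needs no DTD.
numeric = "<p>A &#8211; B</p>"

print(parses_as_xml(named))    # False
print(parses_as_xml(numeric))  # True
```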

methane avatar May 21 '24 12:05 methane

I found a way to check this without epubcheck. Open genindex-A.xhtml in the build/epub directory, or rename html/genindex-A.html to genindex-A.xhtml and open it.

I can see this error.

image

methane avatar May 21 '24 12:05 methane

So I played a bit with that and indeed there is an issue. However, by replacing <!DOCTYPE html> with

<!DOCTYPE html 
	PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

I do not have issues any more (except that Firefox now tells me that there is a stray doctype, which I think is because its source-code viewer is only for HTML and not XHTML+XML files).

While I personally prefer changing the DOCTYPE to make it a real XHTML+XML file, I am not entirely sure that it should be the correct way to do it for EPUB files (I don't know the specs for them). Also, I assume other files might have this issue.

@chrisjsewell I reopened the issue because it's an issue for me. So either we replace the entity with a numeric character reference (its Unicode code point) or we change the doctype (but this would only be for EPUB files I think?)

picnixz avatar May 21 '24 12:05 picnixz

I reopened the issue because it's an issue for me

yeh no problem; I guess my generic question would be, how come there is currently no failure of our CI, and can we add a test/build that does break it?

chrisjsewell avatar May 21 '24 12:05 chrisjsewell

come there is currently no failure of our CI, and can we add a test/build that does break it?

I don't know why there is no failure, and I don't know whether epubcheck would complain either (note that not all files contain this ndash entity, so maybe epubcheck is not catching it).

picnixz avatar May 21 '24 13:05 picnixz

come there is currently no failure of our CI, and can we add a test/build that does break it?

I don't know why there is no failure, and I don't know whether epubcheck would complain either (note that not all files contain this ndash entity, so maybe epubcheck is not catching it).

html_split_index is false by default. That's why genindex-A.xhtml is not generated.

https://github.com/methane/sphinx/actions/runs/9175879594/job/25229853450
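
For anyone wanting to reproduce this locally, a minimal `conf.py` sketch follows. Only `html_split_index` comes from this thread; the project name is a placeholder assumed for illustration.

```python
# conf.py (sketch; "demo" is a placeholder project name)
project = "demo"

# Off by default. When enabled, the index is split into one page per
# letter, which is what produces files such as genindex-A.xhtml.
html_split_index = True
```

With this setting, an EPUB build (for example `sphinx-build -b epub . build/epub`, assuming the sources are in the current directory) should emit the per-letter index pages that contain the problematic entity.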

methane avatar May 21 '24 13:05 methane

@chrisjsewell @methane

Here's what I suggest:

  • For now, let's change that simple entity to what you are suggesting.
  • In a separate PR, check whether there are problems in the other EPUB files. For that, we would probably need to go through the templates ourselves... Not sure which flags should be enabled =/
  • Fix them separately, possibly changing the templates + doctypes.
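
The second step could start from something like the following rough sketch. It is a heuristic, not part of Sphinx: the function names and the `build/epub` path are assumptions for illustration, and a plain regex scan will also flag entity-like text inside comments or CDATA sections. It lists named entity references other than the five that XML predefines, since those are the ones that need a DTD.

```python
import re
from pathlib import Path

# The only named entities XML defines without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# Matches named references such as &ndash; but not numeric ones like &#8211;.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def undefined_entities(text: str) -> list[str]:
    """Return the sorted, de-duplicated named entities XML does not predefine."""
    names = set(ENTITY_RE.findall(text))
    return sorted(names - XML_PREDEFINED)


def scan_directory(root: str) -> dict[str, list[str]]:
    """Map each .xhtml file under root to the undefined entities it uses."""
    report = {}
    for path in sorted(Path(root).rglob("*.xhtml")):
        found = undefined_entities(path.read_text(encoding="utf-8"))
        if found:
            report[str(path)] = found
    return report


if __name__ == "__main__":
    # Hypothetical output directory of an EPUB build.
    for name, entities in scan_directory("build/epub").items():
        print(name, entities)
```

Files that show up in the report are candidates for the same numeric-reference fix, or for a doctype change if that route is chosen instead.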

The reason why I want to apply the fix now is to have Sphinx in a state that is "more or less enough" for everyone. If there are issues with other entities, I think people would just report it but for now, we have something that breaks CPython's docs.

picnixz avatar May 22 '24 07:05 picnixz