[spec] Potential accessibility issues with singlehtml due to missing MathML

Open ngzhian opened this issue 3 years ago • 22 comments

I was looking at https://github.com/WebAssembly/spec/blob/941c6f37cb13de20f54b89b73145f802b87b8155/document/core/util/mathjax2katex.py#L117

By default, katex will generate something like:

<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mrow><mn mathvariant="monospace">0</mn><mi mathvariant="monospace">x</mi><mrow><mi mathvariant="monospace">F</mi><mi mathvariant="monospace">C</mi></mrow></mrow><mtext> </mtext><mrow><mn mathvariant="monospace">0</mn><mi mathvariant="monospace">x</mi><mn mathvariant="monospace">11</mn></mrow></mrow><annotation encoding="application/x-tex">\mathrm{{\mathtt{0x{FC}}}~{\mathtt{0x{11}}}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.61111em;vertical-align:0em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord mathtt">0x</span><span class="mord"><span class="mord mathtt">FC</span></span></span></span><span class="mspace nobreak"> </span><span class="mord"><span class="mord"><span class="mord mathtt">0x</span><span class="mord"><span class="mord mathtt">11</span></span></span></span></span></span></span></span></span>

Notice the katex-mathml and <math> DOM nodes. We strip those away by skipping directly to <span class="katex-html">.
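
For illustration, here is a minimal sketch of that kind of stripping step. It is not the code in mathjax2katex.py; the function name strip_katex_mathml is hypothetical, and it assumes the katex-mathml span always sits immediately before the katex-html span, as in the output above.

```python
KATEX_MATHML_OPEN = '<span class="katex-mathml">'
KATEX_HTML_OPEN = '<span class="katex-html"'


def strip_katex_mathml(rendered: str) -> str:
    """Drop the MathML branch from KaTeX output, keeping only the visual HTML.

    Hypothetical helper: assumes the katex-mathml span ends exactly where
    the katex-html span begins, which holds for the sample output above.
    """
    start = rendered.find(KATEX_MATHML_OPEN)
    end = rendered.find(KATEX_HTML_OPEN)
    if start == -1 or end == -1 or end < start:
        # Nothing to strip, or the layout is not what we assumed.
        return rendered
    # Keep everything before the MathML span and everything from katex-html on.
    return rendered[:start] + rendered[end:]
```

Whatever remains is marked aria-hidden="true", which is presumably why a screen reader has nothing to announce once the MathML is gone.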

We can actually skip this post-processing with a katex option, but it seems like MathML is there for accessibility.

I don't know what the singlehtml doc is currently like under, say, a screen reader, or whether the missing MathML is acceptable, but I will go read a bit more about it.

ngzhian avatar Oct 27 '21 20:10 ngzhian

I installed https://chrome.google.com/webstore/detail/screen-reader/kgejglhpjiefppelpmljglcjbhoiplfn (it's in maintenance mode but is still usable) and clicked on this line. The TTS skipped the math notation, so it said something like "vectors are bounded sequences of the form <> or <>, where the <> can either be values or complex", where <> denotes a math symbol in the text that the TTS did not pick up.

Although this issue mentions the problem on singlehtml, I tried it on the multi-page document and it has the same issue.

On a local build with the code that strips mathml, I'm getting a better TTS: "vectors are bounded sequences of the form an or A asterisk operator, where the A can either be values or complex."

So some quirks to work around.

And I also noticed mathjax is loaded on the singlehtml builds, which seems unnecessary since we already use katex to render all the math?

ngzhian avatar Oct 27 '21 20:10 ngzhian

I remember once receiving a complaint regarding the inability to read the spec through a screen reader. MathML would be the only proper solution to this.

But we did not actually include MathML for that purpose. MathML support is activated in MathJax by default, and I remember e.g. @lukewagner telling me that he gets better and faster rendering on Firefox by switching to MathML. I'm still hoping that browsers catch up again with the MathML 4/Core effort and we can eventually retire both MathJax and Katex – they are kludges to work around a serious hole in the contemporary Web, and it shows. (*)

FWIW, the singlehtml build was a collection of brave hacks that @flagxor implemented at the time to produce a static document following W3C guidelines. Unfortunately, it has been orphaned ever since, and there still are a number of open issues about rendering bugs that are difficult to fix on our end. If I understood @ericprud correctly at some WG discussion last year, we may not actually need this doc anymore if we have the PDF and tweak the multipage HTML version with suitable front matter. Perhaps we should pick up this discussion again at the next WG meeting.

MathJax being included in the Katex build seems like an oversight.

(*) Rant: Browser vendors' continuous neglect of math on the Web is frustrating. Making the world's knowledge accessible my ass. I suppose you can't sell ads next to math...

rossberg avatar Oct 28 '21 07:10 rossberg

But we did not actually include MathML for that purpose. MathML support is activated in MathJax by default

From what I can tell, in the multi-page doc we emit MathML, then MathJax is loaded via JS and rewrites all the MathML into spans. I'm not sure whether MathJax does any browser detection to check for MathML support and only rewrites for Chrome (and other browsers without support). I can probably try loading the page in Firefox to see.

Perhaps we should pick up this discussion again at the next WG meeting.

Sg, in #1165 Ben also mentioned something about using the multi page document. Will be nice if we can do that.

ngzhian avatar Oct 28 '21 16:10 ngzhian

I believe you need to explicitly switch to MathML rendering on FF when viewing the page. But I don't have FF around, so I don't know how it works exactly. In any case, all this is set up by Sphinx. I assume there are some configuration options for it, though.

rossberg avatar Oct 29 '21 09:10 rossberg

On Monday I'll post a pointer to the output of SphinxToTr, which I hope will replace the one-pager. It basically packages the Sphinx output, so if it's in Sphinx, it's in these docs.

ericprud avatar Oct 29 '21 21:10 ericprud

It's one day later than Monday, but here's a snapshot of my plan A for publishing the core spec as a multi-page doc: https://www.w3.org/2021/11/wasm-stage/ That's the output of SphinxToTr.js. There are two wasm-specific files in that theoretically repurposable tool:

  1. wasm-cfg.yaml - pointer to input (Sphinx-generated docs), output, index page, stuff to stick at the top
  2. wasm-respec.html - leverage respec to build the index page. (I used JSDOM to load the respec, which makes this publication process a single step.)

These files would probably go into the core repo if someone anywhere ever wanted to re-use SphinxToTr for another doc.

[Edit] This still requires a manual copy of _static and searchindex.js, so not quite as single step as I implied above.

ericprud avatar Nov 02 '21 11:11 ericprud

Thanks @ericprud, that looks promising! How is the table of contents generated? That only seems to pick some index sections from the spec's appendix, not the main TOC. (Also, the links do not seem to work, but I assume that's because it's in staging? Logo is broken as well.)

rossberg avatar Nov 02 '21 11:11 rossberg

yeah, forgot the chacl subdirs. may now look promising without having to stretch your generosity so much.

ericprud avatar Nov 02 '21 18:11 ericprud

I'm not familiar with W3C requirements, is there interest in trying to use Sphinx HTML theming system to accomplish what SphinxToTr is doing?

ngzhian avatar Nov 02 '21 19:11 ngzhian

If it can accomplish the same thing with less maintenance, that'd be a win all around. The mono doc was pretty unwieldy, which was my motivation to create SphinxToTr, which does some stuff that it needs to do:

  1. document, resource and editor links
  2. Status of This Document
  3. W3C TR Style
  4. W3C-style TOC

and some stuff that makes it nice to use:

  1. TOC on all pages
  2. search

ericprud avatar Nov 03 '21 17:11 ericprud

I had a go at trying to do some of this with sphinx/respec:

attempt 1, singlehtml builder with sphinx, then include respec's JS file to generate front-matter and TOC. Problem faced: expanding anchor tags. We have a bunch of <a href="#something"> where it links to a <span id="#something">, and respec is not happy. I think I can work around this, still trying. (We can work around it by adding a class self-link to the <a> tags, but there is no way in MathJax to specify this. So we have to wait for MathJax to render the math, then use JS to add the class, then load ReSpec, which is pretty slow overall.)

attempt 2, (multi-page) html builder with sphinx, then use toctree to generate the TOC, but can't get a numbered TOC because the toctree is not defined to be numbered. We can make 2 toctrees in index.rst, and use the .. only:: directive to output only one of them, but then the output HTML will contain a duplicated TOC tree. It looks like the toctree inside of the templating engine doesn't respect .. only::. Also, this attempt requires us to duplicate the front matter inside the sphinx layout (respec takes care of that in attempt 1).

Update on attempt 2, filed a bug on Sphinx to see if this is intended behavior (and a potential fix).

Just fyi. Seems pretty fiddly either way.

ngzhian avatar Nov 04 '21 21:11 ngzhian

Attempt 2 sounds encouraging to me, but perhaps that means I fail to grasp the fiddliness of it all. SphinxToTr is also quite fiddly, so I'd be happy to work with you to push that logic uphill into Sphinx tools. This would make it less prone to failure when divs get rearranged. I can see a numbered TOC being useful outside of our use case. Possibly so would something that allows you to glue your own front page in.

Am I being too much of a Pollyanna?

ericprud avatar Nov 06 '21 12:11 ericprud

I had a go at attempt 2 again, this is the result: https://www.ngzhian.com/spec/newhtml/ and the changes required are https://github.com/ngzhian/spec/pull/3

Couple of problems:

  • toc on the side is duplicated (can be fixed if https://github.com/sphinx-doc/sphinx/pull/9830 is merged upstream)
  • w3c css rules don't apply to toc, due to missing classes

ngzhian avatar Nov 20 '21 00:11 ngzhian

We could use a much skinnier script than SphinxToTr.js to fix those probs. Do the TOCs work if you manually edit the classes? Can you share the result?

ericprud avatar Nov 20 '21 11:11 ericprud

This looks quite promising!

If the Sphinx folks don't get to merge and release that bug fix anytime soon, we could presumably also hack up a little script that pre-processes index.rst appropriately.
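
A rough sketch of what such a pre-processing script could look like, assuming the only change needed for the W3C build is adding the :numbered: option to each toctree in index.rst. The function name add_numbered_option is hypothetical, and the real index.rst may need more than this.

```python
def add_numbered_option(rst_source: str) -> str:
    """Return rst_source with ':numbered:' added to every toctree lacking it.

    Hypothetical helper: only handles directives written exactly as
    '.. toctree::' on their own line, with options directly below.
    """
    lines = rst_source.split("\n")
    result = []
    for index, line in enumerate(lines):
        result.append(line)
        if line.strip() != ".. toctree::":
            continue
        # Options are the ':name:' lines that directly follow the directive.
        options = []
        for following in lines[index + 1:]:
            if not following.strip().startswith(":"):
                break
            options.append(following.strip())
        if not any(opt.startswith(":numbered:") for opt in options):
            # Indent the option three spaces deeper than the directive itself.
            indent = line[: len(line) - len(line.lstrip())]
            result.append(indent + "   :numbered:")
    return "\n".join(result)
```

The build for the W3C flavour could run this over a copy of index.rst before invoking Sphinx, leaving the checked-in file (and the existing unnumbered multi-page build) untouched.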

rossberg avatar Nov 22 '21 09:11 rossberg

We could use a much skinnier script than SphinxToTr.js to fix those probs. Do the TOCs work if you manually edit the classes? Can you share the result?

It works pretty well, ptal https://www.ngzhian.com/spec/newhtml/. The numbering of the TOC is tricky: W3C creates separate spans for the numbers, while the Sphinx toctree puts the numbers together with the title. This means we can't get the correct padding/alignment of the TOC lists, but I think it looks fine.
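
To make the mismatch concrete, here is a small sketch of splitting a Sphinx-style entry such as "2.1. Conventions" into a separate number span and title span. This is not what the demo does (the demo uses a JS kludge), and the class names secno and content are assumptions about the W3C TOC markup, not something checked against the TR stylesheet.

```python
import re
from html import escape

# Leading dotted section number, optional trailing dot, then the title.
_NUMBERED_ENTRY = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")


def split_toc_entry(text: str) -> str:
    """Wrap the number and the title of a TOC entry in separate spans.

    The class names 'secno' and 'content' are assumed, for illustration only.
    """
    stripped = text.strip()
    match = _NUMBERED_ENTRY.match(stripped)
    if match is None:
        # Unnumbered entries (e.g. an index page) are passed through as text.
        return escape(stripped)
    number, title = match.groups()
    return (
        f'<span class="secno">{escape(number)}</span> '
        f'<span class="content">{escape(title)}</span>'
    )
```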

This looks quite promising!

If the Sphinx folks don't get to merge and release that bug fix anytime soon, we could presumably also hack up a little script that pre-processes index.rst appropriately.

We can probably do some hacks. Right now the closest I have is to have numbered TOC for both the w3c multi-page (demo-ed here), and also the existing multi-page. Without the patch it is impossible to have 2 different top-level TOC, one numbered, one not numbered.

See multi-page: https://www.ngzhian.com/spec/core/
See demo of sphinx-based w3c compatible multi-page: https://www.ngzhian.com/spec/newhtml/

ngzhian avatar Nov 22 '21 21:11 ngzhian

Hi @ngzhian, can you explain how you generate https://www.ngzhian.com/spec/newhtml/ so I can try to work it into a publication process? @rossberg, @dschuff, @lukewagner, I'll be in SF 24 March - 2 April if it will help to sit in the same room to work out a publication pipeline.

ericprud avatar Mar 14 '22 22:03 ericprud

The changes are here https://github.com/WebAssembly/spec/pull/1429

Basically I use the 'basic' theme from Sphinx, with custom pages (layout.html and page.html). There's some JS kludge to mess around with the TOC to make the styling look more like the W3C TOC.

ngzhian avatar Mar 14 '22 22:03 ngzhian

Btw, one issue with the current W3C doc is that it underlines all links even for single operators in math formulas, which can be really confusing and distracting. Is that still the case with this version? (The newhtml link seems dead, so I can't check.)

rossberg avatar Mar 15 '22 10:03 rossberg

https://www.ngzhian.com/spec/newhtml/ is back up. (sorry my rebase messed up the published pages)

Btw, one issue with the current W3C doc is that it underlines all links even for single operators in math formulas, which can be really confusing and distracting. Is that still the case with this version? (The newhtml link seems dead, so I can't check.)

It doesn't :)

ngzhian avatar Mar 15 '22 16:03 ngzhian

@ngzhian, @ericprud, what's the status of this issue, is it still relevant?

rossberg avatar Aug 04 '22 08:08 rossberg

Not actively working on this.

ngzhian avatar Aug 04 '22 16:08 ngzhian