[spec] Potential accessibility issues with singlehtml due to missing MathML
I was looking at https://github.com/WebAssembly/spec/blob/941c6f37cb13de20f54b89b73145f802b87b8155/document/core/util/mathjax2katex.py#L117
By default, katex will generate something like:
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mrow><mn mathvariant="monospace">0</mn><mi mathvariant="monospace">x</mi><mrow><mi mathvariant="monospace">F</mi><mi mathvariant="monospace">C</mi></mrow></mrow><mtext> </mtext><mrow><mn mathvariant="monospace">0</mn><mi mathvariant="monospace">x</mi><mn mathvariant="monospace">11</mn></mrow></mrow><annotation encoding="application/x-tex">\mathrm{{\mathtt{0x{FC}}}~{\mathtt{0x{11}}}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.61111em;vertical-align:0em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord mathtt">0x</span><span class="mord"><span class="mord mathtt">FC</span></span></span></span><span class="mspace nobreak"> </span><span class="mord"><span class="mord"><span class="mord mathtt">0x</span><span class="mord"><span class="mord mathtt">11</span></span></span></span></span></span></span></span></span>
Notice the katex-mathml and <math> DOM nodes. We strip those away by skipping directly to <span class="katex-html">.
We can actually skip this post-processing with a katex option, but it seems like MathML is there for accessibility.
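For illustration, here is a minimal sketch of the kind of stripping described above. It is not the code in mathjax2katex.py; the helper name strip_katex_mathml is made up, and it assumes the MathML wrapper always has the shape shown in the sample output (a katex-mathml span holding a single <math> element).

```python
import re

# Assumption: KaTeX wraps its MathML in exactly one
# <span class="katex-mathml"> ... </math></span> per formula, as in the
# sample output above. This is a sketch, not the spec's actual script.
_MATHML_SPAN = re.compile(
    r'<span class="katex-mathml">.*?</math></span>', re.DOTALL
)


def strip_katex_mathml(katex_output: str) -> str:
    """Remove the MathML half of KaTeX output, keeping the katex-html half."""
    return _MATHML_SPAN.sub("", katex_output)


if __name__ == "__main__":
    sample = (
        '<span class="katex"><span class="katex-mathml">'
        "<math><mi>x</mi></math></span>"
        '<span class="katex-html" aria-hidden="true">'
        '<span class="base">x</span></span></span>'
    )
    print(strip_katex_mathml(sample))
```

Note that what remains is the katex-html span, which in the sample output carries aria-hidden="true", so assistive technology has nothing left to read for that formula.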
I don't know how the singlehtml doc currently behaves under, say, a screen reader, or whether the missing MathML is acceptable, but I will go read a bit more about it.
I installed https://chrome.google.com/webstore/detail/screen-reader/kgejglhpjiefppelpmljglcjbhoiplfn (it's in maintenance mode but is still usable) and clicked on this line. The TTS skipped the math notation, so it said something like "vectors are bounded sequences of the form <> or <>, where the <> can either be values or complex", where <> denotes a math symbol in the text that the TTS did not pick up.
Although this issue mentions the problem on singlehtml, I tried it on the multi-page document and it has the same issue.
On a local build with the code that strips mathml, I'm getting a better TTS: "vectors are bounded sequences of the form an or A asterisk operator, where the A can either be values or complex."
So some quirks to work around.
And I also noticed mathjax is loaded on the singlehtml builds, which seems unnecessary since we already use katex to render all the math?
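A quick way to confirm this on a built page is to list the script tags it loads. The sketch below uses only the standard library; the function name mathjax_scripts and the example URLs are made up for illustration, and matching on the substring "mathjax" is an assumption about how the script is referenced.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def mathjax_scripts(html_text: str) -> list:
    """Return script URLs that look like MathJax (substring match, an assumption)."""
    parser = ScriptCollector()
    parser.feed(html_text)
    return [src for src in parser.sources if "mathjax" in src.lower()]


if __name__ == "__main__":
    # Hypothetical page head; real builds will reference different URLs.
    page = (
        "<html><head>"
        '<script src="https://cdn.example/MathJax.js?config=x"></script>'
        '<script src="_static/katex.min.js"></script>'
        "</head><body></body></html>"
    )
    print(mathjax_scripts(page))
```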
I remember once receiving a complaint regarding the inability to read the spec through a screen reader. MathML would be the only proper solution to this.
But we did not actually include MathML for that purpose. MathML support is activated in MathJax by default, and I remember e.g. @lukewagner telling me that he gets better and faster rendering on Firefox by switching to MathML. I'm still hoping that browsers catch up again with the MathML 4/Core effort and we can eventually retire both MathJax and Katex – they are kludges to work around a serious hole in the contemporary Web, and it shows. (*)
FWIW, the singlehtml build was a collection of brave hacks that @flagxor implemented at the time to produce a static document following W3C guidelines. Unfortunately, it has been orphaned ever since, and there still are a number of open issues about rendering bugs that are difficult to fix on our end. If I understood @ericprud correctly at some WG discussion last year, we may not actually need this doc anymore if we have the PDF and tweak the multipage HTML version with suitable front matter. Perhaps we should pick up this discussion again at the next WG meeting.
MathJax being included in the Katex build seems like an oversight.
(*) Rant: Browser vendors' continuous neglect of math on the Web is frustrating. Making the world's knowledge accessible my ass. I suppose you can't sell ads next to math...
> But we did not actually include MathML for that purpose. MathML support is activated in MathJax by default
From what I can tell, in the multi-page doc we emit MathML, then MathJax is loaded via JS and rewrites all the MathML into spans. I'm not sure whether MathJax does any browser detection to see if MathML is supported and only rewrites for Chrome (and other unsupported browsers); I can probably try loading the page on Firefox to see.
> Perhaps we should pick up this discussion again at the next WG meeting.
Sg, in #1165 Ben also mentioned something about using the multi page document. Will be nice if we can do that.
I believe you need to explicitly switch to MathML rendering on FF when viewing the page. But I don't have FF around, so I don't know how it works exactly. In any case, all this is set up by Sphinx. I assume there are some configuration options for it, though.
On Monday I'll post a pointer to the output of the sphinxToTr, which I hope will replace the one-pager. It basically packages the sphinx output so if it's in sphinx, it's in this (these) doc.
It's one day later than Monday, but here's a snapshot of my plan A for publishing the core spec as a multi-page doc: https://www.w3.org/2021/11/wasm-stage/ That's the output of SphinxToTr.js. There are two wasm-specific files in that theoretically repurposable tool:
- wasm-cfg.yaml - pointer to input (Sphinx-generated docs), output, index page, stuff to stick at the top
- wasm-respec.html - leverage respec to build the index page. (I used JSDOM to load the respec, which makes this publication process a single step.)
These files would probably go into the core repo if someone anywhere ever wanted to re-use SphinxToTr for another doc.
[Edit] This still requires a manual copy of _static and searchindex.js, so not quite as single step as I implied above.
Thanks @ericprud, that looks promising! How is the table of contents generated? That only seems to pick some index sections from the spec's appendix, not the main TOC. (Also, the links do not seem to work, but I assume that's because it's in staging? Logo is broken as well.)
yeah, forgot the chacl subdirs. may now look promising without having to stretch your generosity so much.
I'm not familiar with W3C requirements, is there interest in trying to use Sphinx HTML theming system to accomplish what SphinxToTr is doing?
If it can accomplish the same thing with less maintenance, that'd be a win all around. The mono doc was pretty unwieldy, which was my motivation to create SphinxToTr, which does some stuff that it needs to do:
- document, resource and editor links
- Status of This Document
- W3C TR Style
- W3C-style TOC
, and some stuff that makes it nice to use:
- TOC on all pages
- search
I had a go at trying to do some of this with sphinx/respec:
attempt 1, singlehtml builder with sphinx, then include respec's JS file to generate front-matter and TOC. Problem faced: expanding anchor tags. We have a bunch of <a href="#something"> where it links to a <span id="#something">, and respec is not happy. I think I can work around this, still trying. (We can work around it by adding a class self-link to the <a> tags, but there is no way in MathJax to specify this. So we have to wait for MathJax to render the math, then use JS to add the class, then load ReSpec, which is pretty slow overall.)
attempt 2, (multi-page) html builder with sphinx, then use toctree to generate the TOC, but can't get a numbered TOC because the toctree is not defined to be numbered. We can make 2 toctrees in index.rst and use the .. only:: directive to output only one of them, but then the output HTML will contain a duplicated TOC tree. It looks like the toctree inside of the templating engine doesn't respect the .. only:: directive. Also, this attempt requires us to duplicate the front matter inside the sphinx layout (respec takes care of that).
Update on attempt 2, filed a bug on Sphinx to see if this is intended behavior (and a potential fix).
Just fyi. Seems pretty fiddly either way.
Attempt 2 sounds encouraging to me, but perhaps that means I fail to grasp the fiddliness of it all. SphinxToTr is also quite fiddly so I'd be happy to work with you to push that logic uphill into Sphinx tools. This would make it less prone to failure when divs get rearranged. I can see numbered TOC being useful outside of our use case. Possibly so would be something that allows you to glue your own front page in.
Am I being too much of a Pollyanna?
I had a go at attempt 2 again, this is the result: https://www.ngzhian.com/spec/newhtml/ and the changes required are https://github.com/ngzhian/spec/pull/3
Couple of problems:
- toc on the side is duplicated (can be fixed if https://github.com/sphinx-doc/sphinx/pull/9830 is merged upstream)
- w3c css rules don't apply to toc, due to missing classes
We could use a much skinnier script than SphinxToTr.js to fix those probs. Do the TOCs work if you manually edit the classes? Can you share the result?
This looks quite promising!
If the Sphinx folks don't get to merge and release that bug fix anytime soon, we could presumably also hack up a little script that pre-processes index.rst appropriately.
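A rough sketch of what such a pre-processing script could look like, assuming the only difference between the two builds is whether the toctree in index.rst carries the :numbered: option. The function name set_toctree_numbered is made up, and the three-space option indent is an assumption about how index.rst is laid out.

```python
import re

# Matches a toctree directive line and captures its leading indent.
_TOCTREE = re.compile(r"^(\s*)\.\. toctree::\s*$")
# Matches an existing :numbered: option line under a directive.
_NUMBERED = re.compile(r"^\s+:numbered:")


def set_toctree_numbered(rst: str, numbered: bool) -> str:
    """Return rst with :numbered: added to, or removed from, every toctree."""
    result = []
    for line in rst.splitlines():
        if _NUMBERED.match(line):
            continue  # drop any existing option; re-added below if wanted
        result.append(line)
        match = _TOCTREE.match(line)
        if match and numbered:
            # Assumption: directive options are indented by three spaces.
            result.append(match.group(1) + "   :numbered:")
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    source = ".. toctree::\n   :maxdepth: 2\n\n   intro/index\n"
    print(set_toctree_numbered(source, numbered=True))
```

The build for the W3C variant would run this over index.rst with numbered=True before invoking Sphinx, and the regular multi-page build would leave the file alone (or run it with numbered=False).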
> We could use a much skinnier script than SphinxToTr.js to fix those probs. Do the TOCs work if you manually edit the classes? Can you share the result?
It works pretty well, ptal https://www.ngzhian.com/spec/newhtml/. The numbering of the TOC is tricky: W3C creates separate spans for the numbers, while the Sphinx toctree has the numbers together with the title. This means we can't get the correct padding/alignment of the TOC lists, but I think it looks fine.
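To make the difference concrete, here is a small sketch of splitting a Sphinx-style TOC label into a number and a title and emitting them as separate spans. The class names secno and content, and both function names, are assumptions for illustration; they are not taken from the W3C stylesheet or from the demo above.

```python
import html
import re

# A leading dotted section number such as "2." or "2.1." followed by a title.
_NUMBERED_LABEL = re.compile(r"^(\d+(?:\.\d+)*\.?)\s+(.*)$")


def split_toc_entry(label: str):
    """Split '2.1. Conventions' into ('2.1.', 'Conventions').

    Labels without a leading number come back as (None, label).
    """
    match = _NUMBERED_LABEL.match(label.strip())
    if match is None:
        return None, label.strip()
    return match.group(1), match.group(2)


def render_toc_entry(label: str) -> str:
    """Render a label with number and title in separate spans.

    The class names are hypothetical placeholders.
    """
    number, title = split_toc_entry(label)
    title_span = '<span class="content">%s</span>' % html.escape(title)
    if number is None:
        return title_span
    return '<span class="secno">%s</span> %s' % (html.escape(number), title_span)


if __name__ == "__main__":
    print(render_toc_entry("2.1. Conventions"))
```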
> This looks quite promising!
> If the Sphinx folks don't get to merge and release that bug fix anytime soon, we could presumably also hack up a little script that pre-processes index.rst appropriately.
We can probably do some hacks. Right now the closest I have is a numbered TOC for both the w3c multi-page (demoed here) and the existing multi-page. Without the patch it is impossible to have 2 different top-level TOCs, one numbered and one not numbered.
See multi-page https://www.ngzhian.com/spec/core/ See demo of sphinx-based w3c compatible multi-page https://www.ngzhian.com/spec/newhtml/
Hi @ngzhian, can you explain how you generate https://www.ngzhian.com/spec/newhtml/ so I can try to work it into a publication process? @rossberg, @dschuff, @lukewagner, I'll be in SF 24 March - 2 April if it will help to sit in the same room to work out a publication pipeline.
The changes are here https://github.com/WebAssembly/spec/pull/1429
Basically I use the 'basic' theme from Sphinx, with custom pages (layout.html and page.html). There's some JS kludge to mess around with the TOC to make the styling look more like the W3C TOC.
Btw, one issue with the current W3C doc is that it underlines all links even for single operators in math formulas, which can be really confusing and distracting. Is that still the case with this version? (The newhtml link seems dead, so I can't check.)
https://www.ngzhian.com/spec/newhtml/ is back up. (sorry my rebase messed up the published pages)
> Btw, one issue with the current W3C doc is that it underlines all links even for single operators in math formulas, which can be really confusing and distracting. Is that still the case with this version? (The newhtml link seems dead, so I can't check.)
It doesn't :)
@ngzhian, @ericprud, what's the status of this issue, is it still relevant?
Not actively working on this.