\perp is mapped to the wrong Unicode code point
Issue Summary
MathJax maps the TeX macro \perp to the Unicode char U+22A5 which Unicode defines as "UP TACK {base, bottom}". It should be mapped to U+27C2 which Unicode defines as "PERPENDICULAR {orthogonal to}". Although they look similar, they are slightly different.
Besides the potentially subtle difference in rendering, the bad mapping means that screen readers will say the wrong thing for \perp unless they (wrongly) map U+22A5 to "perpendicular". If they do that, then they won't speak "up tack" or "bottom" when they should.
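For anyone checking output by hand: the two characters are distinct strings even though they render almost identically. Here is a small sketch in plain JavaScript that prints the code point of each. The helper name `toUPlus` is made up for illustration and is not part of MathJax.

```javascript
// The two look-alike characters discussed in this issue.
const upTack = '\u22A5';        // UP TACK
const perpendicular = '\u27C2'; // PERPENDICULAR

// Hypothetical helper: format the first code point of a string as U+XXXX.
function toUPlus(ch) {
  return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

console.log(toUPlus(upTack), toUPlus(perpendicular));
console.log(upTack === perpendicular);
```

Pasting a character copied from the MathML output into `toUPlus` should tell you which of the two a given MathJax build emitted.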
Steps to Reproduce:
The bad mapping is in the code in several places. If you want to check the output, you can
- open https://mathjax.github.io/MathJax-demos-web/input-tex2chtml.html
- Enter x \perp y
- Right click in the rendered result and choose "Show Math As: MathML Code"
Technical details:
- MathJax Version: in master branch and v4.0-beta
This goes way back to version 1. The original MathJax fonts didn't include a glyph for U+27C2, I didn't know Unicode very well at that time, and MathJax had not yet become involved in assistive technology.
It should probably be changed for v4.
That seems to be the case for many other translators including KaTeX and pandoc. I think everyone started with a similar mapping. I filed bug reports with them also.
Fixed in v4.0.
U+27C2 (the character ⟂ used for \perp in MathJax’s TeX BaseMappings) lies in the Unicode block range 0x27C0–0x27EF, which is covered by:
[0x2500, 0x27ef, TEXCLASS.ORD, 'mo'], // Box Drawing (though) Miscellaneous Math Symbols-A
However, ⟂ is a relation symbol semantically (it should get relation spacing).
@dpvc MathJax v4 leaves U+27C2 as a plain ORD from the raw range classification, whereas it treats U+22A5 as a relation (REL) through an explicit operator dictionary entry:
'\u22A5': MO.REL, // up tack
https://github.com/mathjax/MathJax-src/blob/1a2ef74c0ac0620e7b8de46402c9dce3b95ade52/ts/core/MmlTree/OperatorDictionary.ts#L757
@hbghlyj, it looks like U+27C2 was not in the MathML3 operator dictionary, so it didn't get an entry in ts/core/MmlTree/OperatorDictionary.ts. I will need to add one.
Here is a temporary startup.ready patch, modeled on the splice example in #3203:
window.MathJax = {
  startup: {
    ready() {
      const { RANGES } = MathJax._.core.MmlTree.OperatorDictionary;
      const { TEXCLASS } = MathJax._.core.MmlTree.MmlNode;
      // Split the covering range and promote a single code point.
      function promote(codepoint, texClass, nodeType) {
        const i = RANGES.findIndex(([a, b]) => a <= codepoint && codepoint <= b);
        if (i < 0) return; // nothing to do if no covering range found
        const [a, b, cls, node] = RANGES[i];
        const before = (a <= codepoint - 1) ? [[a, codepoint - 1, cls, node]] : [];
        const middle = [[codepoint, codepoint, texClass, nodeType]];
        const after = (codepoint + 1 <= b) ? [[codepoint + 1, b, cls, node]] : [];
        RANGES.splice(i, 1, ...before, ...middle, ...after);
      }
      // ⟂ U+27C2: treat as a relation operator (for \perp)
      promote(0x27C2, TEXCLASS.REL, 'mo');
      MathJax.startup.defaultReady();
    }
  }
};
U+27C2 lives inside a broad range that defaults to ORD/mo. The helper splits that range into three pieces and assigns REL to the singleton [0x27C2, 0x27C2], making ⟂ (U+27C2) a relation (TEXCLASS.REL) mo so it gets relation spacing.
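To sanity-check the split without loading MathJax, the same logic can be run against a stand-in table. This is only a sketch. The `TEXCLASS` values and the one-entry `RANGES` array below are simplified placeholders rather than MathJax's real objects, and `lookup` is a hypothetical helper added for illustration.

```javascript
// Placeholder class values; the real enum lives in MathJax's MmlNode module.
const TEXCLASS = { ORD: 0, REL: 3 };

// Stand-in table holding only the covering range quoted in this thread.
const RANGES = [[0x2500, 0x27EF, TEXCLASS.ORD, 'mo']];

// Same splitting logic as the patch, but taking the table as a parameter.
function promote(ranges, codepoint, texClass, nodeType) {
  const i = ranges.findIndex(([a, b]) => a <= codepoint && codepoint <= b);
  if (i < 0) return; // no covering range, nothing to split
  const [a, b, cls, node] = ranges[i];
  const before = a < codepoint ? [[a, codepoint - 1, cls, node]] : [];
  const middle = [[codepoint, codepoint, texClass, nodeType]];
  const after = codepoint < b ? [[codepoint + 1, b, cls, node]] : [];
  ranges.splice(i, 1, ...before, ...middle, ...after);
}

// Hypothetical helper: return the first range containing the code point.
function lookup(ranges, codepoint) {
  return ranges.find(([a, b]) => a <= codepoint && codepoint <= b) || null;
}

promote(RANGES, 0x27C2, TEXCLASS.REL, 'mo');

// Print each range as [start, end, class] with hex bounds.
console.log(RANGES.map(([a, b, cls]) => [a.toString(16), b.toString(16), cls]));
```

After the call, the table holds three entries: the ranges on either side keep ORD, and only the singleton for U+27C2 is REL.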
Related: https://github.com/mathjax/MathJax/issues/3441