\perp is mapped to the wrong Unicode code point

NSoiffer opened this issue.

Issue Summary

MathJax maps the TeX macro \perp to the Unicode character U+22A5, which Unicode defines as "UP TACK {base, bottom}". It should be mapped to U+27C2, which Unicode defines as "PERPENDICULAR {orthogonal to}". Although the two look similar, they are slightly different.

Besides the potential subtle difference in rendering, the incorrect mapping means that screen readers will say the wrong thing for \perp unless they (wrongly) map U+22A5 to "perpendicular". If they do that, they won't speak "up tack" or "bottom" when they should.
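A minimal sketch of why the choice of code point matters to assistive technology. The tables below are hypothetical (they are not MathJax code); they map \perp to each of the two code points and report the character together with the Unicode name that a screen reader might start from. The names are the ones quoted in this issue.

```javascript
// Hypothetical table of the Unicode names of the two look-alike characters.
const UNICODE_NAMES = {
  0x22A5: 'UP TACK',
  0x27C2: 'PERPENDICULAR',
};

// Hypothetical TeX-to-Unicode mappings for \perp, before and after the fix.
const OLD_MAPPING = { perp: 0x22A5 };
const NEW_MAPPING = { perp: 0x27C2 };

// Describe the character a mapping produces for a given macro name.
function describe(mapping, macro) {
  const cp = mapping[macro];
  return {
    char: String.fromCodePoint(cp),
    codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
    name: UNICODE_NAMES[cp],
  };
}

console.log(describe(OLD_MAPPING, 'perp')); // char ⊥, U+22A5, UP TACK
console.log(describe(NEW_MAPPING, 'perp')); // char ⟂, U+27C2, PERPENDICULAR
```

The point of the sketch is that speech is driven by the code point, not the glyph: two characters that render almost identically still carry different names.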

Steps to Reproduce:

The bad mapping occurs in several places in the code. To check the output, you can:

  1. open https://mathjax.github.io/MathJax-demos-web/input-tex2chtml.html
  2. Enter x \perp y
  3. Right click in the rendered result and choose "Show Math As: MathML Code"
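The check in step 3 can also be done on a MathML string. Below is a small sketch with a hypothetical helper, `moCodePoints`, that lists the code points inside `<mo>` elements. It decodes numeric character references such as `&#x22A5;` as well as literal characters, since a serializer may emit either. The sample strings are hand-written for illustration, not captured MathJax output.

```javascript
// Hypothetical helper: list the code points found inside <mo> elements
// of a serialized MathML string, as "U+XXXX" labels.
function moCodePoints(mathml) {
  const result = [];
  const moPattern = /<mo\b[^>]*>([^<]*)<\/mo>/g;
  for (const [, content] of mathml.matchAll(moPattern)) {
    // Decode hexadecimal, then decimal, character references.
    const text = content
      .replace(/&#x([0-9a-fA-F]+);/g, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
      .replace(/&#([0-9]+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
    for (const ch of text) {
      result.push('U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
    }
  }
  return result;
}

// Hand-written samples: the old mapping (as an entity) and the new one (literal).
const before = '<math><mi>x</mi><mo>&#x22A5;</mo><mi>y</mi></math>';
const after = '<math><mi>x</mi><mo>\u27C2</mo><mi>y</mi></math>';
console.log(moCodePoints(before)); // [ 'U+22A5' ]
console.log(moCodePoints(after));  // [ 'U+27C2' ]
```

In a page where a MathJax TeX input component is loaded, the same helper could be applied to the string returned by `MathJax.tex2mml('x \\perp y')`.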

Technical details:

  • MathJax version: the master branch and v4.0-beta

NSoiffer, Jan 05 '25

This goes way back to version 1. It happened because the original MathJax fonts didn't include a glyph for U+27C2, because I didn't know Unicode very well at that time, and because MathJax had not yet become involved in assistive technology.

It should probably be changed for v4.

dpvc, Jan 15 '25

That seems to be the case for many other translators, including KaTeX and pandoc. I think everyone started with a similar mapping. I have filed bug reports with them as well.

NSoiffer, Jan 15 '25

Fixed in v4.0.

dpvc, Aug 13 '25

U+27C2 (⟂, the character used for \perp in MathJax's TeX BaseMappings) lies in the Unicode block 0x27C0–0x27EF, which is covered by this range entry:

    [0x2500, 0x27ef, TEXCLASS.ORD, 'mo'], // Box Drawing through Miscellaneous Math Symbols-A

However, ⟂ is semantically a relation symbol, so it should get relation spacing.
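To make "relation spacing" concrete, here is a minimal sketch of the spacing difference for `x \perp y`. It uses TeX's inter-atom spacing rules restricted to ORD and REL atoms in display/text style, with a thick space of 5mu and its stretch ignored; script styles, where these spaces are dropped, are not modeled. This is a simplified illustration, not MathJax's spacing code.

```javascript
// Space in mu between adjacent atoms (left class -> right class), following
// TeX's rules in display/text style. Thick space = 5mu; stretch is ignored.
const SPACING_MU = {
  ORD: { ORD: 0, REL: 5 },
  REL: { ORD: 5, REL: 0 },
};

// Return the space inserted between each adjacent pair of atoms.
function gaps(classes) {
  const result = [];
  for (let i = 1; i < classes.length; i++) {
    result.push(SPACING_MU[classes[i - 1]][classes[i]]);
  }
  return result;
}

// x ⟂ y with ⟂ classified as ORD (from the range) versus REL.
console.log(gaps(['ORD', 'ORD', 'ORD'])); // [ 0, 0 ]
console.log(gaps(['ORD', 'REL', 'ORD'])); // [ 5, 5 ]
```

With the range classification, ⟂ sits flush against its operands; as a relation it gets a thick space on each side, like `=` or `<`.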

hbghlyj, Aug 29 '25

@dpvc, MathJax v4 leaves U+27C2 as a plain ORD, taken from the raw range classification, whereas it treats U+22A5 as a relation (REL) because U+22A5 has an operator dictionary entry:

    '\u22A5': MO.REL,        // up tack

https://github.com/mathjax/MathJax-src/blob/1a2ef74c0ac0620e7b8de46402c9dce3b95ade52/ts/core/MmlTree/OperatorDictionary.ts#L757

hbghlyj, Aug 29 '25

@hbghlyj, it looks like U+27C2 was not in the MathML3 operator dictionary, so it didn't get an entry in ts/core/MmlTree/OperatorDictionary.ts. I will need to add one.
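A simplified, hypothetical model of why the missing entry matters, assuming (as this exchange suggests) that an explicit operator dictionary entry takes precedence and the code-point range is only a fallback. The names `operatorEntries`, `fallbackRanges`, and `classify` are illustrative stand-ins, not MathJax's internals; only the U+22A5 entry and the range quoted in this thread are included.

```javascript
// Hypothetical stand-ins for MathJax's TeX classes.
const TEXCLASS = { ORD: 'ORD', REL: 'REL' };

// Explicit per-character entries. U+22A5 is REL, matching the
// OperatorDictionary.ts line quoted above; U+27C2 starts with no entry.
const operatorEntries = { '\u22A5': TEXCLASS.REL };

// Fallback ranges: only the entry quoted in this thread.
const fallbackRanges = [[0x2500, 0x27EF, TEXCLASS.ORD, 'mo']];

// An explicit entry wins; otherwise use the covering range (null if none).
function classify(char) {
  if (Object.prototype.hasOwnProperty.call(operatorEntries, char)) {
    return operatorEntries[char];
  }
  const n = char.codePointAt(0);
  const range = fallbackRanges.find(([lo, hi]) => lo <= n && n <= hi);
  return range ? range[2] : null;
}

console.log(classify('\u22A5')); // REL (explicit entry)
console.log(classify('\u27C2')); // ORD (range fallback)

// The proposed fix: give U+27C2 its own REL entry.
operatorEntries['\u27C2'] = TEXCLASS.REL;
console.log(classify('\u27C2')); // REL
```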

dpvc, Aug 29 '25

Here is a temporary startup.ready patch, modeled on the splice example in #3203:

    window.MathJax = {
      startup: {
        ready() {
          const { RANGES }   = MathJax._.core.MmlTree.OperatorDictionary;
          const { TEXCLASS } = MathJax._.core.MmlTree.MmlNode;

          // Split the covering range and promote a single code point.
          function promote(codepoint, texClass, nodeType) {
            const i = RANGES.findIndex(([a, b]) => a <= codepoint && codepoint <= b);
            if (i < 0) return;  // nothing to do if no covering range found
            const [a, b, cls, node] = RANGES[i];

            const before = (a <= codepoint - 1) ? [[a, codepoint - 1, cls, node]] : [];
            const middle = [[codepoint, codepoint, texClass, nodeType]];
            const after  = (codepoint + 1 <= b) ? [[codepoint + 1, b, cls, node]] : [];

            RANGES.splice(i, 1, ...before, ...middle, ...after);
          }

          // ⟂ U+27C2: treat as a relation operator (for \perp)
          promote(0x27C2, TEXCLASS.REL, 'mo');

          MathJax.startup.defaultReady();
        }
      }
    };

U+27C2 lies inside a broad range that defaults to ORD/mo. The helper splits that range into up to three pieces and assigns REL to the singleton [0x27C2, 0x27C2], so U+27C2 becomes a relation (TEXCLASS.REL) mo and gets relation spacing.
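To inspect the splitting without loading MathJax, here is a standalone copy of the promote() helper from the patch above. The only changes are that it takes the range array as a parameter and runs on a plain one-entry list; the string class names are stand-ins for TEXCLASS values.

```javascript
// One-entry range list mirroring the RANGES entry quoted in this thread.
const ranges = [
  [0x2500, 0x27EF, 'ORD', 'mo'],
];

// Same logic as the patch, but the array is passed in explicitly.
function promote(ranges, codepoint, texClass, nodeType) {
  const i = ranges.findIndex(([a, b]) => a <= codepoint && codepoint <= b);
  if (i < 0) return; // no covering range: nothing to split
  const [a, b, cls, node] = ranges[i];
  const before = a <= codepoint - 1 ? [[a, codepoint - 1, cls, node]] : [];
  const middle = [[codepoint, codepoint, texClass, nodeType]];
  const after = codepoint + 1 <= b ? [[codepoint + 1, b, cls, node]] : [];
  ranges.splice(i, 1, ...before, ...middle, ...after);
}

promote(ranges, 0x27C2, 'REL', 'mo');
for (const [a, b, cls, node] of ranges) {
  console.log(a.toString(16), b.toString(16), cls, node);
}
// 2500 27c1 ORD mo
// 27c2 27c2 REL mo
// 27c3 27ef ORD mo
```

Because the pieces are spliced in at the original index, the list stays sorted by code point, and a code point at either end of a range yields only two pieces instead of three.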

hbghlyj, Sep 02 '25

Related: https://github.com/mathjax/MathJax/issues/3441

hbghlyj, Sep 27 '25