MathML operators get no spacing if they are not in the operator dictionary
Issue Summary
Try the following with MathJax 3:
<math>
  <mrow>
    <mi>a</mi>
    <mo>𝐸</mo>
    <mi>b</mi>
  </mrow>
</math>
$a \mathrel{E} b$
The first a E b is rendered with no spacing, unlike the second one.
I'd expect them to be rendered identically, because the default MathML operator spacing prescribes 0.2777em (5/18em) of space on both sides. (Note: this is assuming I am reading the operator dictionary correctly! However, the native Chrome MathML renderer seems to agree with me.)
Technical details:
- MathJax Version: 3.2.2
- Client OS: Windows 11
- Browser: Chrome 129.0.6668.59
I am replicating this on https://www.mathjax.org/#demo
There are a couple of things going on here. First, MathJax doesn't use MathML spacing rules by default. Rather, it tries to use TeX spacing rules unless you tell it not to. So to get the spacing you want, you would need to set the mathmlSpacing option to true in the chtml and svg blocks of your MathJax configuration.
But it turns out that that doesn't fix the problem entirely, as MathJax does some extra work to try to decide the spacing for values that aren't in the operator dictionary (in order to get the TeX spacing right). Unfortunately, this doesn't work well when mathmlSpacing is in effect. I will make a PR to fix the problem in v4, but for now, you can use the following configuration to work around the issue:
MathJax = {
  chtml: {
    mathmlSpacing: true
  },
  svg: {
    mathmlSpacing: true
  },
  startup: {
    ready: function () {
      const SMALLSIZE = 2/18;  // operator space used at scriptlevel > 0
      const MOSPACE = 5/18;    // default <mo> spacing
      function MathMLSpace(script, nodict, size) {
        return (nodict ? MOSPACE : (script ? (size < SMALLSIZE ? 0 : SMALLSIZE) : size));
      }
      const {CommonWrapper} = MathJax._.output.common.Wrapper;
      const {OPTABLE} = MathJax._.core.MmlTree.OperatorDictionary;
      CommonWrapper.prototype.getMathMLSpacing = function () {
        const node = this.node.coreMO();
        // An mo that is not inside a multi-child mrow gets no spacing.
        const child = node.coreParent();
        const parent = child.parent;
        if (!parent || !parent.isKind('mrow') || parent.childNodes.length === 1) return;
        // True when the operator has no entry in any form of the operator dictionary.
        const mo = node.getText();
        const noDictDef = !(OPTABLE.infix[mo] || OPTABLE.prefix[mo] || OPTABLE.postfix[mo]);
        // Explicit lspace/rspace attributes take precedence over the computed values.
        const attributes = node.attributes;
        const isScript = (attributes.get('scriptlevel') > 0);
        this.bbox.L = (attributes.isSet('lspace') ?
                       Math.max(0, this.length2em(attributes.get('lspace'))) :
                       MathMLSpace(isScript, noDictDef, node.lspace));
        this.bbox.R = (attributes.isSet('rspace') ?
                       Math.max(0, this.length2em(attributes.get('rspace'))) :
                       MathMLSpace(isScript, noDictDef, node.rspace));
        // Reduce the left space by any right space of a preceding embellished operator.
        const n = parent.childIndex(child);
        if (n === 0) return;
        const prev = parent.childNodes[n - 1];
        if (!prev.isEmbellished) return;
        const bbox = this.jax.nodeMap.get(prev).getBBox();
        if (bbox.R) {
          this.bbox.L = Math.max(0, this.bbox.L - bbox.R);
        }
      };
      MathJax.startup.defaultReady();
    }
  }
};
I know it is a bit long, but there was no good way to make the needed changes without overriding the complete getMathMLSpacing() function.
See if that works for you for now.
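To make the helper's behaviour easier to check, here is a standalone copy of MathMLSpace() from the configuration above, rewritten with explicit branches and a few sample calls. It is only a sketch for experimenting outside MathJax; the sample sizes passed in are illustrative assumptions, not values read from the operator dictionary.

```javascript
// Standalone copy of the helper from the workaround; all sizes are in em.
const SMALLSIZE = 2 / 18;  // operator space used at scriptlevel > 0
const MOSPACE = 5 / 18;    // default <mo> spacing

function MathMLSpace(script, nodict, size) {
  // No dictionary entry: use the default <mo> spacing.
  if (nodict) return MOSPACE;
  // Dictionary entry inside a script: tiny spaces vanish, others become SMALLSIZE.
  if (script) return size < SMALLSIZE ? 0 : SMALLSIZE;
  // Dictionary entry at normal size: use the dictionary value unchanged.
  return size;
}

// Illustrative calls (the size arguments are made-up sample values).
console.log(MathMLSpace(false, true, 0));       // operator not in the dictionary
console.log(MathMLSpace(false, false, 4 / 18)); // dictionary value kept
console.log(MathMLSpace(true, false, 4 / 18));  // reduced inside a script
console.log(MathMLSpace(true, false, 1 / 18));  // dropped inside a script
```

Note that in this version an operator without a dictionary entry gets the full default spacing even inside a script.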
Awesome, thanks! Yes, that should solve the problem. I'll double check that it matches the MathML Core spacing rules for all cases.
Ah, finally, I have understood the issue I was reporting. My description was wrong, although as it happens, that was also an issue.
The problem is in checkOperatorTable. When calling getRange, it will use the TeX class even for items classed as identifiers (mi) or numbers (mn). I think you should only use ranges that are of type mo.
Therefore, here is an accompanying fix for TeX spacing (the only change is the new range[3] == 'mo' check):
const MmlMo = MathJax._.core.MmlTree.MmlNodes.mo.MmlMo;
const OperatorDictionary = MathJax._.core.MmlTree.OperatorDictionary;
MmlMo.prototype.checkOperatorTable = function (mo) {
  let [form1, form2, form3] = this.handleExplicitForm(this.getForms());
  this.attributes.setInherited('form', form1);
  let OPTABLE = this.constructor.OPTABLE;
  let def = OPTABLE[form1][mo] || OPTABLE[form2][mo] || OPTABLE[form3][mo];
  if (def) {
    // The operator has a dictionary entry: use its class, attributes, and spacing.
    if (this.getProperty('texClass') === undefined) {
      this.texClass = def[2];
    }
    for (const name of Object.keys(def[3] || {})) {
      this.attributes.setInherited(name, def[3][name]);
    }
    this.lspace = (def[0] + 1) / 18;
    this.rspace = (def[1] + 1) / 18;
  } else {
    // No dictionary entry: fall back to the character range, but only for mo ranges.
    let range = OperatorDictionary.getRange(mo);
    if (range && range[3] == 'mo') {
      if (this.getProperty('texClass') === undefined) {
        this.texClass = range[2];
      }
      const spacing = this.constructor.MMLSPACING[range[2]];
      this.lspace = (spacing[0] + 1) / 18;
      this.rspace = (spacing[1] + 1) / 18;
    }
  }
};
Thanks for finding the issue. This function was written before the range data included a MathML node type (initially everything was put into an mo), and when that feature was added, I didn't catch this situation. I will make a PR to include your change.
Actually, on looking at this more closely, the checkOperatorTable() function is only called by mo elements, so the range[3] value is irrelevant here (it is only used in TeX processing to determine what type of MathML elements to generate), and when the mo content is not in the operator dictionary, it should get the default spacing for mo elements, which is what the original code does. So now I don't think your change should be made. Unless I'm missing something important.
it should get the default spacing for mo elements, which is what the original code does
All the entries with type 'mi', 'mn' have class ORD, so they get no spacing at all. I thought this was an unintended side effect of the operator table having two jobs (one for TeX input, one for MathML input). My interpretation is that when the MathML source uses 'mo', then MathJax should ignore the entries that would be essentially converted to identifiers, on the assumption that the author would have used mi/mn if that was the intention.
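A toy model of the lookup may make the two readings concrete. Everything below is a hypothetical stand-in rather than MathJax's real data or API: the tiny tables, the texClassFor() name, the [first, last, class, type] layout of a range entry, and the assumption that an mo defaults to class REL when nothing else applies.

```javascript
// Hypothetical miniature operator dictionary (contents are illustrative only).
const OPTABLE = { '=': { texClass: 'REL' }, '+': { texClass: 'BIN' } };

// Hypothetical range entries: [first code point, last code point, TeX class, node type].
const RANGES = [
  [0x2190, 0x21FF, 'REL', 'mo'],    // arrows
  [0x1D400, 0x1D7FF, 'ORD', 'mi'],  // mathematical alphanumerics such as 𝐸
];

// Assumption: an <mo> with no other information is treated as a relation.
const DEFAULT_MO_CLASS = 'REL';

// Decide the TeX class for the content of an <mo>.
// With onlyMoRanges set, ranges meant for mi/mn elements are ignored.
function texClassFor(text, onlyMoRanges) {
  if (OPTABLE[text]) return OPTABLE[text].texClass;
  const cp = text.codePointAt(0);
  const range = RANGES.find(([first, last]) => cp >= first && cp <= last);
  if (range && (!onlyMoRanges || range[3] === 'mo')) return range[2];
  return DEFAULT_MO_CLASS;
}

console.log(texClassFor('\u{1D438}', false)); // 'ORD': the mi range decides, so no spacing
console.log(texClassFor('\u{1D438}', true));  // 'REL': the mi range is skipped
console.log(texClassFor('\u2192', true));     // 'REL': mo ranges still apply
```

In this model the filter changes the result only for characters whose range is tagged mi or mn; dictionary entries and mo ranges behave as before.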
The problem with my interpretation is that I clearly want MathML spacing instead (I am working on it!). Maybe I should first ask what the intended use cases of 'MathML input with TeX spacing' are before claiming that checkOperatorTable should be modified.
(In the meantime, I shipped the range[3] change in BookML as a cheap fix for the mismatch between the LaTeXML output and MathJax's rendering. I'll move to proper MathML Core spacing in due course.)
Maybe I should first ask what are the intended use cases of 'MathML input with TeX spacing' before claiming that checkOperatorTable should be modified.
Well, the main reason for it is so that the MathML generated by the TeX input jax can be read into MathJax's MathML input jax and produce the same result as the original TeX did. The secondary reason is so that MathML can be rendered as closely to TeX output as is reasonable; my own feeling is that MathML has a number of places where the rendering produces poor results, and that working around those results by having the TeX input produce the needed MathML would require a semantic understanding of the mathematics that is not explicit in the original TeX, and so subject to mistakes in too many cases.
Of course, the <mo>𝐸</mo> would not be produced by the TeX input jax (other than through something like \mmlToken{mo}{𝐸}), so what
<math>
  <mrow>
    <mi>a</mi>
    <mo>𝐸</mo>
    <mi>b</mi>
  </mrow>
</math>
should produce in TeX spacing is perhaps not clear. My thought is that it should produce what a 𝐸 b would produce using the TeX input jax, which is no spacing. Of course, with MathML spacing, you should get the default mo spacing, and that doesn't occur in v3, as you point out. It does in v4, however, so that, at least, is fixed.
the main reason for it is so that the MathML generated by the TeX input jax can be read into MathJax's MathML input jax and produce the same result as the original TeX did
That's a sensible rule, I understand now. It also confirms that my approach with range[3] == 'mo' is a reasonable temporary workaround and should not cause issues in general usage. (For context: LaTeXML targets MathML Core, modulo some implementation bugs, and it creates its own 'TeX spacing' by tweaking the lspace/rspace attributes based on the core operator dictionary. This is partly broken when the output is rendered by MathJax, hence the issues I have been reporting here.)
Follow up question: why is MathML spacing an option of the output rather than the input? I understand it was an input option in MathJax 2.
why is MathML spacing an option of the output rather than the input?
Because it is controlling how the output jax lays out the mathematics from the internal representation, not how that internal representation is produced. The old v2 approach was problematic (though I don't now recall the details), and targeted the wrong place in MathJax's typesetting pipeline.
LaTeXML targets MathML Core, ... and it creates its own 'TeX spacing' by tweaking the lspace/rspace attributes based on the core operator dictionary
As I'm sure you know, the TeX and MathML spacing models work quite differently. TeX determines spacing based on two adjacent tokens and their TeX classes, so spacing is based on pairs of tokens, while MathML adds spacing around mo elements, so is based only on a single element and its position as prefix, infix, or postfix. (Some browsers added spacing elsewhere that was not part of the specification, and it may be that MathML-Core has formalized that -- I admit that I haven't looked that closely at MathML-Core.) While TeX would add space in \sin x, MathML would not when this is encoded as <mi>sin</mi><mi>x</mi>. Adding <mo>&#x2061;</mo> (function application) in between provides one way to resolve that, but in general, one would have to add empty <mo> or <mspace> elements to provide the needed spacing. That can have effects on line breaking or produce unexpected embellished operators, depending on what surrounds the extra <mo>.
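The difference between the two models can be sketched in a few lines. This is a simplified illustration, not MathJax code: the function names and the node/atom object shapes are made up, only four TeX atom classes are modelled, only text/display style is considered, and the MathML side simply sums the lspace/rspace of neighbouring mo elements. Spaces are in mu (18mu = 1em).

```javascript
// Simplified TeX inter-atom spacing in mu, indexed as TEX_SPACE[left][right].
// Only ORD, OP, BIN, and REL are modelled; missing pairs are treated as 0.
const TEX_SPACE = {
  ORD: { ORD: 0, OP: 3, BIN: 4, REL: 5 },
  OP:  { ORD: 3, OP: 3, REL: 5 },
  BIN: { ORD: 4, OP: 4 },
  REL: { ORD: 5, OP: 5, REL: 0 },
};

// TeX model: each gap depends on the classes of BOTH neighbouring atoms.
function texGaps(atoms) {
  const gaps = [];
  for (let i = 0; i + 1 < atoms.length; i++) {
    gaps.push(TEX_SPACE[atoms[i].cls][atoms[i + 1].cls] || 0);
  }
  return gaps;
}

// MathML model: only <mo> elements contribute, via their own lspace/rspace.
function mathmlGaps(nodes) {
  const gaps = [];
  for (let i = 0; i + 1 < nodes.length; i++) {
    const right = nodes[i].kind === 'mo' ? nodes[i].rspace : 0;
    const left = nodes[i + 1].kind === 'mo' ? nodes[i + 1].lspace : 0;
    gaps.push(right + left);
  }
  return gaps;
}

// \sin x: TeX inserts a thin space, <mi>sin</mi><mi>x</mi> gets none.
console.log(texGaps([{ cls: 'OP' }, { cls: 'ORD' }]));
console.log(mathmlGaps([{ kind: 'mi' }, { kind: 'mi' }]));

// a \mathrel{E} b versus <mi>a</mi><mo>E</mo><mi>b</mi> with default mo spacing (5mu).
console.log(texGaps([{ cls: 'ORD' }, { cls: 'REL' }, { cls: 'ORD' }]));
console.log(mathmlGaps([{ kind: 'mi' }, { kind: 'mo', lspace: 5, rspace: 5 }, { kind: 'mi' }]));
```

In this sketch the relation case agrees in both models, while the \sin x case differs because no mo is present to carry the space.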
Because MathJax controls the output rendering (while you don't, when you use native browser MathML-Core support), MathJax could produce uncomplicated MathML (without extra tweaking) that is perhaps more semantic, without having to worry about fixing up rendering issues in the browser (or differences between browsers). MathJax's old v2 NativeMML output jax used to do that, and it was filled with terrible hacks that were in constant need of updates as browsers changed.
TeX determines spacing based on two adjacent tokens and their TeX classes, so spacing is based on pairs of tokens, while MathML adds spacing around mo elements, so is based only on a single element and its position as prefix, infix, or postfix.
LaTeXML is in the middle: after interpreting TeX, it runs the output through a grammar, reassigns the roles of some tokens (e.g. to treat vertical bars as fences, or to recognise brakets), and finally adds lspace/rspace annotations to the MathML output to mimic the TeX spacing, on the assumption that the renderer follows MathML Core (i.e. it omits the attributes when redundant). It will also add the invisible operators.
Still, LaTeXML definitely suffers from inconsistent browser behaviour, mostly from Firefox, and some of its output is not part of MathML Core (e.g. menclose).
Fixed in v4.0.