Improve how MathJax handles foreign content in MathML
It's not clear exactly what MathJax can do with foreign objects (SVG, HTML), but I see that Jacques Distler's SVG testcase works: https://bugs.webkit.org/show_bug.cgi?id=100626. There are basically two methods to include foreign objects in MathML:
The nonstandard way: using an annotation-xml as the first child of a semantics element. This is invalid, but some people have used it because of how Gecko originally implemented semantics:
<math>
<semantics>
<annotation-xml encoding="application/xhtml+xml">
<button>annotation-xml</button>
</annotation-xml>
</semantics>
</math>
The standard way (proposed by HTML5 folks): use some inline elements inside an mtext.
<math><mtext><button>mtext</button></mtext></math>
Both methods work correctly in Firefox and WebKit. Opera hides the annotation-xml but displays the second test case correctly.
MathJax seems to display the content of the first test case as normal text (I suspect it is an issue with XHTML vs HTML parsing) and raises the error "Unknown node type: button" for the second one. This happens in the SVG and HTML-CSS output modes. Even when the MathML output mode is used, the final content is not passed on correctly to native MathML engines, so they cannot handle the test cases either.
The annotation-xml will work if you provide the namespace, as in
<math>
<semantics>
<annotation-xml encoding="application/xhtml+xml">
<button xmlns="http://www.w3.org/1999/xhtml">annotation-xml</button>
</annotation-xml>
</semantics>
</math>
In section 5.3.2.3, it says
When the annotation value is represented in an XML dialect other than MathML, the namespace for the XML markup for the annotation should be identified by means of namespace attributes and/or namespace prefixes on the annotation value.
so it seems that this should be required. I suspect that the issue without this is that the namespace is incorrect when MathJax inserts the elements into the page. The original data is parsed by the XML parser, and then the nodes are cloned into the HTML page, so I suspect they end up being MathML nodes, and don't have the HTML presentation. So you see the text of the button, but not the button itself.
You are right that MathJax doesn't currently support the HTML-in-mtext approach that is allowed in HTML5 (since HTML5 post-dates when the MathJax Element jax was developed, and it was not part of the MathML3 specification).
I think the way to handle this might be to add a new "foreign" node type to the mml Element Jax that could hold unknown nodes so that they could be passed on to the NativeMML output. I guess the HTML-CSS and SVG output could either ignore them or insert an error message. Then the MathML Input Jax would not throw an error for unknown node types but would instead pass them on as the data for one of the "foreign" nodes. I guess that would also make it possible for MathJax to handle Content MathML nodes without producing errors (they could at least be passed through to the NativeMML output).
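To make the idea a bit more concrete, here is a rough sketch of the "foreign" node approach. It is not the Element Jax code: KNOWN_TYPES, makeNode and countForeign are made-up names, and elements are modelled as plain {name, children} objects rather than DOM nodes. The only point is that the conversion step keeps an unknown element as opaque data instead of throwing "Unknown node type".

```javascript
// Hypothetical list of element names the internal format understands.
const KNOWN_TYPES = new Set(['math', 'mrow', 'mi', 'mn', 'mo', 'mtext', 'msqrt']);

// Convert a parsed element ({name, children}) into an internal node.
// Unknown element names are kept as opaque "foreign" nodes instead of throwing.
function makeNode(element) {
  if (typeof element === 'string') {
    return {type: 'text', value: element};
  }
  if (!KNOWN_TYPES.has(element.name)) {
    return {type: 'foreign', source: element};
  }
  return {type: element.name, children: (element.children || []).map(makeNode)};
}

// Count the foreign nodes that an output stage would have to pass through,
// ignore, or replace with an error marker.
function countForeign(node) {
  if (node.type === 'foreign') return 1;
  return (node.children || []).reduce((n, child) => n + countForeign(child), 0);
}

const tree = makeNode({
  name: 'math',
  children: [{name: 'mtext', children: [{name: 'button', children: ['mtext']}]}]
});
console.log(countForeign(tree)); // 1
```

In this sketch a NativeMML-style output would serialize node.source unchanged, while the other outputs would decide what to show in its place.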
The annotation-xml will work if you provide the namespace
Ah, I thought I had tried that but apparently hadn't. That's what I meant by "XHTML vs HTML": the HTML5 parser can infer the namespace from context, but an XML parser cannot.
So the SVG/HTML+CSS can also handle foreign content when it has the right namespace. I'm wondering how difficult it would be to implement the HTML5 approach in MathJax, given that it seems to work with semantics. Is there something in the way we handle mtext that makes this difficult?
Is there something in the way we handle mtext that makes this difficult?
My second comment above suggests a way that this could be done. There is one caveat, however, which is that there are a number of places where there is an assumption that the contents of token elements are text and not other elements. Mostly that is in places that do this.data.join("") to get the text, so they should be easy to find. This has always troubled me a little bit, as I think mglyph is supposed to be allowed in token elements, but it is so rarely used, I didn't do anything about it. But if we are to allow HTML elements within mtext, I'll have to track down the places where this occurs and work out what to do in situations where HTML is used.
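As an illustration of the kind of change involved, here is a small sketch. It assumes, purely for illustration, that a token's data array holds plain strings plus element-like objects; the real MathJax data items are different objects, and tokenText and hasNonTextChildren are hypothetical helpers, not existing functions.

```javascript
// Collect only the string pieces of a token's data, so that non-text
// children are not stringified the way a plain join("") would do.
function tokenText(data) {
  return data.filter((item) => typeof item === 'string').join('');
}

// Report whether a token contains anything other than plain text.
function hasNonTextChildren(data) {
  return data.some((item) => typeof item !== 'string');
}

// Hypothetical token data: text followed by an element-like child.
const data = ['is the ', {kind: 'a', text: 'Pythagorean theorem'}];
console.log(data.join(''));            // "is the [object Object]"
console.log(tokenText(data));          // "is the "
console.log(hasNonTextChildren(data)); // true
```

Each place that currently joins the data would need to decide whether to skip the non-text children, as tokenText does here, or to treat the whole token differently when hasNonTextChildren is true.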
I'm not sure the namespace will be properly determined for
<math><mtext><button>mtext</button></mtext></math>
It does not seem a very good idea to use the XML parser for HTML5 content. I remember that I had to append a xmlns="..." string to the
<!doctype html>
<html>
  <head>
    <title>testcase</title>
    <meta charset="utf-8">
    <script type="text/javascript">
      function update()
      {
        var div = document.getElementById("div");
        var source = div.innerHTML;
        source = source.replace("Click Here", "Thanks!");
        source = "<html><head><title></title></head><body>"+source+"</body></html>";
        try {
          var parser = new DOMParser();
          var doc = parser.parseFromString(source, "text/html");
          if (doc) {
            div.replaceChild(doc.body.firstChild, div.firstChild);
          } else {
            alert("parseFromString(..., 'text/html') returns null");
          }
        } catch(err) {
          alert(err);
        }
      }
    </script>
  </head>
  <body>
    <div id="div"><math><msqrt><mn>2</mn></msqrt><mo>+</mo><mtext><button onclick="update()">Click Here</button></mtext></math></div>
  </body>
</html>
Another issue is how we will handle things like MathML embedded in HTML embedded in MathML. I think I saw MathJax replacing the inner math by its script element, and I don't think that will currently give the correct result in the end. However, I don't think that is a serious issue.
A workaround for the lack of 'text/html' support in DOMParser is suggested here: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser
Basically, create a fake HTML document and set body.innerHTML. Actually, I guess this will work for XML too. Can we do that instead of using the XML parser?
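A minimal sketch of that workaround, assuming a browser that provides the standard document.implementation.createHTMLDocument method. The helper name parseHTMLBody and its optional doc parameter are my own; the parameter is only there so the function can be tried with a stand-in object outside a browser.

```javascript
// Parse an HTML string by assigning it to the body of a scratch HTML document.
// `doc` defaults to the page's own document when it is not supplied.
function parseHTMLBody(source, doc) {
  const owner = doc || document;
  const scratch = owner.implementation.createHTMLDocument('');
  scratch.body.innerHTML = source;
  return scratch.body;
}

// In the test case above, the DOMParser call could then become something like:
//   var body = parseHTMLBody(source);
//   div.replaceChild(body.firstChild, div.firstChild);
```

Because the string is handled by the HTML parser, the source would not need the html/head/body wrapper that the DOMParser version adds.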
I've created a branch that follows the suggestion of using a foreignNodes type. That seems to work in simple cases (i.e. those that work with annotation-xml).
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<mtext><button xmlns="http://www.w3.org/1999/xhtml">X</button></mtext>
<mn>3</mn>
</math>
</p>
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<semantics>
<annotation-xml encoding="application/xhtml+xml">
<button xmlns="http://www.w3.org/1999/xhtml">X</button>
</annotation-xml>
</semantics>
<mn>3</mn>
</math>
</p>
Other test cases that don't work (the last one causes JavaScript errors if you switch the renderer):
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<mtext><button>X</button></mtext>
<mn>3</mn>
</math>
</p>
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<semantics>
<annotation-xml encoding="application/xhtml+xml">
<button>X</button>
</annotation-xml>
</semantics>
<mn>3</mn>
</math>
</p>
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<mtext>xxx<button xmlns="http://www.w3.org/1999/xhtml">X</button>yyy</mtext>
<mn>3</mn>
</math>
</p>
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<semantics>
<annotation-xml encoding="application/xhtml+xml">
xxx
<button xmlns="http://www.w3.org/1999/xhtml">X</button>
yyy
</annotation-xml>
</semantics>
<mn>3</mn>
</math>
</p>
<p>
<math>
<mi>x</mi>
<mo>+</mo>
<semantics>
<annotation-xml encoding="application/xhtml+xml">
<span>xxx</span>
<button xmlns="http://www.w3.org/1999/xhtml">X</button>
<span>yyy</span>
</annotation-xml>
</semantics>
<mn>3</mn>
</math>
</p>
<p>
<math>
<msqrt>
<mtext>
<span xmlns="http://www.w3.org/1999/xhtml">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<msqrt><mi>x</mi></msqrt>
</math>
</span>
</mtext>
</msqrt>
</math>
</p>
8 years later, is this still being worked on? This seems like a crucial issue to me...
@jazzpirate:
8 years later, is this still being worked on?
It is in the backlog of things to do, but it is not under active development at the moment. Four of those 8 years were involved in the complete rewrite of MathJax for v3 (which is still missing some important features from v2), so most of the development work was not on new features during that time. It is still something that we want to do, and the v3 architecture should make that easier to do, but it still hasn't been done.
The <semantics> approach does still work in v3 (as in v2), and it can be used to embed HTML into MathJax output, so there is that mechanism that can be used. It may be possible to do a MathML input jax prefilter that would convert the <mtext>...(html)...</mtext> into <semantics><annotation-xml encoding="application/xhtml+xml">...(html)...</annotation-xml></semantics>, if you could identify the proper <mtext> elements to convert. That might be an effective work-around for now.
This seems like a crucial issue to me...
Not everyone's priorities are the same. While this is certainly a useful thing, it is not something our sponsors are clamoring for, and there are other more pressing issues (font support, line breaking, additional TeX packages).
Of course, if a community member wants to contribute code to implement it, we would be happy to entertain that.
Okay, thanks for the update :)
@Jazzpirate, OK, having suggested the pre-filter, I thought I would check it out and see if that would actually work, and it turns out not to be too hard to do. So perhaps this configuration will work for you (in v3):
<script>
MathJax = {
  startup: {
    ready() {
      MathJax.startup.defaultReady();
      const MML = MathJax.startup.document.inputJax[0];
      const adaptor = MML.adaptor;
      MML.mmlFilters.add(function ({math, document, data}) {
        for (const mtext of data.querySelectorAll('mtext')) {
          const child = mtext.firstElementChild;
          if (child && child.namespaceURI === 'http://www.w3.org/1999/xhtml') {
            // Copy childNodes first: the NodeList is live and shrinks as nodes are moved out of mtext.
            const semantics = adaptor.node('semantics', {}, [
              adaptor.node('annotation-xml', {encoding: 'application/xhtml+xml'}, Array.from(mtext.childNodes))
            ]);
            mtext.parentNode.replaceChild(semantics, mtext);
          }
        }
      });
    }
  }
};
</script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/mml-chtml.js"></script>
This will look for <mtext> elements, and if the first child element of one is an HTML element, it will replace that <mtext> with a <semantics> element that MathJax can render. This really only works if the <mtext> doesn't start with text, but you could make the testing more sophisticated (like looking through all the children rather than just the first one).
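For instance, a check over all the child elements might look like the sketch below. The helper name containsHTML is hypothetical; it only reads the standard DOM properties children and namespaceURI, so it assumes the filter is running against browser DOM nodes as in the configuration above.

```javascript
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Return true if any element child of `mtext` is in the XHTML namespace.
function containsHTML(mtext) {
  return Array.from(mtext.children).some((child) => child.namespaceURI === XHTML_NS);
}
```

Inside the filter, the condition on the first child would then be replaced by containsHTML(mtext). Text around the HTML element would still end up inside the annotation-xml, so this only changes which <mtext> elements get converted.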
Anyway, perhaps that will fit your needs for now.
Two other use cases of having markup within an mtext tag:
- hyperlinks:
<math>
<mrow>
<msup><mi>a</mi><mn>2</mn></msup>
<mo>+</mo>
<msup><mi>b</mi><mn>2</mn></msup>
<mo>=</mo>
<msup><mi>c</mi><mn>2</mn></msup>
<mo separator="true"> </mo>
<mtext>is the <a href="https://en.wikipedia.org/wiki/Pythagorean_theorem" title="">Pythagorean theorem</a></mtext>
</mrow>
</math>
- struts:
<math>
<mrow>
<mo>(</mo>
<mtext><span style="width:0.0pt;height:30.0pt;background:black;display:inline-block;"></span></mtext>
<mo>)</mo>
</mrow>
</math>
The latter can certainly be accomplished with other tags, but this is still MathML that I would like to be able to use with MathJax.
@Jazzpirate and @teepeemm, there is a new PR that implements HTML in MathML token elements for MathJax v3. See link above.