HTML parsing does not preserve MathML+SVG namespaces
When p:unescape-markup uses an HTML parser to load content written in the HTML syntax, MathML and SVG islands should end up in their proper namespaces.
A sample test is:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0" exclude-inline-prefixes="#all">
  <p:input port="source">
    <p:inline>
      <doc><![CDATA[
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>title</title>
  </head>
  <body>
    <h1>HTML + MathML sample</h1>
    <p>
      <math>
        <mi>x</mi>
        <mo>=</mo>
        <mi>y</mi>
      </math>
    </p>
  </body>
</html>
]]></doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <p:unescape-markup content-type="text/html"/>
  <p:unwrap match="/*"/>
</p:declare-step>
htmlparser.nu does appear to report the islands in their respective namespaces, since the provided "HTML2XML" test tool outputs valid XHTML.
However, it does not seem to fire the startPrefixMapping SAX events, which prevents Saxon from declaring the namespaces properly. See issue 820 on htmlparser and also this saxon-help thread.
Calabash (Saxon?) consequently loses the namespaces. Based on local experiments, using a NamespaceReducer doesn't seem to help. I couldn't find an easy way to fix the issue, but it would be helpful to work around the HTMLParser+Saxon limitation.
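One possible direction for a workaround, as a rough sketch rather than a tested fix: wrap the parser in a SAX filter that synthesizes the missing startPrefixMapping/endPrefixMapping events from the namespace URI already reported in startElement. The class name PrefixMappingFilter is hypothetical. The sketch assumes the upstream parser reports correct namespace URIs but never declares a default namespace itself, and it only covers unprefixed element names; prefixed attributes such as xlink:href on SVG elements are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical filter: declares a default namespace whenever an unprefixed
 * element's namespace URI differs from the default namespace in scope.
 * Assumes the upstream reader emits no default-namespace mappings itself.
 */
class PrefixMappingFilter extends XMLFilterImpl {
    // Default namespace URIs currently in scope; "" means no namespace.
    private final Deque<String> defaultNs = new ArrayDeque<>();
    // One flag per open element: did this filter declare a mapping for it?
    private final Deque<Boolean> declared = new ArrayDeque<>();

    PrefixMappingFilter() {
        super();
        defaultNs.push("");
    }

    PrefixMappingFilter(XMLReader parent) {
        super(parent);
        defaultNs.push("");
    }

    @Override
    public void startDocument() throws SAXException {
        // Reset state so the filter can be reused across parses.
        defaultNs.clear();
        defaultNs.push("");
        declared.clear();
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        String ns = (uri == null) ? "" : uri;
        boolean unprefixed = qName == null || qName.indexOf(':') < 0;
        boolean needsMapping = unprefixed && !ns.equals(defaultNs.peek());
        if (needsMapping) {
            // SAX requires the mapping event before the element it applies to.
            super.startPrefixMapping("", ns);
            defaultNs.push(ns);
        }
        declared.push(needsMapping);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        if (declared.pop()) {
            // Close the mapping right after the element that opened it.
            defaultNs.pop();
            super.endPrefixMapping("");
        }
    }
}
```

If the htmlparser SAX parser is used as an XMLReader, it could in principle be passed to the PrefixMappingFilter(XMLReader) constructor and the filter handed to Saxon in its place. Whether Calabash exposes a hook for that is something I haven't checked.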
bugzilla.validator.nu seems to be down, so I can't see issue 820.
By my reading of the saxon-help thread, it's a bug in Henri's code. I don't think I can fix it, but I'll keep an eye on issue 820 once the site is back.