siunitx
Version 3 generates unexpected output with tex4ht and mathjax
Compiling the file test.tex:
\documentclass{article}
\usepackage{siunitx}
\begin{document}
The speed is $v = 3$ \si[per-mode=symbol]{\meter\per\second}.
\end{document}
using make4ht test.tex "xhtml,mathjax" puts part of the siunitx source code in the output test.html:
<body>
<!-- l. 4 --><p class='noindent'>The speed is \(v = 3\) \(\relax \exp_args:NV \__siunitx_print_math_auxiii:n \l__siunitx_print_tmp_tl \).
</p>
</body>
The document compiles correctly with make4ht if I use siunitx-v2 instead of version 3.0.24.
I'll take a look. I can see where that line comes from in siunitx, but I'm not sure why it's not being expanded by tex4ht.
At the cost of losing font control, one could use
\ExplSyntaxOn
\cs_gset_protected:Npn \__siunitx_print_math_auxii:n #1
{
\tl_set:Nn \l__siunitx_print_tmp_tl {#1}
\exp_args:NNnx \tl_replace_all:Nnn \l__siunitx_print_tmp_tl
{ ^ } { \token_to_str:N ^ }
\exp_args:NV \ensuremath \l__siunitx_print_tmp_tl
}
\ExplSyntaxOff
(after \begin{document}) for the present.
I think I will need to discuss this with the tex4ht developers: I can see a bit of what is happening, but I'm not sure if that is the best fix.
In the MathJax mode, all content of math environments is passed unexpanded to the HTML output, and it is left to MathJax to render it. So it is important that content in \( ... \) contains only macros that MathJax knows.
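As an illustration, assuming a default MathJax configuration with no extra macros defined: math that uses only standard LaTeX macros reaches the HTML as written and MathJax can render it, whereas a siunitx command used inside math is also passed through verbatim, and MathJax does not know it.

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
% Only standard math macros: MathJax can render this as written.
The speed is $v = 3\,\mathrm{m}/\mathrm{s}$.
% \si inside math reaches MathJax verbatim; without extra MathJax
% configuration, \si, \meter, \per and \second are unknown there.
The speed is $v = 3\,\si[per-mode=symbol]{\meter\per\second}$.
\end{document}
```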
@michal-h21 I'd worked that out :) I remember now that in v2 I just gave up with units and forced text mode. That's really sub-optimal from a semantic point of view, but my hack above means we give up on font control: also sub-optimal. As we are in a typesetting context, I'd not done stuff by expansion, but I probably could arrange to have a 'clean' math mode setup before applying \ensuremath.
I'll need to ponder this a bit: some of the search-and-replace gets awkward, etc.
Your code works pretty well when I save it as siunitx.4ht:
\ExplSyntaxOn
\AtBeginDocument{%
\cs_gset_protected:Npn \__siunitx_print_math_auxii:n #1
{
\tl_set:Nn \l__siunitx_print_tmp_tl {#1}
\exp_args:NNnx \tl_replace_all:Nnn \l__siunitx_print_tmp_tl
{ ^ } { \token_to_str:N ^ }
\exp_args:NV \ensuremath \l__siunitx_print_tmp_tl
}
}
\ExplSyntaxOff
\Hinput{siunitx}
\endinput
The only issue is that it still writes the \relax command to the output HTML, and MathJax doesn't support that. It is possible to pass a dummy configuration for this command to MathJax, but it would be better to omit it from the output anyway.
I've got a few ideas about how to approach this. The best is if there is a way to know we are in MathJax mode: @michal-h21 is there a flag? If so, I could arrange that \SI, etc., given in text mode 'convert' to math mode, and that will then mean that material is passed to MathJax as-is. Failing that, I think I can arrange to move where the internal \ensuremath sits such that it's not an issue.
My hack is very much that: it relies on siunitx internals, so it could go into siunitx itself, but I'd rather it didn't go into any other packages. On the \relax, that's just from \ensuremath: one could arrange to avoid that, as with e-TeX we don't require the 'defensive' code.
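A sketch of what dropping the 'defensive' \relax might look like, assuming the only job left is choosing between math and text mode. \ensuremathnorelax is a hypothetical name, not a LaTeX or siunitx command; LaTeX's own \ensuremath typesets its argument as $\relax#1$ in text mode, which is where the stray \relax in the HTML comes from. Here an e-TeX \detokenize emptiness test takes over the guard role by skipping an empty argument, which would otherwise produce $$. Whether tex4ht's MathJax mode then writes exactly the argument between \( and \) is not verified here.

```latex
\documentclass{article}
\makeatletter
% Hypothetical command: behaves like \ensuremath but does not put
% \relax at the start of the math material.
\protected\long\def\ensuremathnorelax#1{%
  \ifmmode
    \expandafter\@firstofone
  \else
    \expandafter\ensuremathnorelax@text
  \fi
  {#1}%
}
% Text mode: skip empty input (e-TeX \detokenize test) so we never emit $$
\long\def\ensuremathnorelax@text#1{%
  \if\relax\detokenize{#1}\relax
  \else
    $#1$%
  \fi
}
\makeatother
\begin{document}
Text mode: \ensuremathnorelax{v = 3}.
Math mode: $x + \ensuremathnorelax{y^2}$.
\end{document}
```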
For example, \ifdefined\fixmathjaxtoc is true only in the MathJax mode. It is definitely best to do any fixes in siunitx.4ht. Note that it is executed before \AtBeginDocument; it seems that siunitx defines some macros at that moment?
Maybe it would be easier to output the detokenized math content and put \( and \) strings around it? That would avoid any involvement of the LaTeX math mode, as it is not useful in this case anyway.
@josephwright, you said: "If so, I could arrange that \SI, etc., given in text mode 'convert' to math mode, and that will then mean that material is passed to MathJax as-is."
This is exactly what siunitx does if the \si command is inside math mode in the TeX source code:
\begin{document}
The speed is $v = 3 \si[per-mode=symbol]{\meter\per\second}$.
\end{document}
gives
<body>
<!-- l. 4 --><p class='noindent'>The speed is \(v = 3 \si [per-mode=symbol]{\meter \per \second }\).
</p>
</body>
That does not help, though, because as of now MathJax does not support \si without external packages.
I think leaving it up to MathJax would be ideal, especially if the external package could be integrated with MathJax. It is beyond my coding abilities to do that myself, though!
I just filed this bug report because I was using \si in text mode to generate my site, and it stopped working when I updated siunitx to version 3.
@rlkamalapurkar you can configure MathJax in TeX4ht, so you can try to integrate it with the MathJax Siunitx package.
@michal-h21 As my hack touches an internal function, it's really not suitable for anything outside of siunitx itself. I'm pondering whether you need an API here or whether I can fix this nicely at my end.
(I guess @frankmittelbach might have wider comments on the entire business of patches for tex4ht.)
@josephwright I've come with this solution:
\ExplSyntaxOn
\ifdefined\fixmathjaxtoc
\AtBeginDocument{%
\cs_gset_protected:Npn \__siunitx_print_math_auxii:n #1
{
\tl_set:Nn \l__siunitx_print_tmp_tl {#1}
\exp_args:NNnx \tl_replace_all:Nnn \l__siunitx_print_tmp_tl
{ ^ } { \token_to_str:N ^ }
% escape special HTML characters
\regex_replace_all:nnN { \x{26} } { &amp; } \l__siunitx_print_tmp_tl
\regex_replace_all:nnN { \x{3C} } { &lt; } \l__siunitx_print_tmp_tl
\regex_replace_all:nnN { \x{3E} } { &gt; } \l__siunitx_print_tmp_tl
\HCode{\detokenize{\(} \tl_to_str:N \l__siunitx_print_tmp_tl \detokenize{\)}}
}
}
\fi
\ExplSyntaxOff
\Hinput{siunitx}
\endinput
It reuses some code that TeX4ht uses in the MathJax mode to replace <, > and &, as these characters would otherwise cause invalid HTML. I am not sure about the \HCode line; it is not in the expl3 style, but it does the trick :)
I can put this in siunitx.4ht and add it to the TeX4ht sources.
I've updated the code to do a better job here, but I still need to think about how best to expose 'extra search and replace' to tex4ht. I'm still wondering a bit about this: I guess I expected it to be handled 'last minute'.
@josephwright I've already put the code from my previous post to TeX4ht sources. Will it need a modification for the new Siunitx code?
@michal-h21 I've still only got an internal interface, so you'll want something like
\tl_if_exist:NTF \l__siunitx_print_math_html_tl
{
\tl_put_right:Nn \l__siunitx_print_math_html_tl
{
& { &amp; }
< { &lt; }
> { &gt; }
}
}
{
\cs_gset_protected:Npn \__siunitx_print_math_auxii:n #1
{
\tl_set:Nn \l__siunitx_print_tmp_tl {#1}
\exp_args:NNnx \tl_replace_all:Nnn \l__siunitx_print_tmp_tl
{ ^ } { \token_to_str:N ^ }
% escape special HTML characters
\regex_replace_all:nnN { \x{26} } { &amp; } \l__siunitx_print_tmp_tl
\regex_replace_all:nnN { \x{3C} } { &lt; } \l__siunitx_print_tmp_tl
\regex_replace_all:nnN { \x{3E} } { &gt; } \l__siunitx_print_tmp_tl
\HCode{\detokenize{\(} \tl_to_str:N \l__siunitx_print_tmp_tl \detokenize{\)}}
}
}
I'm still trying to work out a proper, public interface. What's confusing me is that I don't follow why you need to filter out &, < and > at the siunitx end, as they must show up in general math mode material anyway: don't you make them math-active?
Thanks. We need to escape &, < and > because otherwise they would end up directly in the HTML code. As they are special HTML characters, that would result in rendering errors.
What does this code do?
\tl_put_right:Nn \l__siunitx_print_math_html_tl
{
& { &amp; }
< { &lt; }
> { &gt; }
}
Actually, you might need
\tl_put_right:Nx \l__siunitx_print_math_html_tl
{
& { \token_to_str:N & amp ; }
< { \token_to_str:N & lt ; }
> { \token_to_str:N & gt ; }
}
What this does is add to the internal token list (macro) \l__siunitx_print_math_html_tl, which is then used in a search-and-replace of the tokens to be passed to math mode.
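To make that search-and-replace step concrete, here is a sketch of how a list of <search> {<replacement>} pairs, in the same shape as the token list above, can be applied to a token list. \demo_replace_pairs:NN and the other demo names are hypothetical; this is an illustration of the idea, not the code siunitx uses internally.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical helper (not part of siunitx): apply every
% <search> {<replacement>} pair stored in token list #2 to token list #1.
\cs_new_protected:Npn \demo_replace_pairs:NN #1#2
  {
    % Unpack the pairs and terminate them with quarks so the loop can stop.
    \exp_last_unbraced:NNo \__demo_replace_pairs:Nnn #1 #2
      \q_recursion_tail \q_recursion_tail \q_recursion_stop
  }
\cs_new_protected:Npn \__demo_replace_pairs:Nnn #1#2#3
  {
    % Stop once the end marker is reached.
    \quark_if_recursion_tail_stop:n {#2}
    \tl_replace_all:Nnn #1 {#2} {#3}
    \__demo_replace_pairs:Nnn #1
  }
\tl_new:N \l_demo_pairs_tl
\tl_new:N \l_demo_text_tl
% Same shape as the siunitx token list: the search token is left as-is,
% the replacement is turned into a string so its & is not an alignment tab.
\tl_set:Nx \l_demo_pairs_tl
  {
    & { \token_to_str:N & amp ; }
    < { \token_to_str:N & lt ; }
    > { \token_to_str:N & gt ; }
  }
\tl_set:Nn \l_demo_text_tl { a < b > c & d }
\demo_replace_pairs:NN \l_demo_text_tl \l_demo_pairs_tl
% Write the result to the .log file for inspection.
\tl_log:N \l_demo_text_tl
\ExplSyntaxOff
\begin{document}
See the log file for the replaced token list.
\end{document}
```

Because each replacement is a string, the & it introduces has category code 12 and is not matched again by the category-code-4 & used as the search token.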
The reason I've not provided this as a public interface is that I'd imagine you need to handle a simple $ a < b $
and the 'obvious' way to me is something like
\mathcode`\<="8000\relax
\begingroup
\catcode`\<=\active
\xdef<{\string&lt;}
\endgroup
which would then apply to the output from siunitx without any special handling. That won't work for ^ from siunitx, as I explicitly set it as catcode 7, which is why I have to 'tidy up' that one case.
Hi there,
siunitx produces unexpected output if \num{} is used in test.tex above (only inside a math environment):
The speed is $v = \num{3.14159}$ \si[per-mode=symbol]{\meter\per\second}.
it gives
<!-- l. 4 --><p class='noindent'>The speed is \(v = \num {3.14159}\) \(\mathrm {m}/\mathrm {s}\).
\num is unknown to the browser and shows up in red. Could this be solved, or is there another solution?
Thanks for your excellent work !
You need to configure MathJax to support siunitx. See this guide on how to pass MathJax configuration from TeX4ht. There seems to be a MathJax extension for siunitx, but it comes with a deprecation warning, so I am not sure how well it works.
In your case, the following configuration file can be used to support just the \num command:
\Preamble{xhtml}
\catcode`\#=11
\Configure{MathJaxConfig}{{
tex: {
tags: "ams",
\detokenize{%
macros: {
num: ["#1",1],
}
}
}
}}
\catcode`\#=6
\begin{document}
\EndPreamble
@michal-h21: the configuration indeed solves the issue with \num{} inside a math environment, thank you very much!
However, siunitx units still do not display correctly within any math environment, and unfortunately I could not figure out how to modify the configuration so that it would work. The MathJax extension is deprecated; it appears to partially resolve \si{} but not \unit{}.
Would you have a suggestion for making the following text display correctly in HTML? This would probably also be helpful for others :-).
test.tex:
The values are \begin{equation}v = \num{3.14159}\,\si{\meter\per\second}\end{equation} and \[ \lambda = \num{2,71828}\,\unit{\centi\meter} \]
it gives:
<p class='noindent'>The values are \begin {equation} v = \num {3.14159}\,\si {\meter \per \second } \end {equation}<a id='x1-2r1'></a> and \[ \lambda = \num {2,71828}\,\unit {\centi \meter } \]
</p>
"\si \meter \per \second" and "\unit \centi \meter" are still displayed in red by the browser...
I guess the best solution would be if someone upgraded the MathJax extension. Otherwise, you would need to pass suitable definitions for all siunitx commands, which would be complicated. You can also try the MathML + MathJax combination. You would avoid the issues with unknown macros, but it is possible that you would run into other issues. You can try it using
$ make4ht test.tex "mathml,mathjax"
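For the "suitable definitions" route, a rough sketch covering only the commands in the example above might extend the earlier configuration file like this. The macro definitions are assumptions that approximate siunitx output (upright unit symbols, a slash for \per); they do not handle optional arguments such as [per-mode=symbol] and cover nothing beyond \num, \si, \unit, \meter, \second, \centi and \per.

```latex
\Preamble{xhtml}
\catcode`\#=11
\Configure{MathJaxConfig}{{
  tex: {
    tags: "ams",
    \detokenize{%
    macros: {
      num: ["#1",1],
      % Rough stand-ins (assumption): upright symbols, slash for \per,
      % no optional arguments such as [per-mode=symbol]
      si: ["#1",1],
      unit: ["#1",1],
      meter: "\\mathrm{m}",
      second: "\\mathrm{s}",
      centi: "\\mathrm{c}",
      per: "/",
    }
    }
  }
}}
\catcode`\#=6
\begin{document}
\EndPreamble
```

Saved under a name of your choice, for example units.cfg (a hypothetical name), it would be passed to make4ht with the -c option, as in make4ht -c units.cfg test.tex "mathjax". Every further siunitx unit or prefix would need its own entry, which is why this approach gets complicated quickly.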
I guess the best solution would be if someone upgraded the MathJax extension.
I'm working on a JavaScript port of this tool that will work in the latest version of MathJax. It's not public yet, but it's coming very soon and will be open/free/etc. Maybe next month; hopefully before the end of the year.
Looks to me like this is fixed from the TeX4ht end, so I will close here.