Provide an option to embed only local resources
Explain the problem.
When a Markdown file is compiled to HTML using --mathjax and --embed-resources, pandoc also tries to embed the MathJax fonts, which are obviously not available, leading to bad output for math. Dropping --embed-resources leads to the fonts being downloaded at view time.
MWE:
$$t=\alpha^2 \begin{bmatrix}
3 & 4 \\
4 & 5
\end{bmatrix}$$
$ pandoc test.md --standalone --embed-resources --mathjax -o test.html
Pandoc version?
pandoc 2.19.2
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.3, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4
So, don't use --embed-resources with --mathjax?
Or is there some particular change to pandoc's behavior that you have in mind?
If you're going to have access to the net to download the fonts, then there's little point in using --embed-resources for other things.
See also commit 63deba49d4f93a4ed1520b9a4b11786e1b8c2eb9 and #682.
There is actually a way to have resources embedded while leaving mathjax as just a link to the CDN.
You'll need to use a modified copy of the default template.
Generate the standard html5 template using pandoc -D html5 > newtemplate.html.
Replace the part that says
$if(math)$
$math$
$endif$
with
<script data-external="1"
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js"
type="text/javascript"></script>
Note the data-external attribute; see "Linked media" in the manual. This attribute causes the resource embedding to skip this tag.
Now use pandoc with
pandoc --template newtemplate.html --mathjax --embed-resources -s
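The manual template edit described above could also be scripted. The following is only a sketch, not part of pandoc: patch_math_block is a hypothetical helper name, and it assumes the template contains exactly one $if(math)$ ... $math$ ... $endif$ block, possibly indented.

```python
import re

# CDN URL taken from the comment above.
MATHJAX_URL = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js"

# Matches the $if(math)$ / $math$ / $endif$ block, tolerating
# whitespace and newlines between the three template directives.
MATH_BLOCK = re.compile(r"\$if\(math\)\$\s*\$math\$\s*\$endif\$")


def patch_math_block(template: str, url: str = MATHJAX_URL) -> str:
    """Replace the math block with a script tag marked data-external.

    Hypothetical helper: assumes exactly one math block in the template.
    """
    tag = (
        '<script data-external="1"\n'
        f'  src="{url}"\n'
        '  type="text/javascript"></script>'
    )
    # A function is used as the replacement so that nothing in the tag
    # is interpreted as a regex backreference.
    patched, count = MATH_BLOCK.subn(lambda _match: tag, template)
    if count != 1:
        raise ValueError(f"expected exactly one math block, found {count}")
    return patched
```

One way to use it would be to save the output of pandoc -D html5 to a file, pass its contents through patch_math_block, and write the result to newtemplate.html.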
Yeah I was thinking the same, but erred on the side of opening an issue.
That said, does it make sense to add an option that allows only local links to be embedded? My current use case is an HTML report with some math and a few images, which I want to share as a single standalone file without shipping the images separately, since I don't want to set up a server just for this.
I can see the point of that. I'll change the title of your issue.
A simpler solution may be to have the math variable expand to the data-external="1" version by default, and let people use --mathjax=url to force embedding of MathJax. This may not be extensible, though.
If we change the default template like so:
$if(math)$
$if(mathjax)$
<script data-external="1" src="$mathjaxurl$" type="text/javascript"></script>
$else$
$math$
$endif$
$endif$
then it would add data-external="1" automatically, while still allowing users to change this behavior if they want, by modifying the default template.
But I'm not sure this is the best solution. Some people might prefer to have the core mathjax stuff baked in, even if the fonts are not quite right.
Another option is to add an optional argument to --embed-resources: --embed-resources=[local|remote|all], with default all.
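To make the proposal concrete, here is a rough sketch of how a local/remote/all switch might decide which resources to embed. It is not pandoc's implementation: should_embed and is_remote are hypothetical names, and treating http(s) and protocol-relative URLs as "remote" and everything else as "local" is an assumption.

```python
from urllib.parse import urlsplit

MODES = ("local", "remote", "all")


def is_remote(src: str) -> bool:
    """Assumed rule: http(s) URLs and protocol-relative URLs are remote."""
    parts = urlsplit(src)
    if parts.scheme in ("http", "https"):
        return True
    # "//host/path" has no scheme but does have a network location.
    return parts.scheme == "" and parts.netloc != ""


def should_embed(src: str, mode: str = "all") -> bool:
    """Decide whether a resource reference would be embedded under `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if src.startswith("data:"):
        return False  # already inline, nothing to fetch
    if mode == "all":
        return True
    # "remote" embeds only remote sources, "local" only local ones.
    return is_remote(src) == (mode == "remote")
```

Under these assumptions, a relative image path would be embedded with mode "local", while the MathJax CDN script would be left as a link.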
I've had this same issue with embedded YouTube videos (which I frequently use in class)—the iframe is black if --embed-resources is used.
If I use the data-external="1" workaround, the YouTube video shows, but then when I save the presentation as .html in Firefox, it no longer produces a single file, but an .html file plus a folder with additional things.
The expected behavior for me would be for --embed-resources to not embed remote resources by default, and still produce a single .html file.
Pandoc has nothing to do with the behavior of "Save" in Firefox.
I'm sorry, I recognize that. Just trying to share more data-points to the discussion, in case Firefox's behavior is revealing in some way.
I think that if the resource is not embedded, then Firefox is not going to produce a single .html file. So there's really no way to get what you're asking for.
Another option is to add an optional attribute to --embed-resources. --embed-resources=[local|remote|all] with default all.
When will this new feature be added?
Still not sure whether a new command-line option is really needed. The mathjax issue could be addressed by the solution in comment https://github.com/jgm/pandoc/issues/8362#issuecomment-1289466907 which still seems good to me.
I support the addition of this new option --embed-resources=[local|remote|all]. For a defaults file, it would be embed-resources: [none|local|remote|all], with false as an alias for none and true as an alias for all, for backwards compatibility.
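The backwards-compatible aliasing suggested here could look roughly like the following. This is a sketch of the proposal only; normalize_embed_mode is a hypothetical name, and accepting the strings "true" and "false" in addition to YAML booleans is an assumption.

```python
MODES = ("none", "local", "remote", "all")

# Assumed aliases: existing boolean values map onto the proposed modes.
ALIASES = {"true": "all", "false": "none"}


def normalize_embed_mode(value) -> str:
    """Map a defaults-file value for embed-resources onto one of MODES."""
    if value is True:
        return "all"
    if value is False:
        return "none"
    if isinstance(value, str):
        key = value.strip().lower()
        key = ALIASES.get(key, key)
        if key in MODES:
            return key
    raise ValueError(f"invalid embed-resources value: {value!r}")
```

With this mapping, an existing defaults file containing embed-resources: true would keep embedding everything, as it does today.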
My use case is similar to something already mentioned: I prepare materials for a class as an HTML file, uploaded on Moodle. This document is meant to be viewed online, therefore loading resources from the net is not a problem. However, additionally uploading local external resources is cumbersome.
Editing the template is a reasonable workaround, but is harder when using Pandoc through Quarto – their template is not a simple file which can be copied and modified, it is dynamically generated (imho unfortunately). I came up with the following hack:
embed-resources: true
html-math-method:
  method: mathjax
  url: ""
header-includes: <script data-external="1" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js" type="text/javascript"></script>
Edit: url: "" seems to (sometimes?) break the HTML output. Better to use data:, as the URL.
Editing the template is a reasonable workaround, but is harder when using Pandoc through Quarto – their template is not a simple file which can be copied and modified, it is dynamically generated (imho unfortunately).
In quarto, we are patching the template to make math avoid self contained already and by default. This is explained in https://quarto.org/docs/output-formats/html-publishing.html#standalone-html
There is a specific option, self-contained-math: true, to keep the default pandoc behavior.
So there should be no need to patch the template with quarto.
If this is not working (anymore), this is a regression and a bug that we should fix. Open an issue in quarto if so. Thanks!
@cderv I tried again, and now it works. I can't reconstruct what the problem was that led me to this Pandoc issue and my elaborate hack.
I still think it would be better to solve this on the Pandoc side with extended options to embed-resources, instead of Quarto patching the template. As expressed here, I wish Quarto would do fewer opaque things, to make it easier for Pandoc users to adapt it to their needs.
I ran into this problem again while using Noto Sans & Mono in several weights in a Quarto document, from Google Fonts referenced using an @import rule in CSS.
Without embedding, the HTML file is created in less than a second and has 29,972 bytes; with embedding, it takes half a minute and the file has 38,923,148 bytes.
I'll try to circumvent this by referencing the webfonts using <link data-external="1" href="…" rel="stylesheet"> instead.
But I still think having an extended syntax --embed-resources=[local|remote|all] would be very useful. The value of embed-resources for me is mainly that I can send or upload a document as a single file. Embedding webfonts and other network resources is not necessary and bloats the output.