rootstock
Render algorithm pseudocode
I am trying to insert pseudocode into a manubot manuscript. As per this example, it should be fairly straightforward. Unfortunately, I can't seem to be able to render it as in the example.
I added the pseudocode from the example in the above link and I added algorithm2e.sty to the content directory. Then I tried a number of things, none of which made the pseudocode render:
- Added the following two lines to content/metadata.yml:
  header-includes:
    - \usepackage[ruled,vlined,linesnumbered]{algorithm2e}
- Surrounded the pseudocode with $$ on either side.
- Surrounded the metadata.yml usepackage statement with $$.
- Created a new file called header.txt, which contained only the usepackage statement from above, and added a --header-includes option to link to this file in build/build.sh.
- Moved algorithm2e.sty from content to the build directory.
None of those worked. When I run sh build/build.sh, I get the following issues:
Exporting HTML manuscript
[INFO] Could not load include file 'algorithm2e.sty' at line 1 column 53
[INFO] Could not load include file 'algorithm2e.sty' at line 1 column 53
[INFO] Not rendering RawBlock (Format "tex") "\\usepackage[ruled,vlined,linesnumbered]{algorithm2e}"
[INFO] Not rendering RawBlock (Format "tex") "\\begin{algorithm}[H]\n\\DontPrintSemicolon\n\\SetAlgoLined\n\\KwResult{Write here the result}\n\\SetKwInOut{Input}{Input}\\SetKwInOut{Output}{Output}\n\\Input{Write here the input}\n\\Output{Write here the output}\n\\BlankLine\n\\While{While condition}{\n instructions\\;\n \\eIf{condition}{\n instructions1\\;\n instructions2\\;\n }{\n instructions3\\;\n }\n}\n\\caption{While loop with If/Else condition}\n\\end{algorithm}"
Exporting PDF manuscript
I should note, my machine has MiKTeX installed and I have installed the algorithm2e package (in addition to putting the file itself in the local directories).
Is there a general method or recommended best practice for using LaTeX packages in Manubot?
I can look into this a little further later in the week, but here are two quick comments.
- What happens if you force pandoc to write out intermediary LaTeX and then compile that with pdflatex or your engine of choice? You can do this simply by adding --to=latex and -s --output=output/manuscript.tex \ to the pandoc command in the build script.
- What happens if you force pandoc to explicitly call an external LaTeX engine, for example, by adding something like --pdf-engine=xelatex in the build script? You might want to use another engine, that's just an example that I had handy.
At this point, it's unclear to me whether this is a path issue or something else.
As a quick follow-up, I think one issue may be that pandoc compiles in a temporary directory. Thus, if the system LaTeX installation doesn't know about the package, it might not pull in the package during compilation. So one idea is to use the TeX package manager to install the package (instead of just placing it in the same directory). Alternatively, you may be able to use some combination of the standalone pandoc option to get it to copy the .sty file to the right place.
@zietzm the default manubot build process does not use LaTeX at all. I believe the StackExchange answer assumes that Pandoc's PDF export is done via an intermediate LaTeX step. Therefore, during this step, LaTeX packages can be used.
It would be possible to change the build script to use LaTeX, as @slochower points out. However, I don't think there is any way to automatically create an HTML output with a formatted algorithm. This StackExchange dialog recommends exporting the algorithm to an image and then using that in the manuscript.
The rendering of LaTeX-style math by manubot is done using MathJax, a javascript library and not a LaTeX package. For a manuscript where you'd like an HTML output, I'd recommend looking for javascript packages that can provide the algorithmic formatting. Since the default workflow is to export a PDF from the HTML, this would get you both a PDF and HTML version.
> I'd recommend looking for javascript packages that can provide the algorithmic formatting
So I have been able to render pseudocode in the HTML manuscript using pseudocode.js. Oddly, though, it is not rendered in the PDF.
Using an example from the package's website, it renders HTML just fine (here's a screenshot):

But the PDF has nothing there, nor any sort of warning in the build. I added
--to=latex \
-s --output=output/manuscript.tex \
to the top of the PDF build section, just below --from=markdown \. That also didn't seem to do anything. Nothing renders in the PDF. Should something in the build script be changed?
> So I have been able to render pseudocode in the HTML manuscript using pseudocode.js. Oddly, though, it is not rendered in the PDF.
I have no personal experience with this, but this Issue may be relevant for your case: https://github.com/Kozea/WeasyPrint/issues/454
Adding *.tex to the output won't change the PDF, but will give you a chance to compile from TeX to PDF yourself. If there are problems compiling at that step (e.g., pdflatex can't find the style file) it is easier to debug.
> So I have been able to render pseudocode in the HTML manuscript using pseudocode.js.
Great! The HTML output is the priority. Can you link or post the markdown source that you used to create the algorithm? I am curious.
The PDF output is created from the HTML using WeasyPrint. As @slochower mentions, it seems that WeasyPrint may be incapable of converting it because it's generated using javascript. This seems like a major limitation and it would be interesting to see if other PDF converters succeed, as we consider whether to switch as per https://github.com/greenelab/manubot-rootstock/issues/170.
If you use your web browser to print to PDF from the HTML, does the algorithm render?
Adding --to=latex with pseudocode.js doesn't make sense, as Javascript only applies to HTML output and not latex. My advice would be to proceed with pseudocode.js and hopefully we will find a workaround to broken PDF export.
- I downloaded pseudocode.js (link) and unpacked the archive into build/assets/.
- I added these four lines to build/assets/analytics.html (just a simple way to get them into the head without needing to create a new file). They were taken (almost) directly from the pseudocode.js sample:
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.9.0-beta/katex.min.css" integrity="sha384-L/SNYu0HM7XECWBeshTGLluQO9uVI1tvkCtunuoUbCHHoTH76cDyXty69Bb9I0qZ" crossorigin="anonymous">
<script src="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.9.0-beta/katex.min.js" integrity="sha384-ad+n9lzhJjYgO67lARKETJH6WuQVDDlRfj81AJJSswMyMkXTD49wBj5EP004WOY6" crossorigin="anonymous"></script>
<link rel="stylesheet" href="../build/assets/pseudocode.js-1.1/pseudocode.css" type="text/css">
<script src="../build/assets/pseudocode.js-1.1/pseudocode.js" type="text/javascript"></script>
- Then, directly in the 02.main.md file, I added one of the algorithms from the sample file plus some JavaScript to render it:
<pre id="test-basics" style="display:none">
\begin{algorithm}
\caption{Test atoms}
\begin{algorithmic}
\STATE \textbf{Specials:} \{ \} \$ \& \# \% \_
\STATE \textbf{Bools:} \AND \OR \NOT \TRUE \FALSE
\STATE \textbf{Carriage return:} first line \\ second line
\STATE \textbf{Text-symbols:} \textbackslash
\STATE \textbf{Quote-symbols:} `single quotes', ``double quotes''
\STATE \textbf{Math:} $(\mathcal{C}_m)$, $i \gets i + 1$, $E=mc^2$, \( x^n + y^n = z^n \), $\$$, \(\$\)
\end{algorithmic}
\end{algorithm}
</pre>
<script type="text/javascript">
var testBasics = document.getElementById("test-basics").textContent;
pseudocode.render(testBasics, document.body, {
lineNumber: false,
});
</script>
That's all that was required to generate the algorithm in HTML.
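If a manuscript has more than one algorithm, the single-block call above could be generalized along these lines. This is only a sketch: the class name "pseudocode" and the helper renderAllPseudocode are hypothetical, and the only library call it relies on is pseudocode.render(text, container, options), used exactly as in the snippet above. Each hidden pre would need class="pseudocode" instead of a unique id.

```javascript
// Sketch: render every <pre class="pseudocode"> block on the page.
// The class name and the function name are illustrative assumptions.
function renderAllPseudocode(doc, renderer, options) {
  var blocks = doc.querySelectorAll("pre.pseudocode");
  var count = 0;
  for (var i = 0; i < blocks.length; i++) {
    var pre = blocks[i];
    // As in the original call, the rendered algorithm is appended to the
    // container passed in; here that is the parent element of the <pre>.
    renderer.render(pre.textContent, pre.parentNode, options);
    count++;
  }
  return count; // number of blocks handed to the renderer
}

// In the browser, run only once pseudocode.js has been loaded.
if (typeof document !== "undefined" && typeof pseudocode !== "undefined") {
  renderAllPseudocode(document, pseudocode, { lineNumber: false });
}
```

Passing the document and the renderer in as arguments keeps the helper independent of globals, which also makes it easy to check outside a browser.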
> If you use your web browser to print to PDF from the HTML, does the algorithm render?
Yes, it renders properly. See below

These relative links will probably break on gh-pages, which doesn't have the same directory structure:
<link rel="stylesheet" href="../build/assets/pseudocode.js-1.1/pseudocode.css" type="text/css">
<script src="../build/assets/pseudocode.js-1.1/pseudocode.js" type="text/javascript"></script>
However, you can use --include-after-body to embed this JS/CSS directly in the HTML document. You probably have to wrap the JS/CSS files in <script> / <style> as done here. In other words, you should insert HTML into the HTML output, such that we don't have to use these relative links that will break.
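One way to produce such an embedded fragment is a small Node.js script run before pandoc. The following is a sketch under assumptions: the function name buildInlineFragment and the output file name are made up for illustration, and the input paths simply follow the directory layout described earlier in this thread.

```javascript
// Sketch: combine a CSS file and a JS file into one HTML fragment with
// both embedded inline, so no relative links are needed in the output.
// buildInlineFragment is a hypothetical helper, not part of any library.
function buildInlineFragment(cssText, jsText) {
  // A literal "</script" inside the JavaScript would close the element
  // early, so escape the slash wherever that sequence appears.
  var safeJs = jsText.replace(/<\/(script)/gi, "<\\/$1");
  return "<style>\n" + cssText + "\n</style>\n" +
         "<script>\n" + safeJs + "\n</script>\n";
}

// Usage under Node.js (file names are assumptions):
// var fs = require("fs");
// var dir = "build/assets/pseudocode.js-1.1/";
// fs.writeFileSync(
//   "build/assets/pseudocode-inline.html",
//   buildInlineFragment(
//     fs.readFileSync(dir + "pseudocode.css", "utf8"),
//     fs.readFileSync(dir + "pseudocode.js", "utf8")
//   )
// );
```

The resulting file could then be passed to pandoc with --include-after-body. Note that a fragment placed after the body loads later than any script inside the manuscript text, so the rendering call would have to run after it as well; pandoc's --include-in-header is the alternative that keeps the library defined before the body is parsed.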
It would be nice if pseudocode.js were available from a CDN so we could hotlink it. But I didn't see those links anywhere.
Great to see that the PDF creation is possible with the right engine.
I'll see if I can get this included as a plugin, either hot-linked, embedded, or hosted on our own manubot-resources repo. As far as printing it though, yeah, WeasyPrint almost surely doesn't support Javascript if it doesn't support a basic CSS thing like calc(). We'll have to switch our printing tool/service to fix this (and many other issues like it).
I am not very knowledgeable about this sort of thing, but it would be interesting to see if we could use plastex (or something similar) as a generic way to render LaTeX as HTML, specifically because it appears to support LaTeX packages that are given in an .sty file. See example here.
@vincerubinetti and I were discussing LaTeX packages, and how it would be a really awesome addition to the manubot if it could support LaTeX packages in general, without necessarily needing specialized wrappers/parsers for every individual package a user may wish to include.
> @vincerubinetti and I were discussing LaTeX packages, and how it would be a really awesome addition to the manubot if it could support LaTeX packages in general, without necessarily needing specialized wrappers/parsers for every individual package a user may wish to include.
What does that mean? Does that mean Manubot → tex output and something like plastex → HTML?
@slochower We're not sure what the process would look like yet; it is something we want to look into.
My vision of it in an ideal world is this: You enter your standard LaTeX in your markdown, perhaps with a flag for which package the LaTeX needs to use. Pandoc essentially keeps that syntax when translating the document to HTML (as it seems to do with the --mathjax flag now). When the HTML page is loaded, a javascript library (as a plugin) scans through the document for any and all LaTeX syntax, and renders it properly. Basically, what MathJax does for math, except for all (or at least a few of the most common) LaTeX packages/variants/etc.
But my limited experience and understanding of LaTeX might mean that that is impossible or undesirable.
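To make the "scan and dispatch" idea concrete, here is a rough sketch of what the scanning step might look like. Everything in it is hypothetical: findLatexEnvironments, dispatch, and the renderers table are illustrative names, not an existing library, and a regular expression like this only recognizes simple \begin{...} ... \end{...} pairs rather than LaTeX in general.

```javascript
// Sketch: find top-level LaTeX environments in a string and hand each one
// to a renderer registered for that environment name.
// All names here are illustrative assumptions.
function findLatexEnvironments(text) {
  // Matches \begin{name} ... \end{name} where both names are identical.
  // Limitation: an environment nested inside another of the SAME name
  // is not handled correctly.
  var pattern = /\\begin\{([A-Za-z0-9*]+)\}([\s\S]*?)\\end\{\1\}/g;
  var found = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    found.push({ name: match[1], source: match[0] });
  }
  return found;
}

function dispatch(text, renderers) {
  return findLatexEnvironments(text).map(function (env) {
    var known = Object.prototype.hasOwnProperty.call(renderers, env.name);
    // null marks an environment with no renderer registered for it.
    return known ? renderers[env.name](env.source) : null;
  });
}
```

In such a design, a plugin like pseudocode.js would be registered under "algorithm", and anything without a registered renderer would be left untouched.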
@zietzm While I research this further, it would be helpful if you could provide me with some sample LaTeX code for computer science, chemistry, physics, or any other broad areas of study that commonly use LaTeX.
@vincerubinetti Do you mean source code for some commonly-used packages themselves or code for using the packages?
Some common packages (off the top of my head) are
- TikZ (source, usage examples)
- algorithmic and algorithm2e and listings (source, usage examples)
- chemfig (source, usage examples)
- physics (source, usage examples) - Really an extension of math notation, but I don't believe MathJax could render some of this stuff.
All of these could theoretically be replaced by SVGs, though in that case it would be important that text be select-able in the HTML and PDF manuscripts.
I was talking about some sample LaTeX I can actually test these javascript libraries with to see if they render properly. But those packages are also useful.
Sure, here's the full file necessary to produce a little TikZ picture. Probably only the tikzpicture part is necessary.
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\def\layersep{2.5cm}
\begin{document}
\begin{tikzpicture}[shorten >=1pt,-,draw=black!50, node distance=\layersep]
\tikzstyle{every pin edge}=[<-,shorten <=1pt]
\tikzstyle{node}=[circle,fill=blue!50,minimum size=17pt,inner sep=0pt]
\node[node] (0) at (0, 0) {0};
\node[node] (1) at (2, 1) {1};
\node[node] (2) at (2, -1) {2};
\node[node] (3) at (4, 0) {3};
\node[node] (4) at (6, 0) {4};
% Connect nodes along edges
\path (0) edge (1);
\path (0) edge (2);
\path (0) edge (3);
\path (1) edge (2);
\path (2) edge (3);
\path (3) edge (4);
\end{tikzpicture}
\end{document}
It should look like this:

> My vision of it in an ideal world is this: You enter your standard LaTeX in your markdown, perhaps with a flag for which package the LaTeX needs to use. Pandoc essentially keeps that syntax when translating the document to HTML (as it seems to do with the --mathjax flag now). When the HTML page is loaded, a javascript library (as a plugin) scans through the document for any and all LaTeX syntax, and renders it properly. Basically, what MathJax does for math, except for all (or at least a few of the most common) LaTeX packages/variants/etc.
>
> But my limited experience and understanding of LaTeX might mean that that is impossible or undesirable.
I would love to see this happen, but LaTeX is a nearly (?) Turing-complete language with a ton of complexity. I may be misunderstanding something, but there's going to be no easy way to convert arbitrary LaTeX to HTML without using one of the existing specialized tools, which have their own limitations. I think what you're talking about is creating something like MathJax but for any arbitrary LaTeX command, right? And look how complicated MathJax is and how little of the full domain of LaTeX math commands is covered in MathJax. I could be totally missing your goal here, though.
I wouldn't dream of trying to create something like MathJax myself. The idea was to use an already existing javascript library, if there is one, that renders generic LaTeX. Or, if a single one doesn't exist, including a couple of separate libraries for separate purposes, like pseudocode.js, and then one for chemistry and physics as well.
Mm, I see. So the task is finding a set of JS libraries that covers "commonly used" LaTeX commands and could be included in the Manubot build script?
Have you compared MathJax and KaTeX? I'm not sure what their relative advantages/disadvantages are at this point.