wishlist: server-side KaTeX

Open zackw opened this issue 5 years ago • 12 comments

KaTeX can be used to render math to HTML in advance. You need node.js and the katex command line tool at render time, but then no JavaScript needs to be executed in the browser (a CSS file is still necessary). I've written a proof-of-concept Lua filter that implements this mode in Pandoc and it works reasonably well:

-- Pandoc filter: if we are generating HTML, replace each Math element
-- with the result of running KaTeX on its contents.  This requires
-- the command-line "katex" program to be installed at rendering time,
-- but does not require any JavaScript to be executed on the reader's
-- browser.  (The built-in --katex mode makes the opposite tradeoff.)
if FORMAT:match 'html' then
   -- Local to this block; Math and Meta below capture it as an upvalue.
   local have_math = false

   function Math(elem)
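      local function trim(s)
         -- The outer parentheses drop gsub's second return value (the
         -- match count), so only the trimmed string is returned.
         return (s:gsub("^%s+", ""):gsub("%s+$", ""))
      end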

      have_math = true

      local katex_args = {'--no-throw-on-error'}
      if elem.mathtype == 'DisplayMath' then
         table.insert(katex_args, '--display-mode')
      end

      return pandoc.RawInline(
         FORMAT, trim(pandoc.pipe("katex", katex_args, trim(elem.text))))
   end

   function Meta(data)
      -- The "has_math" property will be absent when there is no math
      -- and the string "true" when there is math.
      if have_math then
        data.has_math = "true"
      end
      return data
   end
end

However, it is quite slow, because it has to start a fairly heavyweight program (the node.js interpreter) for every math element. I'm going to experiment with using a JSON filter written in node.js instead, but I wonder whether native support for this mode in Pandoc might be even faster: it could fork off the katex utility the first time it ran into a math element, and pipe it the math markup as it arrives. Native support would also allow --standalone to know when it should link to the KaTeX CSS instead of the JS.

zackw avatar Aug 30 '20 21:08 zackw

> it could fork off the katex utility the first time it ran into a math element, and pipe it the math markup as it arrives

Isn't this something that could, in principle, be done in a filter too?

jgm avatar Aug 30 '20 21:08 jgm

I don’t think this is currently possible from a Lua filter; pandoc.pipe can only run a subprocess synchronously to completion AFAICT. It’d need to be more like python’s subprocess.Popen and frankly I’m not sure that would be worth the effort.

A JSON filter certainly can do this but then the question is whether the JSON serialization overhead eats all the performance gain from only invoking the node interpreter once.

zackw avatar Aug 30 '20 22:08 zackw
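
For reference, the core of a JSON filter like the one discussed here might look roughly like the sketch below. It is only a sketch under assumptions: `renderMath` is a hypothetical callback standing in for a call such as `katex.renderToString`, the raw format is hard-coded to `html`, and the stdin/stdout plumbing of a real filter is left out. It relies on pandoc's JSON representation of math, `{"t":"Math","c":[{"t":"InlineMath"},"x^2"]}` (or `DisplayMath`).

```typescript
// Hypothetical renderer signature; a real filter might pass
// (tex, display) => katex.renderToString(tex, { displayMode: display }).
type RenderMath = (tex: string, display: boolean) => string;

// Recursively walk a pandoc JSON AST and replace every Math node
// with a RawInline node that holds pre-rendered HTML.
function replaceMath(node: unknown, renderMath: RenderMath): unknown {
  if (Array.isArray(node)) {
    return node.map((child) => replaceMath(child, renderMath));
  }
  if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    if (obj["t"] === "Math" && Array.isArray(obj["c"])) {
      const [mathType, tex] = obj["c"] as [{ t: string }, string];
      const html = renderMath(tex, mathType.t === "DisplayMath");
      // "html" is assumed here; pandoc passes the real target format
      // to a JSON filter as its first command-line argument.
      return { t: "RawInline", c: ["html", html] };
    }
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(obj)) {
      out[key] = replaceMath(obj[key], renderMath);
    }
    return out;
  }
  return node; // strings, numbers, booleans, null
}
```

With this shape the whole document passes through a single node.js process, so KaTeX is loaded only once. Whether that gain outweighs the JSON serialization cost would still have to be measured.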

Doesn't lua have primitives allowing you to open a subprocess and pipe things to it? I may be missing something, @tarleb can comment more helpfully.

jgm avatar Aug 30 '20 22:08 jgm

There are no built-in primitives, but there are libraries which provide this functionality. E.g., the posix library comes with posix.popen, posix.spawn, posix.sys.wait, etc.

This requires pandoc to be compiled against the system's Lua installation. Distro-packages usually do this, as do the official Docker images. The Lua posix package can conveniently be installed via the distro's package manager.

tarleb avatar Aug 31 '20 05:08 tarleb

> I'm going to experiment with using a JSON filter written in node.js instead

Yes, sounds like that's the way to go here. You'll need node.js anyway to run KaTeX, so there's no particular advantage in using a Lua filter here.

btw. see also https://pandoc.org/MANUAL.html#math-rendering-in-html for built-in alternatives

mb21 avatar Aug 31 '20 06:08 mb21

Well, Lua filters should be a bit faster, and more so if there are only a few math elements in a very long document. But I agree that using a node.js filter is probably the better approach here.

tarleb avatar Aug 31 '20 07:08 tarleb

Another option: set up a server that does the conversions, and use --webtex (pointing at this server) and --self-contained. Then you don't need a filter at all.

jgm avatar Sep 01 '20 00:09 jgm

Sounds a lot like this filter!

MyriaCore avatar Sep 03 '20 21:09 MyriaCore

There is also https://github.com/lierdakil/mathjax-pandoc-filter. It's written in TypeScript so it doesn't fork for every equation. Not sure how MathJax compares with KaTeX, though.

averms avatar Sep 22 '20 18:09 averms

For those worried about starting a new process for every math element, I came up with a very over-engineered solution for my project. The Lua filter uses luaposix to communicate over a Unix socket with katex.ts, a server running on Deno. With Deno there's no package.json, node_modules, etc., just a single TypeScript file. It works nicely since KaTeX provides an ECMAScript Module.

mk12 avatar Jan 11 '21 06:01 mk12
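
The protocol used by `katex.ts` is not shown in this thread, so the following is only a guess at the general shape of such a long-lived renderer, not a description of that project. It assumes a made-up framing of one TeX string per line (newlines inside a formula flattened by the sender) and a hypothetical `render` callback in place of the KaTeX call. The point it illustrates is that a socket delivers arbitrary chunks, so the server has to buffer until a full line has arrived.

```typescript
// Accumulates arbitrary chunks read from a socket and hands back each
// complete newline-terminated request exactly once.
class LineBuffer {
  private pending = "";

  // Feed one chunk; returns the lines that this chunk completed.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last part is an unfinished line ("" if the chunk ended on "\n").
    this.pending = parts.pop() ?? "";
    return parts;
  }
}

// Hypothetical request handler: render each complete line and answer
// with one line of HTML per request, in the same order.
function handleChunk(
  buffer: LineBuffer,
  chunk: string,
  render: (tex: string) => string,
): string {
  return buffer.push(chunk).map((tex) => render(tex) + "\n").join("");
}
```

Because the process stays alive between requests, the cost of starting the JavaScript runtime is paid once per document (or once per session) rather than once per formula.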

Writing a node.js JSON filter is probably still the way to go here, to transform the math into the HTML that the KaTeX CSS expects (so you don't have to run JS in the browser). But we could of course reimplement this in Haskell in the HTML writer... I'm not saying this is worthwhile, but we can keep the issue open for that feature request.

mb21 avatar Apr 18 '21 07:04 mb21

I came up with another solution: invoke JS from the Lua filter, but batch everything into one invocation. Comparison:

  • zackw's original comment: Use pandoc.pipe from Lua filter
    • pro: Easy to write
    • con: Very slow for lots of math
  • mb21's suggestion: JSON filter in JS
    • pro: Can use KaTeX API directly
    • con: Less powerful than Lua filters: can't access pandoc.read, pandoc.write, etc.
  • My earlier idea: Communicate with JS server using luaposix
    • pro: Avoids per-math process overhead
    • con: Lots of boilerplate to write server correctly
    • con: Hard to access LuaRocks from Pandoc Lua
      • Only works in dynamically linked pandoc
      • In some cases must configure LUA_PATH and LUA_CPATH
  • My new idea: Invoke JS from Lua filter, but only once for all math
    • pro: Avoids per-math process overhead
    • pro: Much simpler JS script than using sockets

math.ts:

import { readLines } from "https://deno.land/[email protected]/io/mod.ts";
import katex from "https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.mjs";

for await (const line of readLines(Deno.stdin)) {
  try {
    console.log(katex.renderToString(line, {
      displayMode: false,
      strict: "error",
      throwOnError: true,
    }));
  } catch (error) {
    throw new Error(`Input: ${line}\n\nError: ${error}`);
  }
}

filter.lua:

function Pandoc(doc)
    -- We have to use a temporary file because Lua does not support
    -- bidirectional communication with a subprocess:
    -- http://lua-users.org/lists/lua-l/2007-10/msg00189.html
    local tmp_name = os.tmpname()
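    -- Named katex_proc rather than math to avoid shadowing Lua's math library.
    local katex_proc = assert(io.popen("deno run math.ts > " .. tmp_name, "w"))
    -- First pass: send every formula to the renderer, one per line.
    doc:walk({
        Math = function(el)
            assert(katex_proc:write(el.text:gsub("\n", " ") .. "\n"))
        end
    })
    -- close() waits for deno to exit; assert turns a failed run into an error.
    assert(katex_proc:close())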
    local tmp = assert(io.open(tmp_name, "r"))
    doc = doc:walk({
        Math = function(el)
            return pandoc.RawInline(FORMAT, tmp:read())
        end
    })
    tmp:close()
    os.remove(tmp_name)
    return doc
end

mk12 avatar Apr 14 '22 23:04 mk12
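
Note that `math.ts` as posted renders every formula with `displayMode: false`. One possible way to carry the math type through the same one-line-per-formula batch is sketched below. The `d:` / `i:` prefix is a made-up convention for this sketch and not part of the scripts above, and `render` is a hypothetical stand-in for `katex.renderToString`. The Lua side would have to write the same prefix based on `el.mathtype`.

```typescript
type Render = (tex: string, display: boolean) => string;

// Encode one formula as a single line: a mode tag plus the TeX source
// with newlines flattened, mirroring what filter.lua does.
function encodeRequest(tex: string, display: boolean): string {
  return (display ? "d:" : "i:") + tex.replace(/\n/g, " ");
}

// Decode one line and render it; throws on an unknown tag.
function renderRequest(line: string, render: Render): string {
  const tag = line.slice(0, 2);
  if (tag !== "d:" && tag !== "i:") {
    throw new Error(`Unrecognized request line: ${line}`);
  }
  return render(line.slice(2), tag === "d:");
}

// Render a whole batch; output entry k corresponds to request line k.
// Every request carries a tag, so an empty formula is still a
// non-empty line and the ordering is preserved.
function renderBatch(input: string, render: Render): string[] {
  return input
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => renderRequest(line, render));
}
```

The second `doc:walk` pass in `filter.lua` would stay unchanged, since it only depends on the output lines arriving in the same order as the formulas.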

Here's an existing node.js KaTeX filter that I've started using, and it is working well so far: https://github.com/StevenZoo/pandoc-katex-filter (also listed at https://npm.io/package/pandoc-katex-filter). Much faster than the Python version. Thank you @StevenZoo!

castedo avatar Oct 13 '22 15:10 castedo