Preprocess .livemd documents

Open josevalim opened this issue 3 years ago • 2 comments

Unfortunately, Markdown treats backslashes as escape characters, which means that a KaTeX formula like this:

$$
\begin{bmatrix}
  1 & 2 \\
  3 & 4
\end{bmatrix}
$$

will appear in the HTML as:

$$
\begin{bmatrix}
  1 & 2 \
  3 & 4
\end{bmatrix}
$$

As a result, the formula renders without the line break. Livebook, however, handles this case correctly, which means the same .livemd document renders differently in Livebook and in ExDoc.
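To make the failure concrete, here is a minimal Python sketch of the escape step. It assumes the CommonMark rule that a backslash before any ASCII punctuation character yields that character literally. `unescape_backslashes` is a hypothetical stand-in for illustration, not EarmarkParser's actual code.

```python
import re

# CommonMark-style rule (assumed here): a backslash followed by an ASCII
# punctuation character is replaced by that character alone.
ASCII_PUNCT_ESCAPE = re.compile(r"\\([!-/:-@\[-`{-~])")


def unescape_backslashes(text: str) -> str:
    """Hypothetical stand-in for the escape handling of a Markdown parser."""
    return ASCII_PUNCT_ESCAPE.sub(lambda m: m.group(1), text)


formula = r"""\begin{bmatrix}
  1 & 2 \\
  3 & 4
\end{bmatrix}"""

rendered = unescape_backslashes(formula)
print(rendered)
```

`\begin` and `\end` survive because a letter follows the backslash, while `\\` collapses to a single `\`, which is exactly the row separator the matrix needs.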

@jonatanklosko, how hard do you think it would be to add some pre-processing here? I think the real challenge will be preprocessing $, though. 🤔

The only other option I can think of is to change the notation from $$ to something that includes ` in it, such as $`..`$ or `$...$`.
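One possible shape for such a pre-processing step, sketched in Python: swap every math span for an opaque placeholder before the Markdown parser runs, then put the original text back afterwards. The function names and the placeholder format are made up for this example. The inline rule is deliberately simplistic (no handling of escaped `\$` or of dollars inside code spans), which is where the real difficulty with `$` lies.

```python
import re

# Display math is tried first so "$$" is not read as two inline delimiters.
# Inline math must not start or end with whitespace and must stay on one line.
MATH = re.compile(r"\$\$(.+?)\$\$|\$(?!\s)([^$\n]+?)(?<!\s)\$", re.DOTALL)


def protect_math(source):
    """Replace each math span with a placeholder; return new text and saved spans."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"MATHPLACEHOLDER{len(saved) - 1}END"

    return MATH.sub(stash, source), saved


def restore_math(text, saved):
    """Put the saved math spans back in place of their placeholders."""
    return re.sub(r"MATHPLACEHOLDER(\d+)END", lambda m: saved[int(m.group(1))], text)


text = "Inline $x_{1} x_{2}$ and\n\n$$\n1 & 2 \\\\\n$$\n"
protected, saved = protect_math(text)
# The Markdown parser would run on `protected` here.
restored = restore_math(protected, saved)
```

The placeholder contains only letters and digits, so a Markdown parser has no reason to alter it.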

josevalim, May 08 '22 20:05

To clarify, this problem is not specific to .livemd files, since we use math in docstrings too.

In Livebook we handle it by parsing with parse_inline: false; however, when parsing the document as a whole we are back to the same issues, such as underscores being parsed as italics in "Paragraph content with math $x_{1} x_{2}$".
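The underscore problem can be illustrated with a deliberately naive emphasis rule in Python. This is not how EarmarkParser decides on emphasis. It is only an assumed, simplified rule that shows how subscripts inside math can be consumed by inline parsing.

```python
import re

# Deliberately naive rule (an assumption for illustration): anything between
# two underscores becomes italic. Real parsers apply stricter conditions.
NAIVE_EMPHASIS = re.compile(r"_(.+?)_")


def naive_inline(text):
    """Apply the naive emphasis rule to a line of text."""
    return NAIVE_EMPHASIS.sub(r"<em>\1</em>", text)


result = naive_inline("Paragraph content with math $x_{1} x_{2}$.")
print(result)
# Paragraph content with math $x<em>{1} x</em>{2}$.
```

Once the two underscores are gone, the text between the dollars is no longer valid TeX, so KaTeX cannot recover the formula.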

What delimiters should we actually use?

LaTeX standard delimiters for math are \( \) for inline and \[ \] for display (ref). Dollars are discouraged, to the point where $$ isn't even listed in the Overleaf guide. However, for Markdown the standard delimiters are confusing, because the backslash is used for escaping, so it's not obvious if it should be \( or \\(.

Pandoc uses $ and $$ (ref). It also has an option for treating either \( or \\( as a math delimiter. Jupyter uses dollars (ref), although \\( is also supported. StackExchange uses dollars too (ref).

GitLab uses code snippets instead (ref), as in:

Inline $`a^2+b^2=c^2`$.

```math
a^2+b^2=c^2
```

However, this seems to be an outlier.
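For comparison, a rough Python sketch of rewriting dollar-delimited math into the GitLab-style notation shown above. The appeal of that notation is that the TeX ends up inside code spans and fences, which Markdown leaves untouched. `to_gitlab_math` is a hypothetical helper with simplified matching rules, not something GitLab or ex_doc provides.

```python
import re

FENCE = "`" * 3  # built indirectly so the example contains no literal fence
DISPLAY = re.compile(r"\$\$\s*(.+?)\s*\$\$", re.DOTALL)
# Skip dollars that already touch a backtick, so converted text is left alone.
INLINE = re.compile(r"(?<![`$])\$(?!\s)([^$\n`]+?)(?<!\s)\$(?![`$])")


def to_gitlab_math(source):
    """Rewrite $$...$$ as a math fence and $...$ as a dollar-wrapped code span."""
    fenced = DISPLAY.sub(lambda m: f"{FENCE}math\n{m.group(1)}\n{FENCE}", source)
    return INLINE.sub(lambda m: "$`" + m.group(1) + "`$", fenced)


converted = to_gitlab_math("Inline $a^2+b^2=c^2$.\n\n$$\na^2+b^2=c^2\n$$\n")
print(converted)
```

Running the helper on text that is already in the GitLab form leaves it unchanged, because the inline rule refuses dollars adjacent to a backtick.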

EDIT: GitHub just added support for math syntax, and they also use $ and $$ (ref).

All that said, dollars seem like the most reasonable syntax.

How can we properly support dollars?

I think the only option is to parse them in EarmarkParser. We considered it in the past for Livebook, but the parse_inline: false option did the trick there. We can't use that option for ex_doc, though. The question is how we want to represent math nodes in the AST. We could make them div/span tags with math classes; however, this probably implies a more manual usage of KaTeX.
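As a rough illustration of the div/span idea, here is a Python sketch that splits text into plain strings and math nodes. The `(tag, attributes, children)` tuple shape and the class names `math math-inline` and `math math-display` are assumptions chosen for this example, not an agreed EarmarkParser representation.

```python
import re

MATH = re.compile(r"\$\$(.+?)\$\$|\$([^$\n]+?)\$", re.DOTALL)


def split_math(text):
    """Split text into plain strings and (tag, attrs, children) math nodes."""
    nodes, pos = [], 0
    for m in MATH.finditer(text):
        if m.start() > pos:
            nodes.append(text[pos:m.start()])  # plain text before the math
        if m.group(1) is not None:
            # Display math: block-level node, TeX kept verbatim as the child.
            nodes.append(("div", [("class", "math math-display")], [m.group(1).strip()]))
        else:
            # Inline math: span node, TeX kept verbatim as the child.
            nodes.append(("span", [("class", "math math-inline")], [m.group(2)]))
        pos = m.end()
    if pos < len(text):
        nodes.append(text[pos:])  # trailing plain text
    return nodes


nodes = split_math("Let $x_{1}$ be:$$a \\\\ b$$")
```

With nodes like these, the page can no longer rely on a delimiter scan over the whole document. A script would have to find each `.math` element and render its text with KaTeX, which is presumably the more manual usage mentioned above.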

jonatanklosko, May 09 '22 12:05

GitHub officially supports $ and $$, so we can likely request the same for Earmark: https://github.blog/2022-05-19-math-support-in-markdown/

josevalim, May 20 '22 03:05