feat: change markdown standard to GFM

Open tachyonicClock opened this issue 1 year ago • 5 comments

Change the Markdown standard to GitHub Flavored Markdown (GFM), replacing Pandoc's default Markdown. This is consistent with Jupyter Notebook Markdown cells and should improve compatibility (https://nbformat.readthedocs.io/en/latest/format_description.html#markdown-cells).

tachyonicClock avatar Sep 04 '24 22:09 tachyonicClock

Hello @tachyonicClock! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1044:80: E501 line too long (104 > 79 characters)

Comment last updated at 2024-09-04 22:52:54 UTC

pep8speaks avatar Sep 04 '24 22:09 pep8speaks

Thanks for this PR!

I wasn't aware that Jupyter uses GFM. At first I thought this was a recent change, but apparently it has been supported for a long time!

So I'm all for also using it in nbsphinx, but this PR seems to break a few things:

  • LaTeX environments and macros like \ref{}
  • HTML elements
  • "warning" box within "note" box

This might not be a full list; for now I only looked at markdown-cells.ipynb.

mgeier avatar Sep 14 '24 19:09 mgeier

Drive-by comment: have you considered making a similar PR in Jupyter nbconvert? Unfortunately the two projects do not share the pandoc invocation, but I think most users would expect them to behave the same way (at least I did).

douglas-raillard-arm avatar Sep 17 '24 09:09 douglas-raillard-arm

In an effort to close my outstanding pull requests, I took another look at this. Unfortunately, although nbformat describes Markdown cells as GitHub-flavoured Markdown (GFM) as implemented in marked.js, Jupyter Lab actually uses a hybrid: GFM plus a custom parser that identifies equations.

In summary, this is the difference between GFM's and Jupyter's handling of equations:

  • GFM supports maths via $...$, $$...$$, and ```math\n...\n``` (https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions).
  • Jupyter Lab supports \begin{...}...\end{...}, $...$, $$...$$, \\(...\\), and \\[...\\].

The relevant source code is in renderMarkdown:

// Separate math from normal markdown text.
const parts = removeMath(source); // <- This is their custom parser that identifies equations.

// Convert the markdown to HTML.
html = await markdownParser.render(parts['text']);

// Replace math.
html = replaceMath(html, parts['math']); // <- This adds the equations back with markers for MathJax
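
For readers more at home in Python, the same three steps can be sketched roughly as follows. This is only an illustration of the pattern, not a port of Jupyter Lab's code: the regular expression is a simplified assumption (the real removeMath handles more edge cases, such as code spans), and `remove_math`, `replace_math`, `render_markdown`, and `toy_markdown_to_html` are hypothetical names. The HTML escaping on restore is a choice made for this sketch.

```python
import html
import re

# Assumption: a simplified pattern covering the delimiters listed above.
# It does not skip code spans or handle every escaping corner case.
MATH_PATTERN = re.compile(
    r"""
      \\begin\{(?P<env>[a-zA-Z*]+)\}.*?\\end\{(?P=env)\}   # LaTeX environments
    | \$\$.+?\$\$                                           # display math, double dollars
    | \\\\\[.+?\\\\\]                                       # display math, escaped brackets
    | \\\\\(.+?\\\\\)                                       # inline math, escaped parentheses
    | (?<![\\$])\$(?!\s)[^$\n]+?(?<!\s)\$(?!\d)             # inline math, single dollars
    """,
    re.DOTALL | re.VERBOSE,
)


def remove_math(source):
    """Swap every math span for a placeholder; return (text, math_spans)."""
    math = []

    def stash(match):
        math.append(match.group(0))
        return f"@@MATH{len(math) - 1}@@"

    return MATH_PATTERN.sub(stash, source), math


def replace_math(rendered, math):
    """Put the stored math spans back in place of their placeholders."""
    return re.sub(
        r"@@MATH(\d+)@@",
        lambda m: html.escape(math[int(m.group(1))], quote=False),
        rendered,
    )


def render_markdown(source, markdown_to_html):
    """Protect math, run any Markdown-to-HTML callable, then restore math."""
    text, math = remove_math(source)
    return replace_math(markdown_to_html(text), math)


def toy_markdown_to_html(text):
    # Hypothetical stand-in for a real Markdown parser: only handles *emphasis*.
    return "<p>" + re.sub(r"\*(.+?)\*", r"<em>\1</em>", text) + "</p>"


print(render_markdown("Let $a*b*c$ be *important*.", toy_markdown_to_html))
```

The point of the detour through placeholders is that the Markdown parser never sees the equation, so characters such as `*` or `_` inside math are not mistaken for emphasis.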

Equations in Different Markdowns

Given that Jupyter deviates from GFM, how do different Markdown renderers and notebook editors handle equations? I used this file to test them.

Renderers compared (table columns): GitHub GFM, Pandoc GFM, Jupyter Lab, PyCharm Notebook, VSCode Notebook, Pandoc Markdown, NBSphinx, NB Convert

Features tested (table rows):

  • GFM Markdown
  • $...$
  • $$...$$
  • \\(...\\)
  • \\[...\\]
  • \begin{...}...\end{...}
  • ```math\n...\n``` (one entry: ❌ fails to generate)
  • file (an image for each renderer)

[Per-renderer results and images not preserved]

Pandoc commands listed in the table:

pandoc --from markdown equations.md -o pandoc_markdown.pdf
pandoc --from gfm equations.md -o pandoc_gfm.pdf

The "GFM Markdown" row indicates whether non-equation elements render like GFM. It is important because, as you can see, the way lists render is quite different.

What do we do?

It's all a bit of a mess. My opinion is that nbsphinx should render like the editors that people are writing notebooks in (VSCode Notebooks, Jupyter Lab, and PyCharm Notebooks). Therefore, it is essential that we use GFM Markdown but support Jupyter's equation delimiters. (Some of nbsphinx's LaTeX-like features would break using pandoc gfm as our backend, e.g. \ref{}.)

Maybe we could add a "jupyter-flavoured-gfm" extension to pandoc or add a pre-processing step that converts Jupyter-style equations to GFM-compliant ones. Either approach would need to match Jupyter's removeMath.
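
To make the pre-processing idea concrete, here is a rough sketch of what such a step might look like. It is only an illustration under stated assumptions: `jupyter_math_to_gfm` is a hypothetical name, the pattern is far simpler than removeMath (it ignores code spans, for example), and wrapping a bare environment in `$$` is one possible mapping that has not been checked against any particular GFM renderer.

```python
import re

# Assumption: simplified matching that does not replicate removeMath.
# Spans already written with dollar delimiters are matched first so they
# pass through untouched and environments inside them are not wrapped twice.
_JUPYTER_MATH = re.compile(
    r"""
      (?P<keep>\$\$.+?\$\$|\$[^$\n]+?\$)                    # already dollar-delimited
    | \\\\\[(?P<display>.+?)\\\\\]                          # escaped brackets (display)
    | \\\\\((?P<inline>.+?)\\\\\)                           # escaped parentheses (inline)
    | (?P<bare>\\begin\{(?P<name>[a-zA-Z*]+)\}.*?\\end\{(?P=name)\})  # bare environment
    """,
    re.DOTALL | re.VERBOSE,
)


def jupyter_math_to_gfm(source):
    """Rewrite Jupyter-only math delimiters using dollar delimiters."""

    def convert(match):
        if match.group("keep") is not None:
            return match.group("keep")
        if match.group("display") is not None:
            return "$$" + match.group("display") + "$$"
        if match.group("inline") is not None:
            return "$" + match.group("inline") + "$"
        # A bare LaTeX environment becomes a display-math block.
        return "$$\n" + match.group("bare") + "\n$$"

    return _JUPYTER_MATH.sub(convert, source)


print(jupyter_math_to_gfm(r"Inline \\(a+b\\) and display \\[c\\]."))
```

A step like this would only address the delimiters; features such as \ref{} would still need separate handling.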

tachyonicClock avatar Oct 02 '25 03:10 tachyonicClock

Thanks for researching all this, this is a really nice overview!

> It's all a bit of a mess.

I agree.

> My opinion is that nbsphinx should render like the editors that people are writing notebooks in (VSCode Notebooks, Jupyter Lab, and PyCharm Notebooks). Therefore, it is essential that we use GFM Markdown but support Jupyter's equation delimiters.

I agree.

> (Some of nbsphinx's LaTeX-like features would break using pandoc gfm as our backend, e.g. \ref{}.)

Maybe, but hopefully we find a workaround. Currently it also somehow works together with pandoc.

> Maybe we could add a "jupyter-flavoured-gfm" extension to pandoc ...

I think this would be a good idea for pandoc in general; however, in the long run I would really like to get rid of the pandoc dependency, see #36.

> or add a pre-processing step that converts Jupyter-style equations to GFM-compliant ones.

Pre-processing is an option, but if possible, I would prefer having a pure Python Markdown parsing library with a proper extension system where we could implement Jupyter Markdown (or ideally, such an extension would already exist).

This library would need to be able to append to an existing docutils document (because it would be called repeatedly, once for each Markdown cell).

mgeier avatar Nov 24 '25 18:11 mgeier