nbformat

Spec out the markdown format used

Open • jasongrout opened this issue 8 years ago • 13 comments

As part of the notebook format, we should clearly specify what format we use for markdown. For example, we use markdown (as implemented by marked), plus various things from GitHub-flavored markdown, plus a specific configuration of MathJax.

A change to the markdown format is then really a change to the nbformat spec, hopefully accompanied by some sort of automatic conversion where possible.

CC @rgbkrk, @mpacer

jasongrout, Jan 17 '17

To add to the outline from the call today: markdown in nbformat v1-v4 has been whatever marked supports, plus inline and display math delimited by $ and $$ respectively. If we transition to a stricter markdown and know how to upconvert, we'd specify it for v5 of the notebook format.
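A minimal sketch of why the math has to be handled outside the markdown engine: `_` and `*` inside `$…$` would otherwise be read as emphasis. The helpers `protect_math` and `restore_math` are hypothetical names; stashing math spans before rendering and restoring them afterwards is similar in spirit to the approach the classic notebook takes, but this is not its code, and it ignores escaped `\$` and literal dollar amounts.

```python
import re

# Display math ($$...$$, may span lines) is tried before inline math ($...$, one line).
# Simplification: escaped \$ and prose such as "$5 and $6" are not handled.
MATH_RE = re.compile(r"\$\$.+?\$\$|\$[^$\n]+?\$", re.DOTALL)


def protect_math(text):
    """Replace math spans with @@N@@ placeholders so the markdown pass can't alter them."""
    stash = []

    def _stash(match):
        stash.append(match.group(0))
        return f"@@{len(stash) - 1}@@"

    return MATH_RE.sub(_stash, text), stash


def restore_math(rendered, stash):
    """Put the original math spans back into the rendered output."""
    return re.sub(r"@@(\d+)@@", lambda m: stash[int(m.group(1))], rendered)
```

The protected text would go through marked (or any markdown renderer), and `restore_math` would run on its HTML output before MathJax typesets the page.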

rgbkrk, Jan 17 '17

I have witnessed a very long discussion about LaTeX support in GitLab. The conclusion was that inline math is written as $`x`$, and block math uses

````
```math
x
```
````

While these are unconventional in the math world, I have to admit that they form the cleanest extension of markdown, maintain a lot of compatibility, and offer the cleanest implementation without interfering with the internals of the markdown engine. Both are also very domain-specific and shouldn't break existing texts.
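Part of the appeal is that a CommonMark parser already sees $`x`$ as a code span between two dollar signs and the math fence as an ordinary fenced code block, so the math survives parsing untouched. Assuming exactly three-backtick fences with the info string `math`, an automatic conversion to the $ / $$ delimiters used in nbformat v1-v4 might look like this; `gitlab_math_to_dollars` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this listing fence-safe

# $`...`$ inline math (no backticks inside the code span).
INLINE_RE = re.compile(r"\$`([^`]+)`\$")
# A fenced block whose info string is exactly "math".
BLOCK_RE = re.compile(
    r"^" + FENCE + r"math[ \t]*\n(.*?)\n" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def gitlab_math_to_dollars(text):
    """Rewrite GitLab-style math into $...$ and $$...$$ delimiters."""
    # Blocks first, so their contents are already in $$ form before inline rewriting.
    text = BLOCK_RE.sub(lambda m: "$$\n" + m.group(1) + "\n$$", text)
    # Lambdas avoid backslash processing of LaTeX in the replacement strings.
    return INLINE_RE.sub(lambda m: "$" + m.group(1) + "$", text)
```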

akhmerov, Jan 17 '17

> As part of the notebook format, we should clearly specify what format we use for markdown. For example, we use markdown (as implemented by marked), plus various things from GitHub-flavored markdown, plus a specific configuration of MathJax.

Also: header ID autogeneration (which is not a core feature of markdown), the attachment spec, and mixed support for \[…\] display and \(…\) inline math formatting (some aspects work and some don't work as well, because characters need to be escaped that wouldn't normally need escaping). I'm sure there are many other things I'm not thinking of off the top of my head.
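To make the escaping problem concrete: CommonMark lets a backslash escape any ASCII punctuation character, so the markdown pass turns `\(x\)` into `(x)` before MathJax ever sees it, and authors end up writing `\\(x\\)`. The hypothetical `apply_backslash_escapes` below models only that one rule and ignores everything else a real renderer does.

```python
import re
import string

# string.punctuation is the same set CommonMark calls "ASCII punctuation".


def apply_backslash_escapes(text):
    """Drop a backslash in front of ASCII punctuation; leave other backslashes alone."""

    def _unescape(m):
        ch = m.group(1)
        return ch if ch in string.punctuation else m.group(0)

    return re.sub(r"\\(.)", _unescape, text)
```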

mpacer, Jan 17 '17

We discussed this at the in-person team meeting last week and charted out the following path for this issue:

  • Begin by documenting and testing the precise details of the Markdown syntax that we support in the classic notebook. This will be done by first creating a sample notebook covering all the markdown we support, and then writing documentation in nbformat about that spec.
  • Treat that markdown syntax as a formal part of the notebook document format.
  • Use that current state as the beginning point for further work.
  • Initially, we will have a single "Jupyter Markdown" syntax for markdown in notebooks and in standalone markdown files, but over time we can split off additional things supported in standalone files or in nbconvert.
  • We also decided that @mpacer will work with Cal Poly intern @ashutoshbondre on this task.
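As a starting point for the sample notebook in the first step, here is a sketch that assembles an nbformat 4.5 document of markdown cells using only the standard library. The entries in `SAMPLES` are placeholders rather than an audited feature list, and `build_sample_notebook` and `dump_notebook` are hypothetical names.

```python
import json
import uuid

# Placeholder samples (hypothetical); the real list would come from auditing
# what the classic notebook's renderer actually supports.
SAMPLES = {
    "Headers": "# Header 1\n## Header 2",
    "Inline math": "Euler: $e^{i\\pi} + 1 = 0$",
    "Display math": "$$\\sum_{n=1}^\\infty \\frac{1}{n^2} = \\frac{\\pi^2}{6}$$",
    "GFM table": "| a | b |\n|---|---|\n| 1 | 2 |",
}


def markdown_cell(source):
    # nbformat 4.5 requires a unique cell "id" (1-64 chars from [a-zA-Z0-9-_]).
    return {"cell_type": "markdown", "id": uuid.uuid4().hex, "metadata": {}, "source": source}


def build_sample_notebook(samples):
    """One heading cell per feature, followed by the cell that exercises it."""
    cells = []
    for title, source in samples.items():
        cells.append(markdown_cell(f"### {title}"))
        cells.append(markdown_cell(source))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


def dump_notebook(nb, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Calling `dump_notebook(build_sample_notebook(SAMPLES), "markdown_samples.ipynb")` would write a file that each frontend can open and compare against the written spec.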

I will open up more specific issues about the individual tasks and also create a GitHub Project for this work to help us track it.

ellisonbg, Jun 08 '17

Hi everyone!

I would like to rekindle this discussion and propose a rather different approach.


I believe one of the big problems with markdown as a markup language is that its dialects aren't clearly defined, and those that are allow only a limited amount of extensibility. As a result, I have by now lost hope that one markdown to rule them all will emerge.

To name an example from a closely related field: a recent book on teaching with Jupyter is written in bookdown, itself an extension of R Markdown that implements various features relevant to book authoring.

Another important markup language in the Jupyterverse is reStructuredText: a lot of the documentation for Jupyter projects is written in it.

Finally, I should also mention AsciiDoc, which is also a pretty decent markup language.

So perhaps instead of deciding upfront on the one Jupyter markdown spec, an alternative approach is to extend the notebook format to support different markup languages and flavors?

Displaying those would then be the responsibility of server, notebook, or JupyterLab extensions.

akhmerov, Jan 25 '19

...and then we'll see a latex notebook using assembly kernel :)

akhmerov, Jan 25 '19

> I believe one of the big problems with markdown as a markup language is that its dialects aren't clearly defined

We could introduce metadata saying what the markdown format is, but we still face this problem - what does a dialect name mean if it isn't precisely defined?

We're also facing a problem in this area right now: marked.js, the renderer we use in the classic notebook and JupyterLab, is slowly evolving to be CommonMark-compliant, which means its behavior changes over time. So regardless of the backwards compatibility we want, each time we upgrade marked (for security fixes, etc.) we break backwards compatibility with older notebooks (hopefully only in edge cases).

All that said, +1 for adding a notebook metadata field, right alongside the kernel metadata, specifying a markup name and version.
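A sketch of how such a field might look and be consumed, assuming a hypothetical `markup` key with `name` and `version` placed next to the real `kernelspec` entry; neither the key nor the `jupyter-markdown` dialect name exists in the current schema. The fallback branch is exactly where the frontend concern raised later in this thread applies: an unknown dialect silently drops back to the default.

```python
# Hypothetical notebook-level metadata: "kernelspec" is real, "markup" is not.
EXAMPLE_METADATA = {
    "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
    "markup": {"name": "jupyter-markdown", "version": "1.0"},
}

DEFAULT_MARKUP = ("jupyter-markdown", "1.0")  # hypothetical default dialect


def select_renderer(metadata, renderers):
    """Pick the renderer registered for (name, version), else fall back to the default.

    `renderers` maps (name, version) tuples to callables; it must contain DEFAULT_MARKUP.
    """
    markup = metadata.get("markup", {})
    key = (markup.get("name", DEFAULT_MARKUP[0]), markup.get("version", DEFAULT_MARKUP[1]))
    if key in renderers:
        return key, renderers[key]
    return DEFAULT_MARKUP, renderers[DEFAULT_MARKUP]
```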

jasongrout, Jan 25 '19

> We could introduce metadata saying what the markdown format is, but we still face this problem - what does a dialect name mean if it isn't precisely defined?

Indeed, I didn't mean that this would remove the need to formalize the markdown format. But it would remove the need to decide on a single one, and it would let us refine and extend the implementation incrementally.

akhmerov, Jan 25 '19

Supplying a markup name and version seems like an intractable problem for frontends. We couldn't reasonably support all of them across all client machines, so we'd often have to use the fallback default format, which is what this issue is about.

rgbkrk, Jan 25 '19

But wasn't this a similar situation with kernels before, and with MIME rendering extensions in JLab now?

akhmerov, Jan 25 '19

I guess it all depends on what "front end" means. The same DOM doesn't generate the same pixels on two versions of the same browser on the same computer, much less across any larger Cartesian product. The same can be said for PDF (fonts, forms, JS, shudder).

But let's say "same DOM minus CSS is good enough"... then the fallback renderer could actually be server-side: say, a versioned, standalone configuration of mistune (or better still, pandoc), passed through a versioned HTML tidier... with a conformance suite. Any last-mile, pixel-pushing front end can request a rendering, returned as a MIME bundle, and not even worry about it, in exchange for a REST call (or a comm message) and a few milliseconds.
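One way to read the server-side fallback, as a sketch: render once, return a MIME bundle shaped like the `data`/`metadata` pair of a `display_data` output, and record exactly which renderer version produced it. `placeholder_render`, `RENDERER_NAME`, and `RENDERER_VERSION` are hypothetical stand-ins for a pinned mistune or pandoc configuration; the transport (REST call or comm message) is left out.

```python
import hashlib
import html

# Hypothetical identifiers for a pinned, versioned renderer configuration.
RENDERER_NAME = "example-renderer"
RENDERER_VERSION = "0.1.0"


def placeholder_render(source):
    # Stand-in for a real markdown engine; it only escapes the text so the
    # sketch runs with the standard library alone.
    return "<pre>" + html.escape(source) + "</pre>"


def render_to_bundle(source, render=placeholder_render):
    """Return a MIME bundle plus metadata identifying exactly what produced it."""
    return {
        "data": {"text/html": render(source), "text/markdown": source},
        "metadata": {
            "renderer": {"name": RENDERER_NAME, "version": RENDERER_VERSION},
            # Lets a frontend cache renders and notice when the source changed.
            "source_sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        },
    }
```

Keeping `text/markdown` in the bundle means a frontend that does implement the spec can still render locally and ignore the HTML.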

Then nbformat could extend markdown_cell to have outputs, so you can optionally store an "archival" version. Ahh, but the CSS! (See the table/equation alignment issue of late.) Going further toward the archival goal, you could also include the CSS and render in an iframe. Or a PDF with embedded fonts. For equations, stop relying on MathJax and just push SVG (Wikipedia does this).
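A sketch of the archival idea, assuming a hypothetical `outputs` list on markdown cells that reuses the shape of a `display_data` output. Today's nbformat v4 schema has no such field on markdown cells, so a validator would reject the result; inlining the CSS in a `<style>` element is one assumed way to carry the styles along for iframe display.

```python
import copy


def with_archival_output(cell, html_text, css=None):
    """Return a copy of a markdown cell carrying a hypothetical archival rendering.

    nbformat v4 markdown cells have no "outputs" field, so the result would
    fail validation against the current schema.
    """
    if cell.get("cell_type") != "markdown":
        raise ValueError("expected a markdown cell")
    if css is not None:
        # Assumption: inline the styles so the HTML can be shown as-is in an iframe.
        html_text = f"<style>{css}</style>{html_text}"
    archived = copy.deepcopy(cell)  # leave the caller's cell untouched
    archived["outputs"] = [
        {"output_type": "display_data", "data": {"text/html": html_text}, "metadata": {}}
    ]
    return archived
```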

OR a front end can boldly attempt to implement the spec, and save the user minutes/megabytes over the course of a session. As Jupyter, we'd want to offer a reference JS implementation, if only to demonstrate how you'd validate against the oracle. Oh, and also offer a WYSIWYG editor (ProseMirror FTW).
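Validating against the oracle could be as simple as replaying (markdown, expected HTML) pairs from the reference renderer through a candidate. In this sketch, `normalize_html` is a deliberately crude stand-in for the normalizer a real suite (such as the CommonMark spec tests) would use; it collapses all whitespace, including whitespace that can matter inside `<pre>`. Both function names are hypothetical.

```python
import re


def normalize_html(fragment):
    # Crude normalization: drop whitespace between tags, then collapse runs
    # of whitespace, so trivial formatting differences don't count as failures.
    fragment = re.sub(r">\s+<", "><", fragment.strip())
    return re.sub(r"\s+", " ", fragment)


def check_conformance(cases, candidate):
    """Run (markdown, expected_html) pairs from the oracle through `candidate`.

    Returns a list of (markdown, expected, actual) for every mismatch.
    """
    failures = []
    for source, expected in cases:
        actual = candidate(source)
        if normalize_html(actual) != normalize_html(expected):
            failures.append((source, expected, actual))
    return failures
```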

Other frontends would be free to use these implementations, roll their own (and validate against the spec), or just vendor in a WASM pandoc... only a little facetiously... Right?

Of course this machinery would lend itself to other typesetting implementations, configuration, etc.

I actually really like the comms idea: it's starting to sound like a Language Server Protocol kind of thing, but we could handle the simple case of markdown reference-style links across different cells... all the way up to multi-document goodies like the rst toctree. Because none of this ever touches a kernel, it's free to be aware of its relative location (on "disk"), and definitely doesn't have to be sequential like code.
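For the simple case of reference-style links across cells, here is a sketch under stated assumptions: definitions sit one per line with no angle-bracket destinations, titles are ignored, code cells are skipped, and the first definition of a label wins. Labels are matched the CommonMark way, case-insensitively after collapsing whitespace; all function names are hypothetical.

```python
import re

# Simplified link reference definition: "[label]: destination" at line start
# (up to 3 spaces of indentation); any title after the destination is ignored.
DEF_RE = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*(\S+)", re.MULTILINE)
# Full "[text][label]" and collapsed "[text][]" references.
REF_RE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def normalize_label(label):
    # CommonMark matches labels case-insensitively after collapsing whitespace.
    return " ".join(label.split()).casefold()


def collect_definitions(cells):
    """Gather definitions from every markdown cell; the first definition of a label wins."""
    defs = {}
    for cell in cells:
        if cell["cell_type"] != "markdown":
            continue
        for label, url in DEF_RE.findall(cell["source"]):
            defs.setdefault(normalize_label(label), url)
    return defs


def resolve_references(source, defs):
    """Rewrite known references into inline links; leave unknown ones untouched."""

    def _sub(m):
        text, label = m.group(1), m.group(2) or m.group(1)
        url = defs.get(normalize_label(label))
        return f"[{text}]({url})" if url else m.group(0)

    return REF_RE.sub(_sub, source)
```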

Or the typesetting "notebook kernel" could actually do things like cell reuse, but it sounds like we've gone full Knuth at that point...

bollwyvl, Jan 26 '19

Continuing my reply to @rgbkrk: this issue proposes to cement a markdown spec in the notebook format, in addition to defining the spec. While defining the spec is indeed unavoidable, my proposal is to keep it separate from the notebook format itself, which is why it is relevant to this issue. I'm also unsure whether "Jupyter markdown" should be any more of a default than the IPython3 kernel is on out-of-the-box installations of Jupyter.


I'd also like to link some related developments that in my opinion indicate that such a generalization of notebook format is reasonable:

  • https://github.com/jupyterlab/jupyterlab/pull/5901 makes the markdown renderer a proper configurable JLab extension.
  • https://github.com/mwouts/jupytext already converts rst from sphinx gallery to a notebook, and has an open issue for orgmode.
  • While not nearly as significant a development, I've personally been using Python-Markdown with extensions to author lecture notes for my course. With a more flexible markdown spec, I could also author and offer these materials as notebooks.

akhmerov, Jan 26 '19

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/inline-variable-insertion-in-markdown/10525/95

meeseeksmachine, Sep 15 '21