Jupyter notebook to emacs' org-mode format

Open mwouts opened this issue 5 years ago • 13 comments

Emacs' org mode format (extension .org) has a well documented syntax for code blocks:

#+NAME: <name>
#+BEGIN_SRC <language> <switches> <header arguments>
  <body>
#+END_SRC

We could implement a notebook to/from org converter.

Further notes:

  • The NAME option seems optional.
  • Examples are available here
  • Org mode has a header with metadata. Can it hold the Jupyter notebook metadata?
  • Also, it's not clear how cell metadata could be represented in org mode
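To make the block syntax above concrete, here is a minimal sketch of how the source blocks could be located in an org document with a regular expression. The function name `find_src_blocks` and the returned dictionary keys are hypothetical, chosen only for illustration; this is not part of Jupytext, and it assumes every block declares a language and that blocks are not nested.

```python
import re

# Hypothetical pattern: an optional "#+NAME:" line, then a "#+BEGIN_SRC" line
# with a language and free-form switches/header arguments, then the body up
# to the closing "#+END_SRC". Org keywords are case-insensitive.
SRC_BLOCK = re.compile(
    r"(?:^[ \t]*#\+NAME:[ \t]*(?P<name>\S+)[ \t]*\n)?"
    r"^[ \t]*#\+BEGIN_SRC[ \t]+(?P<language>\S+)(?P<args>[^\n]*)\n"
    r"(?P<body>.*?)"
    r"^[ \t]*#\+END_SRC[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def find_src_blocks(org_text):
    """Return one dict per source block found in org_text."""
    blocks = []
    for match in SRC_BLOCK.finditer(org_text):
        blocks.append(
            {
                "name": match.group("name"),  # None when there is no #+NAME line
                "language": match.group("language"),
                "args": match.group("args").strip(),  # switches and header arguments, unparsed
                "body": match.group("body").rstrip("\n"),
            }
        )
    return blocks
```

The switches and header arguments are kept as one unparsed string here; whether they could carry cell metadata is exactly the open question in the notes above.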

mwouts avatar Sep 09 '18 07:09 mwouts

The second and third links provided by Doug at #61 provide examples of code blocks with cell metadata. The first and fourth links point to ORG to Notebook converters, written in Emacs Lisp.

Interesting facts:

  • ORG is not markdown! If we want to support styling, round trip conversion is going to be tough!
  • ORG can include results. Currently Jupytext supports no format with outputs. Probably we don't want to implement the matching with Jupyter outputs.
  • More generally, ORG has support for many features that are not in Jupyter notebooks (tables, etc...). Is it acceptable for the users to have them, say, as raw cells in Jupyter?

mwouts avatar Sep 09 '18 20:09 mwouts

Maybe org-mode and jupytext aren't going to be a good match after all. But my friends are still on the quest for combining emacs + notebooks...

dsblank avatar Sep 09 '18 21:09 dsblank

No problem! @dsblank , we'll try to make your friends happy :smile:

Could you ask them to write a sample org file with text, code, a header, and a few org specific sections, and tell us how they imagine the corresponding notebook? Or even, could they contribute a test similar to test_read_simple_julia.py, but for org mode?

mwouts avatar Sep 09 '18 22:09 mwouts

What's the current state of this? Do you already have something to work on? I'd like to help.

srnnkls avatar May 11 '20 14:05 srnnkls

Hello @srnnkls, thanks for reaching out!

Well, I am afraid that we have not made much progress here... As you saw above, one can use ox-ipynb, by @jkitchin, to convert org-mode documents to ipynb, but I am not aware of a tool doing the opposite conversion.

It would help a lot if you could write two functions that convert a notebook object to its text representation, and vice versa. Maybe at first we should target a limited conversion, i.e.

  • keep the content of code and markdown cells verbatim in the org-mode representation,
  • and ignore the cell and notebook metadata.

Ideally that first version should be compatible with (or should it use?) ox-ipynb.

These two functions should be called in jupytext.reads and jupytext.writes if the format name matches the name you choose for this format, e.g. "org". You should also add a description of the new format in formats.py.
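As a rough illustration of the limited conversion described above, the two functions could look like the sketch below. Everything in it is an assumption made for the example: the names `notebook_to_org` and `org_to_notebook`, the representation of a notebook as a list of `(cell_type, source)` pairs, and the choice to store markdown cells verbatim in `#+BEGIN_SRC markdown` blocks. It is not Jupytext's API, it ignores all metadata, and it has not been checked for compatibility with ox-ipynb.

```python
import re

# Hypothetical pattern for the blocks written by notebook_to_org below.
_BLOCK = re.compile(
    r"^#\+BEGIN_SRC[ \t]+(\S+)[^\n]*\n(.*?)\n^#\+END_SRC[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def notebook_to_org(cells, language="python"):
    """Write (cell_type, source) pairs as org source blocks.

    Code cells use the given language; markdown cells are stored verbatim
    in 'markdown' source blocks (an assumption of this sketch).
    Limitation: a source line that itself reads '#+END_SRC' is not escaped.
    """
    chunks = []
    for cell_type, source in cells:
        block_language = language if cell_type == "code" else "markdown"
        chunks.append(f"#+BEGIN_SRC {block_language}\n{source}\n#+END_SRC")
    return "\n\n".join(chunks) + "\n"


def org_to_notebook(org_text):
    """Read back the (cell_type, source) pairs written by notebook_to_org.

    Any org content outside the source blocks is ignored.
    """
    cells = []
    for match in _BLOCK.finditer(org_text):
        block_language, source = match.group(1), match.group(2)
        cell_type = "markdown" if block_language.lower() == "markdown" else "code"
        cells.append((cell_type, source))
    return cells
```

With this representation the round trip is exact for the cell sources, which is the property the round trip tests would check; plain org text between blocks is dropped on reading, so a real implementation would need a rule for it.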

NB: If you like, you can also provide these two functions in a separate Python package, and add that package as an optional dependency in Jupytext - like we did for the md:pandoc or the md:myst formats.

mwouts avatar May 11 '20 21:05 mwouts

You can use pandoc to convert ipynb to org right now. It is technically possible to do a round trip conversion, but you will lose things like cell metadata, and probably some formatting.
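For reference, a minimal sketch of driving that pandoc conversion from Python could look as follows. The helper names are hypothetical, and the sketch assumes a `pandoc` executable recent enough to know the `ipynb` format is available on the PATH; only the standard `--from`, `--to` and `--output` options are used.

```python
import subprocess


def pandoc_command(source, target, from_format, to_format):
    """Build the pandoc command line for one conversion (hypothetical helper)."""
    return [
        "pandoc",
        "--from", from_format,
        "--to", to_format,
        "--output", target,
        source,
    ]


def ipynb_to_org(ipynb_path, org_path):
    """Convert a notebook to org; raises CalledProcessError if pandoc fails."""
    subprocess.run(pandoc_command(ipynb_path, org_path, "ipynb", "org"), check=True)


def org_to_ipynb(org_path, ipynb_path):
    """Convert an org file back to a notebook, the second leg of a round trip."""
    subprocess.run(pandoc_command(org_path, ipynb_path, "org", "ipynb"), check=True)
```

Calling `ipynb_to_org("notebook.ipynb", "notebook.org")` followed by `org_to_ipynb("notebook.org", "roundtrip.ipynb")` and comparing the two notebooks is one way to see what the round trip loses.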

It would not be hard to use elisp to convert an ipynb to an org file containing markdown blocks and code blocks. Also not hard to do that in Python. It might be tricky either way to deal with the results.

jkitchin avatar May 13 '20 14:05 jkitchin

Oh, that's interesting! What we could do, then, is to use ox-ipynb on one side, and pandoc on the other side, plug this into Jupytext's collection of test notebooks, and see how well this works :smiley: (or not). If time permits I will give it a try!

mwouts avatar May 13 '20 16:05 mwouts

Hey, thanks for taking the time to discuss options. Another option for ipynb to org conversion is nbcorg. I'm using nbcorg and ox-ipynb at the moment. That works well, but I'd like to have an Emacs-independent solution that works from Jupyter and the command line. That is what brings me here. As an nbconvert plugin, nbcorg uses jinja templates.

I will have a look at jupytext.reads and jupytext.writes. @jkitchin, you are right. I'm not sure how to deal with results either. nbcorg includes results as EXAMPLE blocks, which I'm not a fan of.

srnnkls avatar May 13 '20 19:05 srnnkls

Okay, I just realised that for markdown you use pandoc under the hood as well. I will write a round trip conversion test first, then. I don't know when I will find the time to work on this, but probably sometime within the next week.

srnnkls avatar May 13 '20 20:05 srnnkls

@srnnkls , if that can help, I have prepared a branch with a tentative implementation of the org format based on pandoc (back and forth), see the last three commits here: https://github.com/mwouts/jupytext/commits/org_pandoc:

  • The first commit https://github.com/mwouts/jupytext/commit/086178b8390d52e684c9e678e16a5f3cff2ea4d4 adds the org:pandoc format to Jupytext (and simply calls pandoc)
  • The second commit https://github.com/mwouts/jupytext/commit/93f6478ebd7b979b178e2772a3338db0e5f4f119 activates the round trip tests on that format
  • And the third commit https://github.com/mwouts/jupytext/commit/b5432c7aea2e535cb5f83efc50a71d2d525f4d13 adds the files generated by the round trip tests

Note that the round trip test does not work. Is it correct that pandoc's conversion only works in one direction (ipynb to org)? Anyway, feel free to experiment with this, and replace either converter with your favorite one.

Two additional comments:

  • Jupytext removes the outputs before calling pandoc (because they are preserved in the .ipynb file), so no headache with outputs...
  • pandoc is used for the md:pandoc format, but that is not Jupytext's default markdown format (see the example files at https://github.com/mwouts/jupytext/tree/master/demo )
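The idea in the first comment (drop the outputs before the text conversion, and take them back from the .ipynb file afterwards) can be sketched as below. This is only an illustration and not Jupytext's actual implementation: the function names are hypothetical, notebooks are plain dictionaries shaped like the nbformat v4 JSON, and matching cells by identical source is an assumption of this sketch.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of the notebook whose code cells have no outputs."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def restore_outputs(text_notebook, ipynb_notebook):
    """Copy outputs from ipynb_notebook into a copy of text_notebook.

    A code cell gets its outputs back only if a not-yet-used code cell with
    exactly the same source exists in ipynb_notebook; edited cells stay empty.
    """
    remaining = [c for c in ipynb_notebook["cells"] if c["cell_type"] == "code"]
    restored = copy.deepcopy(text_notebook)
    for cell in restored["cells"]:
        if cell["cell_type"] != "code":
            continue
        for index, candidate in enumerate(remaining):
            if candidate["source"] == cell["source"]:
                cell["outputs"] = copy.deepcopy(candidate.get("outputs", []))
                cell["execution_count"] = candidate.get("execution_count")
                del remaining[index]  # each saved cell is used at most once
                break
    return restored
```

With this split, the text converter never sees an output, so the org representation only has to carry the cell inputs.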

mwouts avatar May 13 '20 22:05 mwouts

Wow, thank you! I'll have a look. Removing results before converting to text and just adding them back to the notebook version is a nice approach; good to know about it.

srnnkls avatar May 14 '20 06:05 srnnkls

Just a note for the people who subscribed to this thread... I have opened an issue at pandoc regarding the round trip ipynb-org-ipynb: https://github.com/jgm/pandoc/issues/6367.

The issue also raises the question of how the notebook cells should be represented in org mode. Personally I think that the representation should remain as simple as possible, because the users are going to type it :smiley: But obviously, it is simpler for the programmers (and maybe also safer in the long run?) to use explicit cell markers. Anyway... if you have an opinion about this, please follow the pandoc thread as well!

mwouts avatar May 16 '20 21:05 mwouts

Recent developments on the Pandoc side of things: https://github.com/jgm/pandoc/issues/6367#issuecomment-1222188309

dlukes avatar Sep 07 '22 09:09 dlukes