Suggestion: Separate file for notebook executed cell outputs.

Open jbursey opened this issue 5 years ago • 11 comments
Unless this is already a feature, I think it would be nice to have a separate file (something like .ipynb.output) that links outputs to their cells in the .ipynb JSON file. This would make it significantly easier to exclude notebook outputs from source control systems like git.
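
To make the idea concrete, here is a rough sketch of what such a split could look like. It is not an existing Jupyter feature: the sidecar layout (a mapping from cell id to that cell's `outputs` and `execution_count`) and the function name `split_outputs` are assumptions made up for illustration. It only relies on the notebook being plain JSON with a top-level `cells` list.

```python
import copy


def split_outputs(notebook):
    """Return (stripped_notebook, outputs_by_cell) without mutating the input.

    Hypothetical sidecar layout: {cell_key: {"outputs": [...], "execution_count": n}}.
    """
    stripped = copy.deepcopy(notebook)
    outputs_by_cell = {}
    for index, cell in enumerate(stripped.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no outputs
        # Newer notebook files give each cell an "id"; fall back to the
        # position in the notebook when it is missing.
        key = cell.get("id", f"index-{index}")
        outputs_by_cell[key] = {
            "outputs": cell.get("outputs", []),
            "execution_count": cell.get("execution_count"),
        }
        cell["outputs"] = []
        cell["execution_count"] = None
    return stripped, outputs_by_cell
```

The stripped dict could be written back to the .ipynb with `json.dump`, and the second dict to the proposed .ipynb.output file, which could then be listed in .gitignore.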

If this is already possible somehow I would be interested to know.

jbursey avatar Aug 12 '20 20:08 jbursey

It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:

  1. Use a commit hook as outlined in Jupyter docs.

  2. Use Jupyter's shortcut to "clear all cell output"

  3. Use nbconvert to clear the notebook outputs before committing.

  4. You could also just write your own shell script to clear outputs. I wrote one using jq to do that and it is fairly easy.
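
For option 4, a script along these lines would do the job. This is a minimal sketch in Python using only the standard library, not the jq script mentioned above; the function names are made up for illustration. It assumes the notebook is the usual JSON document with a top-level `cells` list, and it also resets `execution_count` so that re-running a cell does not show up as a diff.

```python
import json


def clear_outputs(notebook):
    """Empty the outputs and execution counts of all code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_files(paths):
    """Rewrite each notebook file with its outputs cleared."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            notebook = json.load(f)
        clear_outputs(notebook)
        with open(path, "w", encoding="utf-8") as f:
            # indent=1 is an assumption chosen to keep diffs small;
            # adjust it to match how your notebooks are saved.
            json.dump(notebook, f, indent=1, ensure_ascii=False)
            f.write("\n")


# Example (hypothetical file name):
# clear_outputs_in_files(["analysis.ipynb"])
```

Called from a pre-commit hook, this overwrites the files it is given, so it is worth trying on a copy first.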

Some folks also choose to just convert the notebook to python using nbconvert and then just commit that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.

gitjeff05 avatar Aug 17 '20 01:08 gitjeff05

I think that jupyterlab already has the capability of displaying the output in a different view from the notebook.

cipri-tom avatar Oct 16 '20 16:10 cipri-tom

Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.

Their paired notebooks avoid the need for automatically saving and converting the notebooks.

IvoMerchiers avatar Mar 19 '21 14:03 IvoMerchiers

Related: jupyterlab/jupyterlab#9444 and jupyterlab/jupyterlab-git#392

Related question on Stack Overflow: How can I configure my tools to ignore or prevent updates to the execution_count field in a Jupyter Notebook from being tracked in git?

starball5 avatar Feb 27 '23 00:02 starball5

Good idea. The alternatives discussed above are about excluding cell outputs from source control.

But sometimes we need to include the executed cells in source control. (My current case is with Quarto.) Including the cell output in the .ipynb file makes it extremely difficult to review or diff the file as plain text. This experience would be improved a lot if the input and output could be separated. A reviewer would then be able to decide whether the changes were caused by a code change, or purely by external changes and re-execution of the notebook.

th0ger avatar Aug 26 '23 09:08 th0ger

This feature would be very helpful for cases where execution is time-consuming, or relies on the availability of input data or tricky code dependencies. With separate output, the .ipynb.output file could be managed with (e.g.) Git LFS, making the .ipynb diffs easy to review while still allowing retention and versioning of the output.

alexbjorling avatar Nov 01 '23 13:11 alexbjorling

@alexbjorling LFS is a good point. Notebook output is very suitable for LFS, but input cells are not.

th0ger avatar Nov 07 '23 07:11 th0ger

I think cleaning the notebook can only be seen as a workaround.

Tyrrx avatar Nov 21 '23 10:11 Tyrrx

Yes, this would be a huge improvement. I believe this is why Quarto embeds Python in Markdown as a "plain text representation of notebooks."

If the .ipynb itself could be in a readable plain-text format, and the outputs stored in a separate file, that would:

  • Make diffing of notebook code trivial.
  • Make editing (the code of) a notebook easy using any text editor.
  • Allow the output to be versioned using a non-plaintext scheme, e.g. Git LFS as mentioned above, or being snapshotted only periodically as opposed to on every commit.
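
For the separate outputs file to be useful, a tool would also need to put the outputs back when the notebook is opened. A rough sketch of that step, assuming a hypothetical sidecar that maps each cell id to `{"outputs": [...], "execution_count": n}` (this layout and the name `merge_outputs` are assumptions for illustration, not an existing format or API):

```python
import copy


def merge_outputs(notebook, outputs_by_cell):
    """Return a copy of the notebook with saved outputs re-attached to code cells."""
    merged = copy.deepcopy(notebook)
    for index, cell in enumerate(merged.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # Match on the cell "id" when present, otherwise on position.
        key = cell.get("id", f"index-{index}")
        saved = outputs_by_cell.get(key)
        if saved is None:
            continue  # no saved output for this cell: leave it as it is
        cell["outputs"] = saved.get("outputs", [])
        cell["execution_count"] = saved.get("execution_count")
    return merged
```

Matching on cell id rather than position means that reordering cells in a text editor would not attach outputs to the wrong code, although a stale output could still be shown for a cell whose source has changed.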

zmbc avatar Feb 21 '24 23:02 zmbc

Hugely in support of this! Even if it isn't a default behavior, it would be amazing to have the option.

carschandler avatar Apr 04 '24 14:04 carschandler

Surprised not to see anyone mention this yet: this Jupyter extension does almost exactly what this thread describes: https://jupytext.readthedocs.io/en/latest/paired-notebooks.html

zmbc avatar Apr 24 '24 17:04 zmbc