JEP: Add a dirty state to code cells in notebook format
Originally suggested by @davidbrochart in https://github.com/jupyter/nbformat/issues/222 and related to the work done in JupyterLab here: https://github.com/jupyterlab/jupyterlab/pull/10296.
Reproducibility is at the heart of Jupyter, but some notebook workflows can harm it. For example, a user can create a code cell, execute it to generate some output, re-edit the cell without re-running it, and then save. The saved notebook is then in a dirty state, where the cell input no longer matches its output.
I suggested a UI change in JupyterLab (https://github.com/jupyterlab/jupyterlab/pull/10296) that shows a visual indication that a cell has been re-edited since its last run, marking the cell as "dirty".

Discussing with @davidbrochart, we wondered whether this should be part of the notebook format, in the form of a new entry in the cell format. This would give a clue that the output may not reflect the result of executing the current code input.
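To make the idea concrete, here is a minimal sketch of what such an entry could look like and how a tool might read it. The key name `dirty` and its placement at the top level of the cell are assumptions made only for illustration; nothing in this thread settles where the entry would live or what it would be called. The helper `dirty_cell_ids` is likewise hypothetical.

```python
from typing import Any


def dirty_cell_ids(notebook: dict[str, Any]) -> list[str]:
    """Return the ids of code cells flagged as dirty.

    Assumes a hypothetical boolean "dirty" key on each code cell;
    cells without the key are treated as clean.
    """
    return [
        cell.get("id", "")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and cell.get("dirty", False)
    ]


# A tiny notebook as a plain dict. Cell "b2" was run as `print(x)`,
# then edited to `print(x + 1)` and saved without re-running, so its
# stored output no longer matches its source.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "id": "a1",
            "cell_type": "code",
            "source": "x = 1",
            "execution_count": 1,
            "outputs": [],
            "metadata": {},
        },
        {
            "id": "b2",
            "cell_type": "code",
            "source": "print(x + 1)",
            "execution_count": 2,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "1\n"}
            ],
            "metadata": {},
            "dirty": True,  # hypothetical new entry
        },
        {"id": "c3", "cell_type": "markdown", "source": "Notes", "metadata": {}},
    ],
}

print(dirty_cell_ids(notebook))  # ['b2']
```

Treating a missing key as "clean" would keep existing notebooks valid without any migration, though that default is itself a design choice the proposal would need to make.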
I think this is a great idea from a user-facing perspective.
Also, I hope it's OK: I updated the title to make it a bit clearer that we are talking about the notebook format in general, not nbformat the Python package.
Isn't nbformat the place where the specification is implemented? It is unclear to me where this change should happen. I thought nbformat was the first place to look, and Jupyter front-ends could then implement it later.
I think you're right, but I'm not positive either. I just felt that "notebook format" helped disambiguate a bit :-) If you want to change it back, that's fine too.
An alternative approach to consider is to build on https://github.com/jupyter/enhancement-proposals/issues/68 and persist the cell ID in the IPython history; then, on connection to the kernel, fetch the execution history and check whether the executed code differs from the cell source.
Colab has been sending cell IDs for a while and subclasses HistoryManager to persist the cell ID: https://github.com/googlecolab/colabtools/blob/main/google/colab/_history.py. We then use this to populate the cell's execution_count in the UI when reconnecting to a kernel.
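As a rough sketch of the comparison step, assuming the frontend has already fetched history entries that carry a cell ID: the `HistoryEntry` record and the `find_dirty_cells` helper below are hypothetical names, not part of IPython, the Jupyter messaging protocol, or Colab. For simplicity the sketch also assumes each cell's `source` is a single string.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class HistoryEntry:
    """Hypothetical record: one executed input, tagged with its cell ID."""

    cell_id: str
    execution_count: int
    source: str


def find_dirty_cells(
    cells: list[dict[str, Any]], history: list[HistoryEntry]
) -> list[str]:
    """Return ids of code cells whose source differs from their last executed code."""
    # Keep only the most recent execution for each cell ID.
    last_run: dict[str, HistoryEntry] = {}
    for entry in sorted(history, key=lambda e: e.execution_count):
        last_run[entry.cell_id] = entry

    dirty = []
    for cell in cells:
        if cell.get("cell_type") != "code":
            continue
        entry = last_run.get(cell["id"])
        # A cell with no history entry was never run in this kernel;
        # this sketch treats it as unexecuted rather than dirty.
        if entry is not None and entry.source != cell["source"]:
            dirty.append(cell["id"])
    return dirty


cells = [
    {"id": "a1", "cell_type": "code", "source": "x = 1"},
    {"id": "b2", "cell_type": "code", "source": "print(x + 1)"},
    {"id": "d4", "cell_type": "code", "source": "y = 2"},
]
history = [
    HistoryEntry("a1", 1, "x = 1"),
    HistoryEntry("b2", 2, "print(x)"),  # edited after this run
    HistoryEntry("a1", 3, "x = 1"),  # re-run, unchanged
]

print(find_dirty_cells(cells, history))  # ['b2']
```

One consequence of this approach is that the dirty state is derived from the kernel rather than stored in the file, so it is only available while the kernel that ran the cells is still alive.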
I think it'd also be useful to populate execution timing info from the kernel when connecting to a runtime. There's an awkward discontinuity when connecting to a kernel: the execution state of a cell in the notebook may differ from the state in the kernel.
Interesting thought on using the cell IDs, @blois. Could we take it a step further and extend the kernel execution response (execute_result) to include an optional "depends_on": list[cell id], which frontends could then use to implement something like https://github.com/nbsafety-project/nbsafety? This would likely be another JEP (independent of the dirty state, since it stands as a feature of its own). Does this idea make sense?
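For illustration, here is a minimal sketch of how a frontend might use such a field, assuming it has collected a `depends_on` list for each executed cell. The `stale_cells` helper and the shape of the mapping are hypothetical, and this is not how nbsafety itself works; it only shows the transitive propagation that a dependency list would enable.

```python
from collections import deque


def stale_cells(depends_on: dict[str, list[str]], dirty: set[str]) -> set[str]:
    """Return cells whose outputs may be stale because they depend,
    directly or transitively, on a dirty cell.

    `depends_on` maps a cell id to the cell ids it depends on
    (the hypothetical field proposed above).
    """
    # Invert the mapping: for each cell, which cells depend on it?
    dependents: dict[str, set[str]] = {}
    for cell_id, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(cell_id)

    # Breadth-first walk from the dirty cells through their dependents.
    stale: set[str] = set()
    queue = deque(dirty)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in stale and child not in dirty:
                stale.add(child)
                queue.append(child)
    return stale


depends_on = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],
    "d4": [],
}

print(sorted(stale_cells(depends_on, {"a1"})))  # ['b2', 'c3']
```

Here editing `a1` without re-running it would let the frontend mark `b2` and `c3` as possibly out of date, while `d4` is unaffected.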