Filter cleans python version metadata

Open Nicolae93 opened this issue 2 years ago • 3 comments

The filter installed with nb-clean add-filter --preserve-cell-metadata strips the Python version from the metadata at the end of the notebook. This causes a metadata mismatch between my local Git notebooks and the ones on GitHub.

- "pygments_lexer": "ipython3",
- "version": "3.8.8"

+ "pygments_lexer": "ipython3"

Every time that I open a notebook after pushing with the filter, I get my notebook modified. Is it possible to fix that? Thanks in advance!

Nicolae93 avatar Aug 24 '21 10:08 Nicolae93

nb-clean removes the Python version from the global metadata by design. Do you have a specific use case where maintaining this metadata is required?
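For readers following along, the effect described above can be illustrated with a minimal sketch. This is not nb-clean's actual implementation: the function name `strip_python_version` is hypothetical, and the sketch assumes the version lives at `metadata.language_info.version`, which is where the diff in the original report shows it.

```python
import copy
import json


def strip_python_version(notebook: dict) -> dict:
    """Return a copy of the notebook without metadata.language_info.version.

    Hypothetical helper for illustration only; not part of nb-clean's API.
    """
    cleaned = copy.deepcopy(notebook)
    language_info = cleaned.get("metadata", {}).get("language_info", {})
    # pop() with a default is a no-op when the key is already absent.
    language_info.pop("version", None)
    return cleaned


# A stripped-down notebook carrying the fields from the reported diff.
notebook = {
    "cells": [],
    "metadata": {
        "language_info": {
            "name": "python",
            "pygments_lexer": "ipython3",
            "version": "3.8.8",
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_python_version(notebook)
print(json.dumps(cleaned["metadata"]["language_info"], indent=1))
```

The input notebook is left unchanged and only the `version` key is dropped from the copy, which matches the one-line diff shown in the report.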

Every time that I open a notebook after pushing with the filter, I get my notebook modified

Can you share what commands you're running, and how you installed and configured nb-clean? When using the Git filter integration, notebooks are cleaned as they're added to the index prior to recording a commit, but the working copy shouldn't be modified (unless you've also run nb-clean clean yourself, outside the Git filter).

srstevenson avatar Aug 30 '21 14:08 srstevenson

Hi! I activated the filter on a specific Git repo with the following command: nb-clean add-filter --preserve-cell-metadata --remove-empty-cells. I would prefer that --preserve-cell-metadata not touch the notebook's metadata at all, including the Python version.

I introduced this library to my team. It works fine individually, but I've received some complaints about forced commits of untouched notebooks.

Maybe the issue arises because some of my teammates do not use the library: they push notebooks with uncleaned metadata to GitHub, and the rest of us end up cleaning up after them every time.

Nicolae93 avatar Sep 07 '21 11:09 Nicolae93

So is a --preserve-notebook-metadata option needed? @Nicolae93

yasirroni avatar Aug 29 '22 04:08 yasirroni

nb-clean removes the Python version from the global metadata by design. Do you have a specific use case where maintaining this metadata is required?

I think this behavior is rather strange. nb-clean should either always clean all notebook metadata or not touch it at all. Why just cleaning the Python version and not the other?

In my case, VSCode adds its own metadata, and the kernelspec is also modified. This causes spurious Git diffs in an environment with many developers.

yasirroni avatar Oct 17 '22 14:10 yasirroni

Why just cleaning the Python version and not the other?

This was originally implemented because it's common that different contributors to a project will be using different versions of Python (perhaps just differing by patch version), which leads to spurious diffs when different contributors alternatingly commit a notebook. The other metadata fields are (or at least were at the time of nb-clean's inception) less likely to change with each contributor.

srstevenson avatar Oct 27 '22 20:10 srstevenson

Why just cleaning the Python version and not the other?

This was originally implemented because it's common that different contributors to a project will be using different versions of Python (perhaps just differing by patch version), which leads to spurious diffs when different contributors alternatingly commit a notebook. The other metadata fields are (or at least were at the time of nb-clean's inception) less likely to change with each contributor.

Can we change the default behaviour? It is also common for different developers to use different notebook front ends.

At a minimum, I propose supporting the following:

Colab, VSCode, Datalore, Jupyter Notebook, Jupyter Lab

Opening, running, and saving a notebook (without editing it) in any of these five should produce the same metadata.
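One way to picture this proposal is an allowlist-based normalization, sketched below. Everything here is an assumption for illustration: `ALLOWED`, `normalize_metadata`, and the sample metadata values are made up and do not describe any existing nb-clean option or the exact metadata each editor writes.

```python
# Hypothetical allowlist: metadata sections and the fields kept in each.
ALLOWED = {
    "kernelspec": {"name"},
    "language_info": {"name"},
}


def normalize_metadata(metadata: dict) -> dict:
    """Keep only allowlisted fields so editor-specific entries are dropped."""
    normalized = {}
    for section, fields in ALLOWED.items():
        kept = {
            key: value
            for key, value in metadata.get(section, {}).items()
            if key in fields
        }
        if kept:  # skip sections that end up empty
            normalized[section] = kept
    return normalized


# Made-up examples of metadata as two different editors might save it.
vscode_saved = {
    "kernelspec": {"display_name": "Python 3.9.7 ('venv')", "name": "python3"},
    "language_info": {"name": "python", "version": "3.9.7"},
    "vscode": {"interpreter": {"hash": "abc123"}},
}
colab_saved = {
    "colab": {"provenance": []},
    "kernelspec": {"display_name": "Python 3", "name": "python3"},
    "language_info": {"name": "python"},
}

same = normalize_metadata(vscode_saved) == normalize_metadata(colab_saved)
print(same)
```

With this allowlist, both samples reduce to the same dictionary, so switching editors would no longer produce a metadata diff. Which fields belong on the allowlist is a design decision the sketch does not settle.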

yasirroni avatar Oct 27 '22 22:10 yasirroni

This issue was closed due to inactivity. Please reopen if still relevant.

github-actions[bot] avatar Aug 29 '23 03:08 github-actions[bot]