seaborn
Add pre-commit rule for clearing notebook output
This would help prevent the notebooks that make up the source for most of the docs from being committed with big plots in them. It would also be helpful to clear out some of the volatile metadata that isn't important (which Python version most recently executed the notebook, some hash whose purpose I don't understand, etc.)
@stefmolin do you know if there's an existing hook for this?
FYI, I've used nbstripout in the past and it sounds like what you're going for.
heard of it ;) https://github.com/mwaskom/seaborn/blob/02df7590c70ca8f68282ed941b98410580c55aa9/doc/tools/nb_to_doc.py#L4
I would prefer that the notebook cleaning be more aggressive than what currently happens, although I've not checked recently as to whether the nbstripout tool has progressed since I borrowed its code. Switching to using it as a pre-commit check makes sense.
Another option would be to use something like jupytext and only keep them in markdown (pandoc, MyST, quarto, etc.) format. I've had good luck with using it, personally.
Had that thought too ;)
https://github.com/mwaskom/seaborn/issues/2635
I wasn't aware of any until now. I have nbstripout working as a pre-commit hook to clear the output and volatile kernel and Python version information. Assuming I'm understanding which hashes you are referring to, those are cell identifiers. They don't change. I have some processes where I combine multiple notebooks together and then the hashes come into play (collisions cause issues).
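To make the kind of cleaning discussed here concrete, below is a minimal sketch using only the standard library. It is not nbstripout's code and does not claim to match its defaults; the function name `strip_notebook` and the choice of `kernelspec` and `language_info` as the "volatile" keys are assumptions for illustration. It clears outputs and execution counts, drops the kernel and Python version metadata, and leaves cell ids untouched.

```python
import copy
import json

# Assumption: these top-level metadata keys are the "volatile" ones worth dropping.
VOLATILE_METADATA_KEYS = ("kernelspec", "language_info")


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and volatile metadata removed."""
    cleaned = copy.deepcopy(nb)  # leave the caller's dict unchanged
    metadata = cleaned.get("metadata", {})
    for key in VOLATILE_METADATA_KEYS:
        metadata.pop(key, None)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        # Cell ids are deliberately kept: they identify cells and do not change.
    return cleaned


# Hypothetical notebook content, trimmed to the fields that matter here.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python", "version": "3.10.4"},
    },
    "cells": [
        {
            "cell_type": "code",
            "id": "a1b2c3d4",
            "metadata": {},
            "execution_count": 7,
            "source": ["sns.relplot(data=tips, x='total_bill', y='tip')"],
            "outputs": [{"output_type": "display_data", "data": {}, "metadata": {}}],
        }
    ],
}

cleaned = strip_notebook(example)
print(json.dumps(cleaned, indent=1))
```

Working on a copy keeps the sketch easy to test; a real hook would read and rewrite the `.ipynb` files in place.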
One annoyance I foresee is that people are unlikely to manually remove the metadata themselves before committing, so even if they remember to clear the cells, they will have to git commit twice: once to have the file "fixed" and the second to actually commit the changes. A way around this could be to add a "quiet" option to nbstripout if they are open to it, or we could build a thin wrapper around it that would allow us to suppress the output so it doesn't "fail" during the commit (i.e., it just updates it for you and the commit succeeds as long as nothing else failed).
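A rough sketch of the thin-wrapper idea, under stated assumptions: `clear_outputs` and `main` are hypothetical names, the cleaning is a stdlib stand-in rather than a call into nbstripout, and only outputs and execution counts are handled. The point is just the control flow: rewrite the file if needed, print nothing, and return 0 either way.

```python
import json
from pathlib import Path


def clear_outputs(nb: dict) -> bool:
    """Clear code-cell outputs in place; return True if anything changed."""
    changed = False
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            cell["outputs"] = []
            cell["execution_count"] = None
            changed = True
    return changed


def main(argv: list[str]) -> int:
    """Rewrite each notebook path in argv if needed and always report success."""
    for name in argv:
        path = Path(name)
        nb = json.loads(path.read_text(encoding="utf-8"))
        if clear_outputs(nb):
            # Only touch files that actually needed cleaning.
            path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
    return 0  # never "fail" just because a file was cleaned


# As a script entry point this would be: sys.exit(main(sys.argv[1:]))
```

One caveat: as far as I know, pre-commit itself marks a hook as failed whenever the hook modifies files, regardless of the exit status, so a wrapper like this may only help when run as a plain git hook that also re-stages the cleaned files.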