jupytext
jupytext copied to clipboard
Pre commit hook workflow
I'm trying to set up a pre-commit hook workflow following the docs but having some issues with what I have in mind. The workflow I'm immagining right now is as follows:
- Make changes to notebook
- add notebook to staging area
- commit => before committing the pre-commit hook runs converts the notebook to .py adds it to the commit and then proceeds to committing
From reading the issue where support for the pre-commit framework was added, it looks like that the problem is in the last point: the pre-commit hook is supposed to generate the .py file but not add it to the staging area.
My issue with this behaviour is that this way I always have to make tha same commit twice, once for the notebook, once for the .py. Indeed, if I'm working on the notebook file and not touching the .py, from one commit to the next the .py is going to be the same so git won't event let me add it to the staging area. Then, when committing the first time around, the jupytext pre-commit hook fails telling me that the git index is outdated
and asking me to add the .py to the staging area; then running the commit a second time adding the generated .py everything works fine.
My question is, is this the intended behaviour or am I missing something?
Hi @MicheleQuasar , well that is a good question.
I think this is the intended behaviour - at least this also appears in our tests.
However I agree with you that having to add the paired file to the index makes the pre-commit workflow a bit more complex to use than standard hooks. As you state, trying to commit twice is not enough, unlike for most hooks.
I just had a look at the history of our work on the workflow, and in that comment we concluded that adding the paired file to the index was not the right approach. I think the reason is that if we add the file in the hook itself, then we don't have a guarantee that it passes the other pre-commit hooks.
Maybe I'll double check with @JohnPaton and @Skylion007 if my understanding is correct? Also, is anyone aware of other pre-commit hooks that like Jupytext might modify files outside of the current git index (and if so, how did they address this question?)
This is correct and the intended behaviour. pre-commit
themselves promise to never modify the staging area - the user needs to take responsibility for any code changes that are being committed, essentially. See for example this issue
Thanks for the clarification @mwouts and @JohnPaton. The choice of not committing anything that hasn't been checked by a human makes sense in the general case. It can make the workflow a bit more cumbersome, like in this instance, but is for sure the safest!
@mwouts would it make sense to add something about this in the docs?
Agreed, I admit once I trust a hook I commit its output pretty uncritically - but at the end of the day it's still my name on the commit.
I'm sorry for coming back on an old thread, but this has left me confused: what is the purpose of the jupytext
pre-commit
hook then?
I mean, I think I'm getting something wrong here, but in my mind one of the main use cases (and the one I'm interested in at the moment) of jupytext
is versioning exclusively .py
files and no .ipynb
ones. However, the pre-commit
hook forces its user to stage the .ipynb
files and does not remove them from the index afterwards (for the reasons you mentioned). The conclusion I've arrived at is if I wanted to only version .py
files generated from jupytext
, I'd be better off running jupytext
manually before each commit instead of using the hook, which does not make sense to me.
The workflow I'm imagining is something like
git add *.ipynb
git commit ... # Runs hook and e.g. updates .py files using `jupytext --sync`
git rm --cached *.ipynb # Have to manually remove files before commiting again
git commit ...
whereas running manually you'd only have to do
jupytext --sync *.ipynb
git add *.py
git commit ...
Is there another use case for this hook in its current form that I'm missing?
Hi @tomaz-suller , thank you for your input.
The use case for which the jupytext
pre-commit
hook was designed was to make sure both text and ipynb notebooks are in sync when committed.
Our pre-commit
hook does not cover at all the case when you want to update local files (here, .ipynb
files) that are not committed.
An alternative is to use a custom pre-commit script with a git reset HEAD **/*.ipynb
instruction. While this is not documented any more in the current version, it should still work. To use it you can refer to this previous version of the docs.
Out of curiosity, may I ask why you are willing to use a pre-commit (or a post checkout?) hook to update the .ipynb
files? Are you using another editor than Jupyter (which will take the source from the .py
file when Jupytext is installed?)
Hi @mwouts. I made this comment thinking about two possibilities for using jupytext
, one I found in a personal project and another at the company I intern at:
-
You want to version notebooks based on
.py
files. Quickly after I made the comment I realised I could just as well use thepy:percent
format for everything if I was okay with sacrificing the ability to save my output (which I was), so I only usedjupytext
once to convert all.ipynb
files to.py
files, no need for the hook here. -
I'm working for a company in which there is a need to version notebooks, so naturally
.ipynb
is less than ideal. The example I gave, using the--sync
flag, was meant to give the developer flexibility on whether to modify the.ipynb
or the.py
according to their preference. Still, even if you use--from ipynb --to py:percent
in the hook, you have to stage and then unstage the changes to the.ipynb
notebooks, which comes back to how that can be cumbersome. I want to be able to edit exclusively the.ipynb
files because that's how people at my organisation are used to doing things, and I'm just an intern who was asked to do this thing haha. Of course I could just talk to them to go to something like case 1. I mentioned above.
And on your question, my idea is to use VS Code, though it supports py:percent
just as well as Jupyter.
For now I'll use another solution I found on StackOverflow from the (allegedly, haven't verified myself) creator of pre-commit
. It's downside is evidently to running on every notebook every time, but if that causes problems I'll just create a custom hook like you suggested.
there is a need to version notebooks
I want to be able to edit exclusively the .ipynb files
This is a good use case for the existing hook imo. Workflow:
- Both the
.ipynb
s and the synced.py
s (in whichever format) are tracked in git - Edit your notebooks in whichever editor you want
- Stage them in git
- Try to commit or run the hook manually
- If the output produced doesn't match the
.py
files (because you have changed the code), those.py
s will be updated - Add the updated files to git and try again
- Commit succeeds
- Only look at the diffs of the
.py
s when examining changes
It seems like a lot but in practice it's just
- Do your work as usual
-
git add -u
-
git commit -m "Changes to notebooks"
(fails,.py
s updated) -
git add -u
-
git commit -m "Changes to notebooks and synced .pys"
(succeeds) -
git push
@JohnPaton in this situation though, adding both the py
and ipynb
could create merge conflicts on the ipynb
files, couldn't it? The hook fails if you just stage the py
files.
Only if there are also conflicts on the py
files (in the case of code changes), in which case they need to be resolved anyway, or in case people commit different outputs. If you're worried about that you could use something like kynan/nbstripout to additionally remove the outputs, and then there can only be conflicts in both files or neither, since the content will be essentially the same.