jupytext
jupytext copied to clipboard
Jupytext in a pre-commit disagreement with trailing-whitespace and prettier
Hi,
I am trying to run jupytext as a first step in pre-commit, before running among others trailing-whitespace and prettier.
My goal is for jupytext to generate .md files based on .ipynb files, since these are easier to version control than .ipynb.
The idea is that pre-commit will pass if no new .md file is added, and if no existing .md file is changed by jupytext.
The way I implemented this seems to cause a problem with two other hooks, trailing-whitespace and prettier. The way jupytext formats the .md files cause trailing-whitespace and prettier to fail and adjusting the code.
Is there any way to adjust jupytext or the way I use it, so that these other hooks do not fail?
This is my file .pre-commit-config.yaml which lives in the root directory:
# NOTE: The versions can be updated by calling
# pre-commit autoupdate
repos:
- repo: local
hooks:
- id: jupytext
name: jupytext
entry: jupytext --to .md notebooks/*.ipynb
files: .ipynb
language: python
- repo: https://github.com/psf/black
rev: 19.10b0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.1.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- repo: https://gitlab.com/pycqa/flake8
rev: 3.8.3
hooks:
- id: flake8
additional_dependencies:
- flake8-bugbear
- flake8-docstrings
- repo: https://github.com/prettier/prettier
rev: 2.0.5
hooks:
- id: prettier
args: [--prose-wrap=always, --print-width=88]
- repo: local
hooks:
- id: pylint
name: pylint
entry: pylint
language: system
types: [python]
additional_dependencies: ["-r requirements.txt"]
If I update an .ipynb file and run pre-commit, this happens:

Immediately re-running pre-commit causes all other notebooks to be updated by jupytext,.
Seems like the problem was that .ipynb files were not checked or modified by trailing-whitespace and prettier, which created a unhelpful loop that happens every time pre-commit is being run:
- Jupytext imports whitespace-errors from dirty
.ipynbfile into.mdfile, and pre-commit sees this as jupytext failing. - trailing-whitespace fixes the whitespace-error , and pre-commit sees this as trailing-whitespace failing.
So the solution, I guess, is either to disband the goal mentioned above, or to run all hooks also on .ipynb files. Not sure which I should go for.
Hi @tordbb , thanks for taking the time to report this.
I must say that I am not sure yet what is the best way to use pre-commit hooks on notebooks (see also #432 , #446). This is a topic on which I'd like to make progress, but, just like you did here, many times I encountered difficulties in applying pre-commit hooks on pairs of files.
Maybe you could consider doing the following:
a) Apply the pre-commit hook to .md files only, and let Jupytext's plugin for Jupyter do the sync to the .ipynb file, using paired notebooks
b) Or, add an additional step at the end of your pre-commit hook: jupytext --to ipynb --update *.md to update the .ipynb file - but again, I am not sure how well the pre-commit hook works with files that you don't include in the commit - or maybe you can git add and the reset the .ipynb file in the hook itself like we do here?
c) Or, apply the pre-commit hooks to .ipynb files only, using jupytext --pipe-fmt markdown --pipe [cmd]
Thanks again for your help! I found a viable solution by following your suggestion of adding another jupytext hook at the end.
The problem turned out to be how prettier did two changes to .md notebooks that were not being imported to .ipynb notebooks by jupytext:
-
Changing single ticks into double ticks:

-
Adding spaces between cell headlines and the cells:

When I made prettier exclude all files in the notebooks directory from inspection, everything worked out great. Ideally I wanted to include ipynb files, but did not find a regex that only excluded .md files inside notebooks/. Feel free to share if you can think of one.
I am not sure if jupytext should adapt the formatting that prettier idolizes, but that may be up to you.
See my final .pre-commit-config.yaml below:
# NOTE: The versions can be updated by calling
# pre-commit autoupdate
repos:
- repo: local
hooks:
- id: jupytext
name: jupytext --to md
entry: jupytext --to .md notebooks/*.ipynb
files: .ipynb
language: python
- repo: https://github.com/psf/black
rev: 19.10b0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.1.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- repo: https://gitlab.com/pycqa/flake8
rev: 3.8.3
hooks:
- id: flake8
additional_dependencies:
- flake8-bugbear
- flake8-docstrings
- repo: https://github.com/prettier/prettier
rev: 2.0.5
hooks:
- id: prettier
args: [--prose-wrap=always, --print-width=88]
# Avoid prettier from harmfully changing .md notebooks
exclude: notebooks/
- repo: local
hooks:
- id: pylint
name: pylint
entry: pylint
language: system
types: [python]
additional_dependencies: ["-r requirements.txt"]
- repo: local
hooks:
- id: jupytext
name: jupytext --to ipynb
entry: jupytext --to ipynb --update notebooks/*.md
files: .md
language: python
Thank you @tordbb for sharing this! At some point I'd like to write a quick post or documentation page on how to best use pre-commit hooks, may I reuse your config above?
a regex that only excluded .md files inside notebooks/
Nice challenge... let me think about it...
I am not sure if jupytext should adapt the formatting that prettier idolizes
How that's interesting... Jupytext uses yaml to encode the YAML header, so I am a bit surprised that the quotes don't match those of prettier.. do you have any idea how prettier parses the YAML, and which one is the most standard?
Regarding the extra blank line, Jupytext should include a blank line between any two cells (and thus match prettier, right?)
When you don't get a line in between, this is because the cell has a cell metadata lines_to_next_cell=0, which is itself coming from the text file, if at some point the markdown cell and the code cell were not separated by a blank line.
Hi!
Off course, feel free to use it all. I am so thankful for your work on jupytext, and would be honored if my small contribution is somehow helpful in your work - published or not.
Regarding your question about YAML, I am afraid I do not know.
Thanks again, and have a great day!
On Fri, Jul 31, 2020, 18:29 Marc Wouts [email protected] wrote:
Thank you @tordbb https://github.com/tordbb for sharing this! At some point I'd like to write a quick post or documentation page on how to best use pre-commit hooks, may I reuse your config above?
a regex that only excluded .md files inside notebooks/
Nice challenge... let me think about it...
I am not sure if jupytext should adapt the formatting that prettier idolizes
How that's interesting... Jupytext uses yaml to encode the YAML header, so I am a bit surprised that the quotes don't match those of prettier.. do you have any idea how prettier parses the YAML, and which one is the most standard?
Regarding the extra blank line, Jupytext should include a blank line between any two cells (and thus match prettier, right?) When you don't get a line in between, this is because the cell has a cell metadata lines_to_next_cell=0, which is itself coming from the text file, if at some point the markdown cell and the code cell were not separated by a blank line.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mwouts/jupytext/issues/580#issuecomment-667213881, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXBS4F5TNXFP72U7IO36IDR6LWOFANCNFSM4PN45FUA .
@tordbb @mwouts I quite like my solution to using JuPyText as a pre-commit hook which can be found below:
- repo: local
hooks:
- id: jupytext-sync
name: Sync scripts and notebooks
files: '^examples/tutorials/(colabs|nb_python)/(.*\.py|.*\.ipynb)$'
entry: jupytext --update-metadata '{"jupytext":{"notebook_metadata_filter":"all", "cell_metadata_filter":"-all"}, "accelerator":"GPU"}' --set-formats 'nb_python//py:percent,colabs//ipynb' --pipe black --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" --pipe 'isort -' --pipe-fmt 'py:percent' --sync
pass_filenames: true
additional_dependencies:
- 'jupytext==1.5.2'
- black
- isort
always_run: false
language: python
This has some benefits:
- It will automatically install jupytext
- It will run if the ipynb or the python file is modified. Meaning if the linters update the python file then those changes will be synced back to the notebook. I usually put JuPyText at or near the end for this reason
- I can specify exactly which I want updated
- Occasionally you can get into update loop where one will try to update the other since both sync directions are included, but this can be easily fixed by deleting one file.
- The pipe commands ensure the linters are also run on the notebook, but they aren't strictly necessary. They are mostly to help get out of the loop mentioned before.
TLDR: Use --set-formats and --sync will solve a lot of your issues @tordbb since which file is newer will overwrite the changes in the other.
Thank you @tordbb and @Skylion007 . This is getting interesting! When time permits I will experiment a bit more with this myself, and write a post with all due credit to your examples !
@Skylion007 , do we agree that --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" was motivated by #562? With Jupytext from master (to be shipped in 1.6.0) you won't need this any more (and spaces are now allowed in pipe commands :smiley: ).
Also I see you use --pipe 'isort -', don't you experience issues like #553 ? On my side I now use --pipe 'isort - --treat-comment-as-code "# %%" --float-to-top' (requires isort in dev version, as these two options were introduced very recently by @timothycrosley, the author of isort).
@mwouts Yeah, but my pre-commit hook downloads from pip so until it's published I can't remove that workaround. 👍 I do experience issues like #553, but it hasn't broken anything, yet. Also waiting for that fix to be merged.
@mwouts More than that, I would like to code up a simple pre-commit yaml for the repo. One that doesn't need to use the current the --pre-commit arg and uses the nice features baked into pre-commit.com
I would like to code up a simple pre-commit yaml for the repo
Oh, that would be awesome!
I am also having an issue with jupytext and prettier generating conflicting formatting.
In my case, the issue is caused by our .editorconfig setting the default indentation to four spaces also for yaml files.
That's to say that unless jupytext respect all sort of config files when generating the markdown, there's no way to achieve perfect consistency with prettier or other formatting tools.

I think the best would be to pipe the output through prettier as well, as suggested for black:
-args: ['--pipe', 'black']
+args: ['--pipe', 'black', '--pipe', 'prettier -w {}']
However, it is the markdown file that would need to be piped through prettier, and I don't think one can have multiple --pipe-fmt in the same command:
jupytext --sync --pipe-fmt py:percent --pipe black --pipe-fmt md --pipe 'prettier -w {}'
Hi @grst , I do like the suggestion of allowing multiple --pipe-fmt arguments. We could implement that (I'd require that --pipe-fmt would be either of length 1, or the same length as --pipe), but I was trying to see if I could write a test, and I see that we already have an example of a pre-commit hook where Jupytext is called twice, with two different --pipe-fmt:
https://github.com/mwouts/jupytext/blob/12964ef42783722815cd5d12df47d5ce67c1b323/tests/test_pre_commit_5_reformat_markdown.py#L38-L46
Do you think the same approach could be used to call prettier (say, in addition to black) when doing jupytext --sync?
I haven't tried it, but as long as it's two separate steps in pre-commit, I'd expect it to suffer from the same problem: If the first step generates a different formatting than the second one they cannot pass at the same time (pre-commit fails if a file got modified).
That being said, in the above example it could work, i.e. the first jupytext command only modifies an ipynb file (e.g. formatting it with black), the markdown is only updated in the second step.