Support text Jupyter notebooks created with Jupytext

Open · owenlamont opened this issue 7 months ago · 4 comments

I have a use case for the Ruff linter and formatter that is somewhat related to the other Markdown/docstring feature requests: specifically, I hoped to run them on Jupyter notebooks that had been exported to Markdown with Jupytext.

The company I'm at prefers converting notebooks to Markdown because it makes notebook diffs much easier to read on Bitbucket, which (unlike GitHub) doesn't support any notebook rendering or diffing.

At first I noticed I could add `markdown` as a target file type for the Ruff formatter and linter hooks, which got my hopes up that this would just work:

```yaml
repos:
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.1.6
    hooks:
      - id: ruff-format
        types_or: [python, pyi, jupyter, markdown]
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
        types_or: [python, pyi, jupyter, markdown]
```

But when I ran Ruff, it failed to parse the Markdown. I had hoped it would only process the Python code blocks, the same way it handles Jupyter notebook cells, and ignore the rest of the Markdown content, but it's clearly trying to parse the whole file, e.g.:

[Screenshot: Ruff parse errors on the Markdown file]

owenlamont · Nov 21 '23 03:11
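Why adding `markdown` to `types_or` doesn't help: as the report suggests, the whole `.md` file ends up being parsed as Python source. The sketch below uses Python's own `ast` module as a stand-in for Ruff's parser (an assumption for illustration only; Ruff has its own parser, but both follow the Python grammar). A Markdown heading happens to be a valid Python comment, but the first line of prose is not.

```python
import ast

FENCE = "`" * 3  # a Markdown code fence (three backticks)

# A small Markdown document shaped like a Jupytext notebook (illustrative content).
markdown_text = "\n".join(
    [
        "# Analysis",               # valid Python: just a comment
        "",
        "Some explanatory prose.",  # not valid Python: raises SyntaxError
        "",
        FENCE + "python",
        "import math",
        FENCE,
    ]
)


def parses_as_python(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses_as_python(markdown_text))    # False
print(parses_as_python("import math\n"))  # True
```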

Hey, we currently don't support linting/formatting Python code in Markdown code blocks (https://github.com/astral-sh/ruff/issues/8237, https://github.com/astral-sh/ruff/issues/3792). I'll close this in favor of the Markdown issue for bookkeeping purposes, as I think that should solve this, but correct me if I'm wrong.

dhruvmanila · Nov 21 '23 14:11

Hmm, actually this would be a bit different. For plain Markdown we wouldn't need the concatenated source code from all code blocks, but if it's a notebook converted to Markdown, I think each block should have context from the other code blocks? @owenlamont, do you think this is true?

Another solution I can currently think of is to lint/format before converting to Markdown. I'm not sure how feasible that is, as I don't know much about your setup.

dhruvmanila · Nov 21 '23 14:11
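For a Jupytext workflow, where the Markdown file is the source of truth, the lint-before-converting suggestion could be approximated by a round trip through a throwaway `.ipynb`. This is a sketch assuming the `jupytext` and `ruff` CLIs are on `PATH`; the helper names and the `.roundtrip.ipynb` suffix are hypothetical. The intermediate notebook is written next to the Markdown file (not in a temp directory) so that Ruff still discovers the project's configuration, and depending on the Ruff version, notebooks may need to be enabled via `extend-include`.

```python
import subprocess
from pathlib import Path


def round_trip_commands(md_path: Path, ipynb_path: Path) -> list[list[str]]:
    """Commands for Markdown -> ipynb -> lint/format -> Markdown (hypothetical helper)."""
    return [
        # Convert the Jupytext Markdown file to a notebook.
        ["jupytext", "--to", "ipynb", "--output", str(ipynb_path), str(md_path)],
        # Lint with autofix, then format; Ruff tracks cells natively for .ipynb files.
        ["ruff", "check", "--fix", str(ipynb_path)],
        ["ruff", "format", str(ipynb_path)],
        # Write the fixed notebook back over the original Markdown file.
        ["jupytext", "--to", "md", "--output", str(md_path), str(ipynb_path)],
    ]


def run_round_trip(md_path: Path) -> None:
    # Keep the intermediate next to the source so Ruff finds the project's config.
    ipynb_path = md_path.with_name(md_path.stem + ".roundtrip.ipynb")
    try:
        for cmd in round_trip_commands(md_path, ipynb_path):
            # check=True stops at the first failure (e.g. unfixable lint errors),
            # so the Markdown file is only rewritten if every earlier step succeeds.
            subprocess.run(cmd, check=True)
    finally:
        ipynb_path.unlink(missing_ok=True)  # don't leave the notebook behind
```

Usage would be `run_round_trip(Path("notebook.md"))`. `jupytext --to md` writes Jupytext's Markdown format; if the original file used a different text format, the `--to` value would need to match it.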

Hi @dhruvmanila, yes, it would need the concatenated source code. I can see Ruff still tracks which code came from which Jupyter cell when raising warnings, so if it could treat the code blocks exactly as it treats Jupyter cells, that would be ideal.

As a workaround, the Markdown could be converted to ipynb, linted and formatted, then converted back to Markdown, but that would be onerous. When working with Jupytext, the notebook never gets persisted as an ipynb (in any permanent or visible way): it gets loaded from Markdown and saved back to Markdown.

The ideal solution (from my perspective) would be to parse the YAML front matter of the Markdown file, identify it as Jupytext-generated Markdown, then recognise that the code blocks need to be concatenated and treated as notebook cells. I totally understand if this use case is too niche to justify the effort. I can't say much about how many people use this format, but jupytext is a relatively popular repo (around 6k users), and I recognise some fairly prominent Jupyter developers among its contributors.

owenlamont · Nov 21 '23 20:11
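The proposal above can be sketched in plain Python: detect Jupytext front matter, pull out the `python`-tagged code blocks as cells, concatenate them, and keep a row-to-cell map so that a diagnostic on the combined source can be reported against the cell it came from. This is a hypothetical illustration, not how Ruff implements notebook support: the function names are made up, front matter is detected with a line-based heuristic rather than a real YAML parser, and cells are assumed to use `python` fences.

```python
FENCE = "`" * 3  # a Markdown code fence (three backticks)


def is_jupytext_markdown(text: str) -> bool:
    """Heuristic: '---'-delimited YAML front matter that contains a 'jupytext:' key."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return False  # end of front matter, no jupytext key found
        if stripped.startswith("jupytext:"):
            return True
    return False  # front matter never closed


def extract_python_cells(text: str) -> list[str]:
    """Collect the bodies of python-fenced blocks, one string per block (= cell)."""
    cells: list[str] = []
    current: list[str] = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if not inside and stripped == FENCE + "python":
            inside, current = True, []
        elif inside and stripped == FENCE:
            cells.append("\n".join(current))
            inside = False
        elif inside:
            current.append(line)
    return cells


def concatenate(cells: list[str]) -> tuple[str, list[tuple[int, int]]]:
    """Join cells; row_map[i] is (cell index, 1-based line in cell) for source row i + 1."""
    rows: list[str] = []
    row_map: list[tuple[int, int]] = []
    for cell_index, cell in enumerate(cells):
        for line_in_cell, line in enumerate(cell.splitlines(), start=1):
            rows.append(line)
            row_map.append((cell_index, line_in_cell))
    return "\n".join(rows) + "\n", row_map


# Usage: `math` is imported in the first cell and used in the second,
# which is why the cells need to be linted together.
doc = "\n".join(
    [
        "---",
        "jupyter:",
        "  jupytext:",
        "    text_representation:",
        "      extension: .md",
        "      format_name: markdown",
        "---",
        "",
        "# Analysis",
        "",
        FENCE + "python",
        "import math",
        FENCE,
        "",
        "Prose between cells.",
        "",
        FENCE + "python",
        "x = math.pi",
        "print(x)",
        FENCE,
    ]
)

if is_jupytext_markdown(doc):
    source, row_map = concatenate(extract_python_cells(doc))
    # source == "import math\nx = math.pi\nprint(x)\n"
    # row_map[2] == (1, 2): row 3 of the source is line 2 of the second cell
```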

There's a similar request for Quarto notebooks (#6140), and a more general one for Python code included in Markdown code blocks (#3792).

tvatter · Jan 26 '24 09:01