xarray Use ruff for formatting

[ ] Closes #8760 ~~- [ ] Tests added~~ ~~- [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst~~ ~~- [ ] New functions/methods are listed in api.rst~~

Note: many inline ... obtain their own line. Running black . would have produced the same result

Feb 17 '24 16:02 etienneschalk

I can't reproduce black . moving the ... in type stubs / override to a separate line, so this is an example where ruff diverges from the black style (to be fair, I would have done the same if the declaration excluding the override decorator is multiline already). Could it be that black doesn't have an opinion in this case?

The trailing empty line for blocks (empty ...) were something we intentionally chose to introduce in #4510 (this was a styling option that the ruff people intentionally chose to do differently).

Another thing which I'm not too happy about is their assumption of :: in rst meaning python, then falling back to doing nothing if the code does not parse correctly (which they do for all rst code blocks). This means that invalid code is not caught by the formatter but only by running the doctests, which is usually done at a much lower frequency than running the formatter.

Additionally, xarray is a small-ish repository, and running black on the entire code base (which is a rare thing to do outside of CI) completes in less than a second (using the compiled black on PyPI). In other words, I'm not sure if speed should be the deciding factor.

So as a summary, I'm not a fan of (most of) these changes, and I would vote for waiting until the dust settled a bit more (I feel a bit conflicted as the author and maintainer of one of the replaced hooks, though).

Feb 17 '24 18:02 keewis

Additionally, xarray is a small-ish repository, and running black on the entire code base (which is a rare thing to do outside of CI) completes in less than a second (using the compiled black on PyPI). In other words, I'm not sure if speed should be the deciding factor.

I find that the big files such as dataset.py & test_dataset.py take a long time to save, with autoformatting in vscode. Xarray is the only python repo I have this problem with. It makes working with a quick iteration loop of tests running when saving a file implausible.

Running echo "" >> xarray/tests/test_dataset.py && time black xarray/tests/test_dataset.py takes 5000ms! Running echo "" >> xarray/tests/test_dataset.py && time ruff format xarray/tests/test_dataset.py takes 50ms

So I would prioritize performance quite highly here

I would vote for waiting until the dust settled a bit more

Not strongly opposed to waiting for a bit if others want to, I do expect some churn in the ruff formatter

Note: many inline ... obtain their own line. Running black . would have produced the same result

I think this is an issue with black versons. Worth aligning with the pre-commit config

Feb 18 '24 04:02 max-sixty

Hello,

Thanks for your answers!

I made this Draft PR as a small example of migrating from black to ruff-format, to have some base material to talk about the parent issue #8760.

I can't reproduce black . moving the ... in type stubs / override to a separate lin

and

I think this is an issue with black versons. Worth aligning with the pre-commit config

Regarding black .: indeed I did not pay attention at the version I used. Here is the version on my project env:

black --version
black, 21.12b0 (compiled: no)

I also see I don't have a compiled version of black installed. I am not sure how to get this version though.

Also I agree that I am not a fan of this forced de-inlining, I had a look inside of the ruff-format documentation but could not find any configuration option allowing to opt-out of this de-inlining. It also brings unnecessary formatting changes to the codebase. Maybe an issue can be raised on the ruff repo itself about this topic, it being a necessary condition before even considering moving to using ruff-format. Edit: https://github.com/astral-sh/ruff/issues/10026

The trailing empty line for blocks (empty ...) were something we intentionally chose to introduce in #4510 (this was a styling option that the ruff people intentionally chose to do differently).

Regarding the trailing ... in doctests: I know they can cause issues when using pytest-accept. See https://github.com/max-sixty/pytest-accept/issues/148 for the discussion on that matter with @max-sixty ; it can even be traced back to a bug in the standard doctest lib itself. So I would argue in favor of removing them, something that ruff does. Also I am not sure where in #4510 these trailing ... are mentioned, maybe it is the wrong link?

Another thing which I'm not too happy about is their assumption of :: in rst meaning python

Is this something that blackdoc solves? Would it be possible to extract this logic with another formatter, or is it tightly coupled to black? Or in other words: do you think blackdoc could become a ruffdoc at some point? As the author of blackdoc you would have valuable insight to help improve other formatters for sure.

If I understand well, it formats code contained in documentation. This issue mentions formatting code in documentation + blackdoc: https://github.com/astral-sh/ruff/issues/7146

Additionally, xarray is a small-ish repository, and running black

I agree with @max-sixty here, I also use VSCode with auto-formatting enabled and the project is laggy from time to time. Two levers of action I can think of: reducing formatting time with a more performant formatter, or reducing the size of the file sizes (this would be a refactoring effort, and I don't know what the xarray team considers to be a "too big" file that would deserve to be split). Quality of life during development is valuable

Feb 18 '24 13:02 etienneschalk

Maybe an issue can be raised on the ruff repo itself about this topic

Based on the discussion at the end of astral-sh/ruff#5822 I am led to believe that we just need to wait a bit more?

maybe it is the wrong link?

that was https://github.com/pydata/xarray/pull/4510#discussion_r512138255, sorry for not directly linking that

Would it be possible to extract this logic with another formatter, or is it tightly coupled to black?

ruff is entirely written in rust (which is what makes it fast), while blackdoc is written in python (and built on top of black). The two are thus pretty much incompatible as far as code goes, but I remember the ruff people looking at both blackdoc and blacken-docs for inspiration when creating the docstring formatter.

In any case, note that formatters should not change the behavior of other tools, if it does then there's something wrong with either the formatter or the other tool. The trouble here, though, is that doctest is a pretty ambiguous format: what would

>>> print("...")
...

be interpreted as? ruff and blackdoc will probably both remove the trailing ..., and would be wrong to do so.

However, this:

>>> if condition():
...     do_something()
...

is not as ambiguous: since the code example starts with a block statement, the ... can be expected. Obviously that would require an analysis of the first line(s), but this is much better than having to analyze the semantics (and it usually is not a good idea to artificially restrict a style because of the tooling).

I'm not sure how resolve this, besides switching to a different format where you're much more unlikely to encounter the ambiguity (like the IPython format).

I also use VSCode with auto-formatting enabled

I can empathize with that. I personally do the auto-formatting on commit (using pre-commit), not on save, so I'm not as affected by that (waiting a bit longer on commit is not that much of a deal).

Feb 18 '24 14:02 keewis

we just need to wait a bit more

Yes indeed, if I understand well black has this concept of "Preview" style: https://black.readthedocs.io/en/stable/the_black_code_style/future_style.html

This is possible to test this "preview" style (see https://github.com/astral-sh/ruff/issues/10026#issuecomment-1951367648 ; I updated the PR accordingly)

When this preview style is used, Ellipses are back onto their single line, but other style changes also come in.

The print("...") example is perfect!

I'm not sure how resolve this, besides switching to a different format where you're much more unlikely to encounter the ambiguity (like the IPython format).

What is the IPython format, an alternative syntax for doctests? As usual, delimitation is the issue: how to delimitate input from expected output to be unambiguous without becoming too verbose...

Feb 18 '24 16:02 etienneschalk

but other style changes also come in.

You'd also get these other changes with black, but I think they pushed those back a year because it wasn't stable enough in time for the new stable style. So I think ruff just needs to catch up to that change.

What is the IPython format, an alternative syntax for doctests?

just like doctest it is the output of a REPL, in this case ipython instead of python, so we'd (basically) be copying the output of a interactive shell. However, the delimiters chosen for continuation lines ( ...:, with the number of spaces chosen such that the prompts align) are much less likely to be encountered outside of REPL tooling.

For example:

In [9]: """
   ...: """
Out[9]:

In [10]: (
    ...:     1 + 2
    ...: )
Out[10]: 3

Feb 18 '24 17:02 keewis

So ipython is to python what an idoctest would be to doctest! I can see that something already exists: https://github.com/ipython/ipython/blob/main/IPython/testing/plugin/ipdoctest.py

What I can see potentially annoying with IPython format is the numbering of the input-output pairs, if you want to add an example at the top of the docstring, all the following existing doctests would require to change. Also, when going from 9 to 10, digit count goes grom 1 to 2 and adds an offset.

Here I just empty the execution count, replacing it by a space.

Then the docstrings would look like for example:

In [ ]: """
   ...: """
Out[ ]: '\n'

In [ ]: (
   ...:     1 + 2
   ...: )
Out[ ]: 3

In [ ]: print("...")
...

In [ ]: print("Out[ ]:")
Out[ ]:

In [ ]: "Out[ ]:"
Out[ ]: 'Out[ ]:'

In [ ]: None

In [ ]:

Feb 18 '24 17:02 etienneschalk

I notice that the diff is now quite large? I guess that's the preview flag? And that mypy is failing?

If we can reduce the size of the diff as much as possible and get the docs passing, I'd be +1 here. But obv one voice among many. I'm also fine to wait until things settle more if others prefer.

Feb 26 '24 05:02 max-sixty

Yes, the preview flag brings too things:

The single line stubs
Javascript-style braces

The preview flag cannot be reliable upon, so for now, the only option is to wait for the official integration of single line stubs in the formatting rules

Even with preview = false the diff is still large (36 files). However most of it is constituted of ellipses. So when it's fixed, the diff should be significantly smaller

There is still also the issue of ellipsis new lines in the dosctrings, but it is has a less significant share in the diff.

I'll try to keep this draft PR updated as a "showcase" of what applying ruff alongside a pre-commit run --all would look like

Feb 26 '24 21:02 etienneschalk

I'll try to keep this draft PR updated as a "showcase" of what applying ruff alongside a pre-commit run --all would look like

OK perfect!

Feb 26 '24 22:02 max-sixty

xarray xarray copied to clipboard

Use ruff for formatting

xarray
xarray copied to clipboard