hnn-core icon indicating copy to clipboard operation
hnn-core copied to clipboard

[WIP] Introduce `ruff` code-format style

Open asoplata opened this issue 1 year ago • 8 comments

This is a PR for both introducing stylistic formatting using the incredibly fast ruff package (see https://docs.astral.sh/ruff/ ), and replacing our flake8 linting also with the linting provided by ruff. This will result in very many lines of code changes, but should not affect functionality at all.

Some of us were recently discussing if we should adopt a particular style, especially since the code right now is inconsistent. I think this is an opportunity to introduce ruff into the dev pipeline, both as our formatter of choice, and as our linter of choice (replacing flake8). Pros of this approach:

  1. ruff is unbelievably fast, both at formatting (both detecting and applying/editing format changes in-place) and linting. Our codebase is relatively small, but IMHO it is effectively instantaneous when I try it on both of these tasks, even on a 5-year-old Intel i5 processor.
  2. I personally really really love the default styling of ruff's formatting, which I think is identical to black's by default (e.g. line length 88 chars, etc.). Currently in this PR, the only non-default format style change is my preference for single quotes instead of double. Note that the longer line length is in line (pun intended) with other users of ruff like Scipy (see https://scipy.github.io/devdocs/dev/contributor/pep8.html#pep8-and-scipy ). I personally prefer the C-style braces and emphasis on verticality, but whatever style we decide needs to be a group decision.
  3. ruff's formatting is very configurable. We can easily make changes to our preferred format style.
  4. I can't tell if flake8 will actually edit your code to match particular formatting (which ruff does). As far as I can tell, flake8 can only inform you of certain formatting findings (but will not edit your code). I could be wrong here. If flake8 cannot edit the formatting of our code, then we will need to install or setup another package to do that anyways (so it might as well be ruff).
  5. In many cases, ruff's linting can be set to automatically fix individual issues. I don't know if flake8 supports this. I personally prefer to make the fixes myself so I understand them, but it's convenient.
  6. ruff's linting is also extremely fast and configurable.
  7. ruff doesn't currently have 100% support of pycodestyle rules (see https://github.com/astral-sh/ruff/issues/2402 ), but the vast majority are there, and it's definitely enough coverage to make it worth using.

Finally, I will note that after creating the huge-formatting-commit, I ran all the tests locally and they passed, and I also successfully rebuilt the docs locally just in case as well.

If we want, we can use this PR for discussing format changes we want to make, that way I can rebase them on this branch and refrain from polluting the master branch with changes until we're happy.

I will also use this PR to get ruff's linting up and running, both fixing issues that flake8 missed, and ensuring the CI runs of ruff's linting work.

asoplata avatar Nov 08 '24 21:11 asoplata

I like the plan! Can we discuss this after we're done with steering committee prep?

ntolley avatar Nov 13 '24 14:11 ntolley

I like the plan! Can we discuss this after we're done with steering committee prep?

Absolutely. We may even want to delay this until after v0.4, depending on if it introduces difficulties cleaning up the outstanding PRs.

asoplata avatar Nov 13 '24 15:11 asoplata

@asoplata lot of points to think about :) I'm fine with using ruff as a linter ... I would start with configuring ruff such that the number of stylistic changes are limited (<100 lines of code). Having huge diffs is not a recipe for success in open source repos ;-) [even if straightforward]. Smaller diffs are likely to be approved faster with less controversy. Look around in other repos how they have transitioned (e.g., mne-python). The more we copy what is the standard in the Python ecosystem, the better it is!

jasmainak avatar Nov 14 '24 03:11 jasmainak

Btw, flake8 is just the linter ... the style guideline is pep8. There is a tool called autopep8 which does the automatic fixing for you. I personally tend to prefer linting as opposed to automatic fixing since it makes me aware of the style guidelines so the code style is not an afterthought. The automatic tool does lower the barrier to entry for contributing. But it's hard to trust code that was written as a mess only to be fixed using an automatic tool. But I know people have their preferences ;-) I let you decide what's the best path forward as a community

jasmainak avatar Nov 14 '24 04:11 jasmainak

Thanks for your insight Mainak! Seeing how MNE does it will be an excellent reference point.

Regarding linting, as you're probably aware (and what my commits in this PR made more confusing), the linting of ruff check, without any arguments, does not automatically fix lint errors. When it comes to linting, I definitely agree with you that linted errors in general should be evaluated manually, to understand what is going on. I should remove the unnecessary --no-fix arguments I added, since they don't change anything and only serve to confuse. I don't think we should be including any automatic error fixing. Note that the only changes needed for ruff linting (as opposed to formatting) are in this commit, and are only a handful of lines https://github.com/jonescompneurolab/hnn-core/pull/934/commits/c3da09a25feb2892064fa47f5a619090c2242199 .

Regarding formatting, I definitely understand that Godzilla-sized, N x kloc PR diffs are not good practice, but I think that any instance of switching from just the pretty-loose pep8 constraints to significantly "tighter" and more opinionated styling is always going to result in many lines of code changes. MNE had to go through similar pains when Black (basically same as ruff) was introduced here https://github.com/mne-tools/mne-python/pull/11667/commits/e81ec528a42ac687f3d961ed5cf8e25f236925b0 . I personally find Black-style tighter formatting to be significantly more readable than what is current in hnn-core. More importantly, I think styling being consistent at all also enhances the readability, and that goes double when we share 99% of our styling with similar packages like MNE. Unlike the linter, the ruff formatter does make fixes in-place, but I think this is fine since it definitely should not introduce errors. Having an automated formatter also means you don't have to spend any thought on formatting (unlike linting errors).

I experimented with running autopep8 --diff --recursive --exclude=__init__.py --max-line-length=88 hnn_core/* before and after the huge ruff-format changes, and the only changes autopep8 identified after ruff formatting were 3 invalid finds (specifically, autopep8 wanted to introduce this pep8-invalid change https://github.com/hhatto/autopep8/issues/723 which is surely a bug).

I will definitely make a new issue about using pre-commit, and also any more recommendations we can share from MNE :)

asoplata avatar Nov 14 '24 19:11 asoplata

Can you explain the difference between formatting and linting? How can we explain to new contributors how to use ruff? e.g., is there a VSCode plug in etc? Should be added to contributing guidelines.

I would split this PR into two.

The first one just introduces ruff with roughly the same issues as flake8. Use ignore rules. In this PR you would make sure that everything is in place for people to migrate to ruff. The contributing guide should be updated and make sure that any other open PRs that may be affected are either closed or merged. The diff of this PR should be relatively small. This way you avoid rebase issues.

In the next PR, remove the ignore rules and make ruff more strict. This PR should be turned around fast so that people working on new PRs start from the new codebase.

jasmainak avatar Nov 15 '24 15:11 jasmainak

I will separate the linting part of this PR into its own, since 1. the linting is a very small change, 2. formatting is a much, much bigger change, and 3. I do not think we should merge the formatting change at this time, and instead wait until everything required for v0.4 is completed, and the websites are redesigned.

I will return to this PR and explain the difference at a later date.

asoplata avatar Nov 15 '24 19:11 asoplata

I think we should use this PR as the main place to discuss agreeing upon a code Formatting-style . I will split changing our linter from flake8 to ruff check using a new PR that I haven't created yet, and remove its commits from this PR.

To actually define what I mean by a code "Formatting-style", and to avoid the inevitable word count limitations, I've written a Gist instead explaining how we could consider defining and communicating about Linting vs Formatting vs "Programming Style Guides", including with examples: https://gist.github.com/asoplata/8bb94707810218df95dacaaaaf34bccb .

asoplata avatar Nov 27 '24 01:11 asoplata

@dylansdaniels This is the branch where we will apply and check the ruff format changes. Note that it has been heavily rebased so you'll need to force-pull it if you downloaded it previously. This is "ready to go", barring any stylistic changes we want to make, and some extra commits that will be deleted (see below). Currently, it has 4 commits:

  1. One very small commit (f796832) that adds what I think is some pretty clearly named Makefile recipes for using ruff format, and an initial configuration of ruff format's settings. If we're happy with these Makefile recipe names, then I can add another commit that documents our new required formatting usage in our Contributing page.
  2. A second commit (2cca559) that is the "mega" commit, that includes all the code changes from having run make format-overwrite (aka ruff format hnn_core). Obviously I ran the tests before and after and all tests pass.
  3. Two more commits that are replays/cherry-picks from #1052, and purely meant for the CI to successfully run the post-ruff format code against all our tests. These commits will eventually need to be removed from this branch once #1052 is merged, but for testing purposes here, they should be included.

Taking a brief look at the formatting changes, they're not as bad as I expected. The vast majority are changes to newlines, quote-type, or 0. -> 0.0. Again, the only difference we have from the defaults (available here https://docs.astral.sh/ruff/configuration/ ) are preferring single-quotes vs double-quotes, but that's just personal preference and I'm happy to use double-quotes if that's what people want.

I will add that once we're happy with style changes, I will definitely be inspecting every single change personally, just to be safe.

asoplata avatar May 28 '25 22:05 asoplata

@ntolley If you have strong stylistic preferences for our previously-discussed (a few months ago) auto-formatting of code, let me know however you want (email, this, etc.).

asoplata avatar May 28 '25 22:05 asoplata

Pending a detailed review of the many code changes by myself and Dylan, this mega-formatting change PR will become ready for merge.

asoplata avatar May 29 '25 19:05 asoplata

I have attempted to go through the entire series of code changes, and I can find no instances where the autoformatter changed the behavior of any code (and therefore, no instances of where it introduced bugs), including where careful string handling is needed. The only exception is the case that was already fixed in a5f3da55c0f8d44cd8a18d444fa402c0ce2cf5a2 where our setup.py script was explicitly looking for a single-quote character. The vast, vast majority of changes are (in order of frequency) changing single quotes to double quotes, followed by newline or indentation changes, followed by adding numerals after the decimal (e.g., all former 0. floats have become 0.0), followed by other changes.

Pending @dylansdaniels review, I think this is ready for merge, finally.

asoplata avatar May 30 '25 23:05 asoplata

Fully reviewed and changes look good

dylansdaniels avatar Jun 02 '25 22:06 dylansdaniels