CTSM icon indicating copy to clipboard operation
CTSM copied to clipboard

Black pre-commit as automatic formatter

Open negin513 opened this issue 2 years ago • 5 comments

Description of changes

Summary: This PR adds the capability to automatically format python codes using Black without changing git history using a pre-commit hook.


A while back, I have suggested using Black as a formatter for CTSM python codes to:

  1. Have a uniform python style/standard across different codes
  2. Accelerate our python developments and review process by automating the formatting.
  3. Improve Python code quality and adapt to modern SE practices.

We have discussed the best method of implementing this in #1471 and there were multiple suggestions. One of the concerns were messing with git history (https://github.com/ESCOMP/CTSM/pull/1577#issuecomment-993809689)

One way to automate the formatting of python codes without making any changes in the git history was using pre-commit hooks. This is similar to what modern python packages do for their developments (e.g. xarray ).

In this PR, I used this git feature and created a pre-commit hook to automate the formatting of any new python code that is being committed.

This is really easy to use and does not require any special changes. The only thing is that when the developer is committing a new change, it runs the formatter and gives some insight:

image

Please note that that pre-commit hook only modifies files that are changed and going to be committed.

However, using this we can make sure that our codes are always following a certain format.

Besides automatically running black on python codes, this pre-commit hook automatically checks and fixes:

1- trailing-whitespace : trims trailing whitespace. 2- check-yaml - checks yaml files for parseable syntax. 3- mixed-line-ending - replaces or checks mixed line ending. 4- end-of-file-fixer - ensures that a file is either empty, or ends with one newline.

Similar to xarray, I added a badge for black to readme.

Specific notes

Contributors other than yourself, if any: @ekluzek

CTSM Issues Fixed (include github issue #): #1471 and #1611

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any: tried submitting changes with this pre-commit.

negin513 avatar May 26 '22 00:05 negin513

This does not change the interface for a user.

For our developers who want to run this pre-commit , please try this using the following:

pip install pre-commit (you only do this once.)

git clone https://github.com/ESCOMP/CTSM
cd CTSM
pre-commit install

So just one additional command after cloning the repository will activate this automatic checking and formatting.

negin513 avatar May 26 '22 00:05 negin513

Hmmm, this is interesting. The pip install didn't seem to work for me, the pip install seemed to work OK, but then I couldn't do pre-commit commands. But, I was able to get it to work by adding it to my conda environment. And I think that's what we want in the end anyway. So I think we might want to have a conda environment for pre-commit, or include it in the standard CTSM conda environment.

After that pre-commit failed in my ctsm conda environment, but I got it to work by creating a conda environment just for pre-commit. But, then I found that running pre-commit in the top directory wanted to change over 700 files (because of the trailing blank fixer and other things). I think the pre-commit normally would only be applied to files that were going to be committed, so normally it wouldn't do that. But, we probably want to run the pre-commit fixer on everything (which would be a good thing to do anyway), as a one-shot formatting step. I do like having a fixer for these type of things, but we should probably fix our code by the fixer standards first.

So let's talk more about this tomorrow, and let's meet more to discuss this. Thanks for working on this.

ekluzek avatar Jun 02 '22 04:06 ekluzek

We talked about this at CTSM software this morning. To start with we think we should restrict the pre-commit hook to the python directory, until we get used to this workflow. I'm more nervous about FORTRAN code getting changed by formatters. The changes proposed here are good and ones I agree with, and shouldn't cause problems, but in principle for FORTRAN code I like the idea that the testing we've done when we commit is done with the code as committed. This would mean in practice if the pre-commit hook is triggered you should then run testing on the code before you do the second commit.

Another thing we thought we should do is to create a separate conda environment for the pre-commit hook. So part of this commit should be to add another conda environment for pre-commit. I checked and none of the default conda environments include pre-commit (which is as I would expect).

ekluzek avatar Jun 02 '22 19:06 ekluzek

Some hooks that I think we should look into:

check-added-large-files check-executables-have-shebangs check-shebang-scripts-are-executable check-merge-conflict check-xml

There's also a bunch of specific format and language linters and checkers. A perl checker could be interesting, but we are also trying to get away from perl and move to python.

ekluzek avatar Jun 03 '22 02:06 ekluzek

@negin513 I'm assuming since you are in CISL now, that we should plan on bringing this one in ourselves. Let us know if you have any thoughts on that. I think this is close to being done and we should be able to take it on and finish it out.

ekluzek avatar Oct 17 '22 03:10 ekluzek