[CI] Add GitHub CI job running clang-format
Summary
It would be useful if the GitHub CI would check that a PR is formatted before merging it.
Problem statement
Automatically checking the code is formatted is safer than having a check box on a PR.
I have also seen some differences in the formatted code which may be due to different clang-format versions. Integrating this check in GitHub would clarify what the source of truth is.
Preferred solution
The preferred solution is for GitHub to have a job that runs clang-format. It should be possible to set it up so that the action automatically pushes the commit fixing the formatting. If this is not possible showing the diff would be useful.
I would suggest to use whichever clang-format version is shipped with the oneAPI base toolkit in oneapi/compiler/latest/bin/compiler/clang-format, assuming there is no plan to remove that.
Pinging @oneapi-src/onemkl-arch-write and @rscohn2 who may have opinions on that.
I plan to work on this. If we are OK with the old clang-format-9, which is used in the internal CI, we can continue using that version. Or we can switch to a newer version together in our CI. This direction needs more work. For clang-format in the base toolkit, my concern is that it might be extra work for contributors to download the package if they want to run the format check and it could be annoying if it gives different results from release to release.
That's good to know, thanks!
On the contrary, I think my suggestion to use the clang-format binary from the oneAPI package should avoid developers to download more packages. As of today DPC++ is by far the most supported SYCL implementation in oneMKL Interface so most people will have it available already.
Besides clang-format-9 is very old, the latest version available is 19. This means that not only very few developers have this package lying around already but also we can run into weirdly formatted code.
I don't expect there will be much changes from release to release if any to be honest.
I use pre-commit to manage the code checks. It will automatically download clang-format and you have full control over versions. It will optionally manage a git commit hook to run the checkers, or you can run it manually. It supports excluding directories, etc. It integrates 100's of checkers.
pre-commit is a python package so developers would create a python virtualenv and install pre-commit into it. You do not typically have to install anything else.
Here is a sample: https://github.com/oneapi-src/distributed-ranges/blob/main/.pre-commit-config.yaml The local id at the bottom are project-specific checks that are integrated into the framework.
The minimal bootstrap is:
python3 -m venv .venv
source .venv/bin/activate
pip install pre-commit
I rely on the git commit hook so my typical workflow is:
git add --all
git commit -m "commit message"
# pre-commit typically will fix all issues that it finds, git diff if you want to see what it did
git add --all
git commit -m "commit message"
It you don't like the hook then run manually:
pre-commit run --all
pre-commit is nice but I don't think it answers all the issues here. Even if it is documented, not everyone will use it and we're not sure people will use the same clang-format version.
Basically clang-format-9 is out. We have power shutdown for several days. I will start evaluating newer clang-format and the one in oneAPI next week.
pre-commitis nice but I don't think it answers all the issues here. Even if it is documented, not everyone will use it and we're not sure people will use the same clang-format version.
There is a decision to make about whether the github action re-formats and commits the changes or merely checks and rejects misformatted commits. pre-commit can work either way, and it can show the diffs for a rejected commit. I prefer check and reject for the following reasons:
- reformatting can break the build (e.g. sorting of includes)
- if github action modifies your code, the PR branch is out of sync with your local copy, requiring pull if you want to keep editing. It can be confusing for some developers.
- I occasionally use directives to turn off automatic forwarding for sections of code. I prefer having an opportunity to review the formatting before committing.
- workflow needs extra permissions
pre-commit makes it much easier for a developer to do local formatting and exactly match the CI system behavior. You don't have to install clang-format. The clang-format version and how to run it is in a config file. It knows how to only check the files you are modifying in the commit. It makes it easy to add the checking as a commit hook.
After you add pre-commit, it is easy to add checkers for all the files, not just the C++ files.
I have no strong preference whether the GitHub action applies the changes or merely checks and reject. If we go with the latter we should ideally make the diff available to download or copy so the developer can easily apply the changes IMO.
I'm trying to understand how pre-commit relates to the suggestion of using a GitHub action. To me they are quite different because to my knowledge there is no way to force developers to use pre-commit. pre-commit can be used alongside the GitHub action of course.
Now I'm thinking maybe you will want the GitHub action to re-use the same scripts and clang-format version than pre-commit is using? I don't know if that's possible. If so then this is a good argument for using whichever recent version is available at https://github.com/pre-commit/mirrors-clang-format I suppose.
This is the way I usually integrate pre-commit into a github workflow: https://github.com/oneapi-src/distributed-ranges/blob/02bbffb64546748f1c974ee9c67efeac3bbb18e8/.github/workflows/pr.yml#L34
It will reject commits that are not formatted (or fail the other checkers). When it fails because of formatting, it displays the diffs in the log. If you want automatic fixup, you would add an extra step to commit the results.
There is also a pre-commit action in the marketplace that works well. I think it does a better job of caching installs between runs. I don't use it because intel does not allow 3rd party github actions for internal repos.
A benefit of using pre-commit is that it makes it easier to run locally, easier to integrate into a CI workflow, ensures consistency between local development and CI environments. It is worthwhile to use if you only want to run clang-format, but it also makes it easier to run checkers/formatters for python, yaml, rst, sh, .... When you have multiple repos, you only need to copy .pre-commit.yaml and the github workflow and you get all the same checking/formatting.
Ok I better understand what you mean now. This is a possible solution then. With this solution we will still have to agree on a version for clang-format. I am hoping it will be something more recent than 9! I needed to make sure that your suggestion was not only to ask developers to setup the pre-commit scripts from Git (even if it helped with some pip packages).
I am hoping it will be something more recent than 9!
https://github.com/pre-commit/mirrors-clang-format installs clang-format from https://pypi.org/project/clang-format/ The version numbers for both indicate the version of clang-format. 19.1.1 is the latest available.
Thanks @rscohn2 for suggesting pre-commit! It's sooo easy to experiment with different clang-format versions! I think our Infra team will also be happy about it. :smile:
After we add a .pre-commit-config.yaml with the specified clang-format version in the repository, as Robert said, to run clang-format check all you need to do is
python3 -m venv .venv
source .venv/bin/activate
pip install pre-commit
cd <onemkl_interfaces_repo>
pre-commit run --all-files
We can add the --show-diff-on-failure option to show diff.
I ran different clang-format versions back-to-back to see if we want to stay with one version for some time, say 1 year or more, or always use the latest version. The results are shown in the following table. I first applied v9.0.0 to the current HEAD commit, then I commit the format changes. The new commit is shown as "v9.0.0 commit" in the table. I repeated this process with different versions, whose release dates are included in the parentheses.
| Commit | Applied version | Changes? |
|---|---|---|
| a429289 (current HEAD) | v9.0.0 | Yes |
| v9.0.0 commit | v10.0.1 | No |
| v9.0.0 commit | v11.0.1 | Yes |
| v11.0.1 commit | v12.0.1 | No |
| v11.0.1 commit | v13.0.0 | No |
| v11.0.1 commit | v14.0.0 (Apr 12, 2022) | No |
| v11.0.1 commit | v15.0.4 (Nov 15, 2022) | Yes |
| v15.0.4 commit | v16.0.0 (Mar 22, 2023) | No |
| v15.0.4 commit | v17.0.1 (Oct 10, 2023) | Yes |
| v17.0.1 commit | v18.1.0 (Mar 7, 2024) | Yes |
| v18.1.0 commit | v19.1.0 (Sep 24, 2024) | Yes |
From the results, I would suggest that we stay with one version for some time.
I agree that clang-format should only be suggestion not enforcement. In other words, we should be able to add // clang-format off/on when the suggestion doesn't make sense.
That's interesting! I'd say using 19.1.0 for a while is fine then.