[ci] Use pixi and pre-commit for all linting jobs
Motivation
Running linters in this repository is always a little involved (one needs to execute a number of commands to actually run everything), which leads to many unexpected CI failures. The situation has already improved with pre-commit, which bundles several linters.
This PR proposes to bundle all linters in pre-commit and use pixi to track linting dependencies (more on this below).
Changes
- Use pixi to manage all linting dependencies
- Use local hooks for pre-commit, using the dependencies installed via pixi
- Remove the lint task from the .ci/test.sh file
- Adjust the CI job executing the linters
- Adjust code as per the versions of the linters installed now (changes performed by ruff and typos)
What is pixi and why?
Pixi is a modern and fast alternative to conda, mamba, etc. At a high level, it is a cross-platform package manager which installs conda packages (from conda-forge). Some differences that make it so nice to use:
- It has a "project-based" approach. Instead of global, named environments, environments are bound to a project. All dependencies are installed into the .pixi directory in the repository root. One can have arbitrarily many environments. The default environment typically captures the local development environment.
- Pixi maintains a lockfile (pixi.lock), which has two benefits. First, CI doesn't suddenly fail when a new version of a tool is released. Second, the computationally intensive "solve" step does not need to run in CI. The solve is only performed once locally (and is very fast compared to conda: it runs in seconds even in projects that are considerably more complex than LightGBM).
- Pixi allows defining tasks which can be executed via pixi run. These tasks run in the deno task shell and, thus, can be executed in a platform-agnostic manner. When running a task, all required dependencies are automatically installed, making it trivial for the user to get started.
- Not directly related to pixi, but: setup-pixi automatically caches environments in the CI, further reducing download times.
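To make the CI side concrete, here is a minimal sketch of what a linting workflow using setup-pixi could look like. It is not the exact workflow from this PR: the workflow and job names, the lint environment name, and the pinned action versions are assumptions for illustration.

```yaml
# Hypothetical GitHub Actions workflow; names and version pins are assumptions.
name: lint
on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs pixi and the environment pinned in pixi.lock.
      - uses: prefix-dev/setup-pixi@v0.8.1
        with:
          # Reuse the installed environment across runs.
          cache: true
          # Only install the (assumed) lint environment, not the default one.
          environments: lint
      # Runs the lint task defined in the pixi manifest.
      - run: pixi run lint
```

Since the environment is fully determined by pixi.lock, the cache only needs to be rebuilt when the lockfile changes.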
What's up with the pre-commit setup?
pre-commit is the most ergonomic way to bundle all linting jobs together. One can now simply run
pre-commit run --all-files
to execute all linters in this repository. Due to the custom pixi task, you can also run
pixi run lint
This automatically installs all required dependencies, including linters and pre-commit itself (except for pwsh which is not currently available via conda-forge for Linux & OSX).
The pre-commit setup itself does NOT use other repositories anymore. Instead, it uses pixi to execute the linting jobs with the linters installed via pixi. This has the benefit that (1) the behavior of pre-commit matches the behavior of running linters manually (e.g. when auto-formatting in the editor) and (2) there is only one update process for all dependencies.
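As a rough illustration of the local-hook approach, a .pre-commit-config.yaml along these lines would delegate to pixi. This is a sketch, not the exact configuration from this PR: the hook ids and the lint environment name are assumptions.

```yaml
# Hypothetical excerpt; hook ids and the "lint" environment name are assumptions.
repos:
  - repo: local
    hooks:
      - id: ruff-check
        name: ruff-check
        # "system" means pre-commit does not create its own environment;
        # the linter comes from the pixi environment instead.
        language: system
        entry: pixi run -e lint ruff check --fix
        types_or: [python, pyi]
      - id: typos
        name: typos
        language: system
        entry: pixi run -e lint typos
        types: [text]
```

Because each entry goes through pixi run, the hook uses the same linter version that pixi.lock pins for manual runs.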
One additional benefit of using pre-commit is that linters are executed "lazily", i.e. if one linter fails, the other linters still run. Before, we eagerly exited on the first linter failure.
Why remove the "lint" job from .ci/test.sh?
The linting job has a trivial entrypoint now and keeping the task in this lengthy script is more confusing than helpful IMO.
Where to go from here?
If we can agree that the use of pixi makes sense, I'll propose using it to manage the Python dependencies in a separate PR. This will allow us to cut down CI runtimes significantly and, much more importantly, allow contributors to get started with all necessary dependencies much more quickly. No worries: I will not propose to manage compilers in CI jobs via conda ;)
I personally found pixi about a month ago from this comment. Looks really cool! Have to try it...
there are other parts of this PR I really like and support immediately
It's been a few months so I just did this. Put up #6986 with some of the unrelated changes from this PR. Hopefully that will help shrink the diff here for if / when we decide to return to it.