flake8
Performance suggestion: do not run unselected plugins/checks
In GitLab by @hugovk on Jun 5, 2020, 01:45
Please read this brief portion of documentation before going any further: http://flake8.pycqa.org/en/latest/internal/contributing.html#filing-a-bug
Please describe how you installed Flake8
$ pip install -U flake8
$ brew install flake8
# etc.
Please provide the exact, unmodified output of flake8 --bug-report
{
"dependencies": [],
"platform": {
"python_implementation": "CPython",
"python_version": "3.8.3",
"system": "Darwin"
},
"plugins": [
{
"is_local": false,
"plugin": "flake8_2020",
"version": "1.6.0"
},
{
"is_local": false,
"plugin": "mccabe",
"version": "0.6.1"
},
{
"is_local": false,
"plugin": "pycodestyle",
"version": "2.6.0"
},
{
"is_local": false,
"plugin": "pyflakes",
"version": "2.2.0"
}
],
"version": "3.8.2"
}
Please describe the problem or feature
I noticed that Flake8 takes the same time to run with --select as without. As shown using -vv verbosity, it runs all the plugins and checks regardless of --select, and only reports the selected ones afterwards.
Flake8 can sometimes take a long time to run on large codebases, and if it were possible to run only the selected checks, that would save a lot of time, CPU and power.
Would it be possible to run only the selected checks/plugins, rather than running them all anyway and discarding that work when reporting?
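To make the request concrete, here is a minimal sketch of the two strategies. It is not Flake8's real API: `CHECKS`, `run_then_filter`, `filter_then_run` and the two toy checks are hypothetical names invented for illustration, and the checks only loosely imitate what real plugins do.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-ins for plugins (NOT Flake8's API): each registered
# prefix maps to a function that yields (code, line_number) pairs.
Result = Tuple[str, int]
Check = Callable[[str], Iterable[Result]]

calls: List[str] = []  # records which checks actually did work


def ytt_check(source: str) -> Iterable[Result]:
    calls.append("YTT")
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "sys.version[:3]" in line:
            yield ("YTT101", lineno)


def long_line_check(source: str) -> Iterable[Result]:
    calls.append("E")
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > 79:
            yield ("E501", lineno)


CHECKS: Dict[str, Check] = {"YTT": ytt_check, "E": long_line_check}


def is_selected(code: str, select: List[str]) -> bool:
    """A result is reported if its code starts with any selected prefix."""
    return any(code.startswith(prefix) for prefix in select)


def run_then_filter(source: str, select: List[str]) -> List[Result]:
    """Run every check, then discard unselected results."""
    results = [r for check in CHECKS.values() for r in check(source)]
    return [r for r in results if is_selected(r[0], select)]


def filter_then_run(source: str, select: List[str]) -> List[Result]:
    """Skip checks whose registered prefix cannot match any selected prefix."""
    results: List[Result] = []
    for prefix, check in CHECKS.items():
        if any(s.startswith(prefix) or prefix.startswith(s) for s in select):
            results.extend(r for r in check(source) if is_selected(r[0], select))
    return results
```

Both functions return the same results for a given selection; the difference is only in how many checks run, which is where the time would be saved.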
Docs
For reference, my emphasis.
flake8 --help says --select is for which ones to enable:
--select errors Comma-separated list of errors and warnings to enable. For example, ``--select=E4,E51,W234``.
(Default: ['E', 'F', 'W', 'C90'])
The docs are a bit more explicit:
Specify the list of error codes you wish Flake8 to report.
https://flake8.pycqa.org/en/latest/user/options.html#cmdoption-flake8-select
Example
An example running on the TensorFlow codebase:
$ time flake8
...
flake8 323.91s user 4.31s system 98% cpu 5:32.78 total
$ time flake8 --select YTT
...
flake8 --select YTT 318.62s user 3.80s system 99% cpu 5:25.51 total
Both about the same, around 5m20s.
With an ugly hack (I know this mixes plugin names with error codes, but it's just to get a rough idea, and there are other places to skip too):
diff --git a/src/flake8/checker.py b/src/flake8/checker.py
index d993cb9..9ed986d 100644
--- a/src/flake8/checker.py
+++ b/src/flake8/checker.py
@@ -486,6 +486,8 @@ class FileChecker(object):
             return
 
         for plugin in self.checks["ast_plugins"]:
+            if plugin["name"] != "YTT":
+                continue
             checker = self.run_check(plugin, tree=ast)
             # If the plugin uses a class, call the run method of it, otherwise
             # the call should return something iterable itself
$ time flake8 --select YTT
flake8 --select YTT 276.90s user 3.17s system 98% cpu 4:43.00 total
About 4m40s, roughly 40 seconds (~13%) faster.
In GitLab by @sigmavirus24 on Jun 5, 2020, 06:01
This would break our verbose output that tells people how many errors were ignored and not reported. Also, there are nuanced ways to ignore codes, so skipping checks isn't feasible. Some plugins register just a prefix, and we'd have no way of skipping the check for an individual sub-code, especially depending on how the plugin is written.
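The prefix problem can be sketched as follows. This is an illustration under assumptions, not Flake8 code: `could_emit_selected` is a hypothetical helper, and the prefix `X` and code `X123` are made up. The point is that a check can only be skipped safely when its registered prefix and every selected prefix are disjoint; a plugin that registers a bare prefix has to run in full even when a single sub-code is selected.

```python
from typing import Sequence


def could_emit_selected(registered_prefix: str, select: Sequence[str]) -> bool:
    """Conservative test: might a check registered under this prefix emit a selected code?

    Hypothetical helper. The check may be skipped only when this returns False.
    """
    return any(
        s.startswith(registered_prefix) or registered_prefix.startswith(s)
        for s in select
    )


# A check registered under a full code can be skipped when another code is selected.
print(could_emit_selected("W605", ["E265"]))  # False

# A check registered under a full code must run when a matching prefix is selected.
print(could_emit_selected("E501", ["E5"]))  # True

# A plugin registered under the bare prefix "X" must run entirely for
# --select=X123, since nothing tells us which part of it produces X123.
print(could_emit_selected("X", ["X123"]))  # True
```

So selection-based skipping helps most for plugins that register fine-grained codes, and not at all for those that register one broad prefix.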
In GitLab by @sigmavirus24 on Jun 5, 2020, 15:59
Perhaps the better way to do this is to have a --disable-extensions option, because relying on --select is too fraught.
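As a rough sketch of what such an option could look like: the flag is only proposed in this thread and did not exist in Flake8 when this was written, so its comma-separated syntax, along with the `parse_disabled` and `enabled_plugins` helpers and the `lint-sketch` program name, are all assumptions.

```python
import argparse
from typing import List, Sequence, Set


def parse_disabled(argv: Sequence[str]) -> Set[str]:
    """Parse a hypothetical --disable-extensions=a,b flag into a set of plugin names."""
    parser = argparse.ArgumentParser(prog="lint-sketch")
    parser.add_argument(
        "--disable-extensions",
        default="",
        help="Comma-separated plugin names that should not be run at all.",
    )
    args = parser.parse_args(argv)
    return {name.strip() for name in args.disable_extensions.split(",") if name.strip()}


def enabled_plugins(installed: Sequence[str], disabled: Set[str]) -> List[str]:
    """Keep installed plugins, in order, unless they were explicitly disabled."""
    return [name for name in installed if name not in disabled]


installed = ["flake8_2020", "mccabe", "pycodestyle", "pyflakes"]
disabled = parse_disabled(["--disable-extensions=mccabe,pyflakes"])
print(enabled_plugins(installed, disabled))
```

Because this works on whole plugin names rather than error codes, it sidesteps the prefix ambiguity of --select, at the cost of being coarser.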
In GitLab by @andersk on Feb 13, 2021, 12:57
pycodestyle can do this and save significant time. So surely Flake8 ought to be able to do it too, at least for some checks including the pycodestyle ones, when verbose output is not requested.
$ git clone https://github.com/zulip/zulip.git
$ cd zulip; rm setup.cfg
$ time pycodestyle -qq --count .
15849
real 0m22.806s
user 0m22.759s
sys 0m0.020s
$ time pycodestyle -qq --select=E265 --count .
4
real 0m9.721s
user 0m9.680s
sys 0m0.030s
$ time flake8 -j1 -qq --count .
15831
real 0m50.552s
user 0m50.281s
sys 0m0.213s
$ time flake8 -j1 -qq --select=E265 --count .
4
real 0m50.434s
user 0m50.177s
sys 0m0.195s
This is not only a performance optimization but also a stability improvement. What you don't run can't break. Flake8's plugin discovery can break a CI pipeline at any time when dependencies are updated, because some of the plugin libraries may change their behaviour, or because something unexpected is on the import path. If one can specify exactly which plugins to run, this reduces the chance of such surprises. Examples:
- #1385
- #1112
- #1638
- ...
There may be better examples, and these might be partly debatable, but the problem class definitely exists in the deep fires of Python's dependency hell.