Concurrency turning out useless on codebase & machine
This is on a codebase with 260 kLOC across ~3600 files (Python-only, according to tokei), on a 2010 MBP (2 cores, 2 HT) running OS X 10.11, under Python 3.6.6 from MacPorts.
Using -j with a number of jobs different from 1 significantly increases CPU consumption (~90% per core), but yields no improvement in wall-clock time:
> pylint -j1 *
pylint -v -j1 * 1144.10s user 44.51s system 96% cpu 20:36.81 total
> pylint -j2 *
pylint -j2 * 2386.66s user 117.09s system 184% cpu 22:37.15 total
> pylint -j4 *
pylint -j4 * 3897.49s user 161.62s system 340% cpu 19:50.96 total
> pylint -j0 *
pylint -j * 3850.79s user 155.45s system 341% cpu 19:31.81 total
Not sure what other information to provide.
Wow, that's incredible, thanks for reporting the issue. I wonder if the overhead of pickling the results back from the workers is too big at this point; we'll have to switch our approach for the parallel runner if that's the case.
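The pickling hypothesis is easy to sanity-check in isolation. A minimal sketch, assuming worker results look roughly like per-message tuples (the field layout below is invented for illustration, not pylint's actual wire format):

```python
import pickle
import time

# Hypothetical message payload resembling what a worker might send back:
# (msg_id, path, line, column, symbol, text). Purely illustrative.
messages = [
    ("C0103", f"pkg/mod{i}.py", i % 500, 4, "invalid-name",
     "Variable name doesn't conform to snake_case naming style")
    for i in range(100_000)
]

start = time.perf_counter()
blob = pickle.dumps(messages, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

assert restored == messages
print(f"round-tripped {len(messages)} messages "
      f"({len(blob) / 1e6:.1f} MB) in {elapsed:.3f}s")
```

On a modern machine this round-trips in well under a second, so under this toy model serialization alone would not obviously explain flat wall-clock times; profiling the actual runner would be needed to confirm either way.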
I'm observing this too (OS X, i7 processor). --jobs merely multiplies the CPU time, with negligible effect on wall time.
$ pylint --version
pylint 2.1.1
$ time pylint my_package/
real 1m27.865s
user 1m25.645s
sys 0m1.996s
$ time pylint --jobs 4 my_package/
real 1m17.986s
user 4m14.076s
sys 0m12.917s
I found Pylint got a lot faster by removing concurrency (jobs=1) compared to making it as concurrent as possible (jobs=0). Execution time across a number of projects of different code sizes improved by 2.5-3x.
One concrete project has 134k LOC across 1662 Python files, and a Pylint run across all the files dropped from 3m 33s to 1m 30s on average on a dual-core MBP (with HT). CPU utilisation also dropped to less than half, going by CPU time.
I wonder if there are any cases where running Pylint concurrently helps, or if it would be better to disable the feature for now?
Some results on Windows 10 2004, Intel 9850H (6 cores/12 threads), 32-bit Python, linting matplotlib. Interestingly, roughly half the available threads gives the fastest result. Results are wall-clock duration in seconds.
pylint --version
pylint 2.6.0
astroid 2.4.2
Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)]
cloc matplotlib
360 text files.
352 unique files.
154 files ignored.
github.com/AlDanial/cloc v 1.86 T=1.09 s (226.7 files/s, 144433.3 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Python 221 25052 39919 85792
Running pylint with the default configuration:
pylint -j12 matplotlib 1>NUL     71.70
pylint -j11 matplotlib 1>NUL     71.25
pylint -j10 matplotlib 1>NUL     69.96
pylint -j9  matplotlib 1>NUL     69.90
pylint -j8  matplotlib 1>NUL     66.98
pylint -j7  matplotlib 1>NUL     67.80
pylint -j6  matplotlib 1>NUL     65.04
pylint -j5  matplotlib 1>NUL     67.27
pylint -j4  matplotlib 1>NUL     73.05
pylint -j3  matplotlib 1>NUL     88.94
pylint -j2  matplotlib 1>NUL    120.46
pylint -j1  matplotlib 1>NUL    238.50
@owillebo thanks for the data. I think intuitively it makes sense that the optimum is 6 threads on a 6-core machine. Apparently this bug is not affecting you.
Thanks.
I think utilizing all 12 available threads and halving the time for running Pylint would be a good thing; burning threads is a waste of time and resources. And I think this bug is affecting more people than just myself (who is indeed of less importance).
Indeed, hyperthreading can lead to better use of the underlying hardware, but if there are no significant stalls (or both hyperthreads are stalled in similar ways) and all the threads are competing for the same underlying execution units, the hyperthreads just end up using the same resources sequentially.
And that "can" is so conditional that, given the security issues of their implementation, Intel is actually moving away from HT: the 9th gen only uses HT at the very high (i9) and very low (Celeron) ends; none of the 9th-gen i3, i5, and i7 parts support hyperthreading.
If I run two pylint sessions concurrently, each with 6 jobs and half of the matplotlib files, the wall-clock duration only drops from 65 seconds (for all files in one session) to 60 seconds. Indeed, my extra threads don't bring much.
https://github.com/PyCQA/pylint/issues/6978#issuecomment-1159559260 is quite an interesting result.
That is true but I think it's a different issue: in the original post the CPU% does grow pretty linearly with the number of workers, which indicates that the core issue isn't a startup stall (#6978 shows clear CPU usage dips).
Also, FWIW, I've re-run pylint on the original project, though only on a subset (since I think the old run was on 1.x, pylint has slowed a fair bit in the meantime, and the project has grown).
This is on a 4-core/8-thread Linux machine (not macOS this time), Python 3.8.12, pylint 2.14.5.
The subsection I linted is 71 kLOC in 400 files. The results are as follows:
-j0 pylint -j$i * > /dev/null 206.82s user 1.05s system 99% cpu 3:27.90 total
-j1 pylint -j$i * > /dev/null 205.74s user 1.08s system 99% cpu 3:26.85 total
-j2 pylint -j$i * > /dev/null 163.57s user 1.59s system 199% cpu 1:22.77 total
-j3 pylint -j$i * > /dev/null 198.93s user 2.15s system 298% cpu 1:07.29 total
-j4 pylint -j$i * > /dev/null 238.08s user 2.52s system 384% cpu 1:02.55 total
-j5 pylint -j$i * > /dev/null 304.31s user 3.00s system 450% cpu 1:08.26 total
-j6 pylint -j$i * > /dev/null 374.35s user 3.96s system 551% cpu 1:08.61 total
-j7 pylint -j$i * > /dev/null 462.39s user 4.68s system 639% cpu 1:13.04 total
-j8 pylint -j$i * > /dev/null 487.39s user 5.20s system 688% cpu 1:11.56 total
- pylint does seem to scale to -j2, and there's even a minor gain at -j3 (though far from 50%); beyond that it again spins its wheels and burns CPU with no improvement (the opposite, really).
- I thought -j0 would be equivalent to -j8, but apparently it's -j1?
- I'm not entirely sure why -j1 costs so much more than -j2 (almost 3x the wall-clock time, and 20% higher user time), but it is repeatedly reproducible: I ran each 5 times in a row and they exhibited those behaviors and wall clocks (roughly) very reliably. In fact, 200-ish seconds of user time is on the lower end for -j1 (it goes as high as 300), while 160-ish is about par for -j2.
On M1 Macs, on large codebases, -j0 is equivalent to -j10, and (unsurprisingly, given the mix of high-performance and efficiency cores) it seems to perform worse than -j6. This makes it difficult to specify a single value in the shared pylintrc config file for a repository shared between developers using a broad variety of machines, and likely makes -j0 undesirable on recent Apple machines.
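One workaround is to leave `jobs` out of the shared pylintrc entirely and let each developer (or a thin wrapper) pick a machine-appropriate value, since -j given on the command line overrides jobs= from the config file. A hypothetical sketch; the cap of 6 and the helper name are invented here, motivated by the benchmarks earlier in this thread:

```python
import os
import shlex


def pick_jobs(cap=6):
    # Heuristic, not pylint's own logic: cap the job count, since the
    # benchmarks in this thread show returns flatten (or reverse) well
    # before all logical cores are in use.
    return max(1, min(os.cpu_count() or 1, cap))


jobs = pick_jobs()
# -j on the command line overrides jobs= in the shared pylintrc.
cmd = ["pylint", f"-j{jobs}", "my_package/"]
print(shlex.join(cmd))
```

Each developer then gets a job count bounded by their own hardware without hard-coding one number for everyone.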
@olivierlefloch I don't think that can be solved by pylint (or any other auto-worker program): a while back, prompted by the comment preceding yours, I tried to see if the stdlib had a way to know real cores (as opposed to vcores/hyperthreads) and didn't find one. I don't remember seeing anything for efficiency/performance cores either.
I think the best solution would be to run under a bespoke hardware layout (make it so pylint can only see an enumerated set of cores of your choice), but I don't know if macOS supports this locally (I don't remember anything similar to Linux's taskset). There is a program called CPUSetter which allows disabling cores globally, but...
Also, it doesn't seem like e.g. multiprocessing.cpu_count() is aware of CPU affinity; however, pylint already uses os.sched_getaffinity, so it should work properly on Linux.
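The difference is easy to see with a quick check. Note that os.sched_getaffinity is only available on Linux, hence the guard:

```python
import multiprocessing
import os

# Logical CPU count, regardless of any affinity mask.
logical = multiprocessing.cpu_count()
print("logical CPUs:", logical)

# Number of CPUs this process is actually allowed to run on.
# os.sched_getaffinity(0) exists only on Linux, so guard the call.
if hasattr(os, "sched_getaffinity"):
    usable = len(os.sched_getaffinity(0))
    print("usable CPUs (affinity mask):", usable)
else:
    usable = logical  # fallback on macOS/Windows, where no mask is exposed
    print("sched_getaffinity unavailable; assuming", usable)
```

Under something like `taskset -c 0-1`, the two numbers diverge: cpu_count() still reports all logical CPUs while the affinity mask reports 2.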