E0401 (import-error) checks perform a lot of repeated stat calls
Bug description
I run pylint on a repo that is mounted via SSHFS, which leads to slow I/O speeds.
While profiling a run, I noticed that the import-error checks perform a lot of repeated stat calls because they probe for files with many different suffixes (.py, .pyc, .so, .cpython-311-x86_64-linux-gnu.so, and so on).
Many of these presence checks are repeated, so I'm wondering if it would be possible to improve performance by eliminating repeated checks or caching the results of previous calls.
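To illustrate the kind of caching meant here, below is a minimal sketch that memoizes file-presence checks. It is not astroid's actual code: `cached_isfile` is a hypothetical helper, and it assumes the files do not change during a single run (a cached negative result would otherwise go stale).

```python
import functools
import os.path


@functools.lru_cache(maxsize=None)
def cached_isfile(path: str) -> bool:
    """Memoized os.path.isfile (hypothetical helper, not part of astroid).

    Each distinct path triggers at most one stat call; later lookups are
    answered from the in-memory cache, including negative results.
    """
    return os.path.isfile(path)


if __name__ == "__main__":
    # Probing the same candidate twice only touches the filesystem once.
    cached_isfile("src/__init__.py")
    cached_isfile("src/__init__.py")
    print(cached_isfile.cache_info())
```

Whether an unbounded cache is acceptable depends on how long the process lives and on whether files can appear or disappear while it runs.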
I have prepared a repo that illustrates the issue. (The example repo contains ~60 files, whereas the repo I noticed the performance issue with contains ~2000 files.)
I noticed that pylint's performance can be improved by adding "missing" __init__.py files to the repo, but I'm hoping pylint itself can be tuned to increase performance even further.
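For anyone who wants to try the `__init__.py` workaround, here is a rough sketch that lists directories containing .py files but no `__init__.py`. The function name `dirs_missing_init` and the skip rules (hidden directories and a directory named `venv`) are my own assumptions; whether a given directory should actually become a regular package is a per-project decision.

```python
import os
from pathlib import Path


def dirs_missing_init(root):
    """Return directories under root that hold .py files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories and virtual environments in place
        # (these skip rules are assumptions, adjust them for your repo).
        dirnames[:] = [d for d in dirnames if not d.startswith(".") and d != "venv"]
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(Path(dirpath))
    return sorted(missing)


if __name__ == "__main__":
    for directory in dirs_missing_init("."):
        print(directory)
```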
Configuration
[MAIN]
jobs=1
[MESSAGES CONTROL]
disable=all
enable=E0401
[REPORTS]
reports=no
score=no
Command used
Steps to reproduce
git clone --branch import-error-stats https://github.com/correctmost/pylint-corpus.git
cd pylint-corpus
python ./profile_pylint.py
head -n 20 profiler_stats
Analysis
Notice that one of the top results is for posix.stat:
--> 27668 0.185 0.000 0.187 0.000 {built-in method posix.stat}
posix.stat is called by isfile, which is called most often by find_module in astroid:
--> <frozen genericpath>:27(isfile) <- 15128 0.044 1.282 astroid/interpreter/_import/spec.py:129(find_module)
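The same kind of caller breakdown can be reproduced on a small scale with `cProfile` and `pstats.Stats.print_callers`. This is a self-contained sketch, not the contents of `profile_pylint.py`: `probe` is a hypothetical stand-in for the resolver, and the `posix.stat` label assumes a POSIX platform such as Linux.

```python
import cProfile
import io
import os.path
import pstats


def probe(paths):
    """Stand-in for an import resolver: one isfile() call per candidate path."""
    return [os.path.isfile(path) for path in paths]


def callers_of_stat(paths):
    """Profile probe() and return the pstats caller report for the stat builtin."""
    profiler = cProfile.Profile()
    profiler.enable()
    probe(paths)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # The restriction is a regex matched against the function label,
    # e.g. "{built-in method posix.stat}" on Linux.
    stats.sort_stats("tottime").print_callers(r"posix\.stat")
    return buffer.getvalue()


if __name__ == "__main__":
    print(callers_of_stat(["missing/__init__.py", "missing/__init__.pyc"]))
```

The report lists each function that called the stat builtin along with its call count, which is how `isfile` shows up as the immediate caller.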
There is evidence of repeated stats from strace:
$ strace -e trace=%%stat python profile_pylint.py 2>&1 | sort | uniq -c | sort -nr | less
1314 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
904 newfstatat(AT_FDCWD, "pylint-corpus/src/resources/__init__.pyc", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
904 newfstatat(AT_FDCWD, "pylint-corpus/src/resources/__init__.py", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
811 newfstatat(AT_FDCWD, "pylint-corpus/src/resources", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
710 newfstatat(AT_FDCWD, "pylint-corpus", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
553 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
552 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
552 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.cpython-311-x86_64-linux-gnu.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
552 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.abi3.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.pyc", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.py", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src/__init__.pyi", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src/__init__.pyc", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src/__init__.py", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.cpython-311-x86_64-linux-gnu.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.abi3.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
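The shell pipeline above counts identical strace lines. To aggregate purely by path, regardless of the result or the buffer address, a small parser along these lines could be used. It is a sketch that assumes the `newfstatat(AT_FDCWD, "...")` line format shown above; `count_stat_paths` is a name I made up.

```python
import re
from collections import Counter

# Captures the quoted path argument of a newfstatat call in strace output.
STAT_PATH = re.compile(r'newfstatat\(AT_FDCWD, "([^"]+)"')


def count_stat_paths(lines):
    """Count how often each path appears in newfstatat lines of strace output."""
    counts = Counter()
    for line in lines:
        match = STAT_PATH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0',
        'newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)',
        'newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0',
    ]
    for path, count in count_stat_paths(sample).most_common():
        print(count, path)
```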
Pylint output
There is no output, just reduced performance
Expected behavior
Improved performance via caching or reduced file-presence checks
Pylint version
pylint 3.0.3
astroid 3.0.2
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]
OS / Environment
Arch Linux
Additional dependencies
No response
Thank you for analyzing the problem and opening an issue.
I ran the commands as specified and was able to reproduce it. Here is an excerpt:
135887165 function calls (125297436 primitive calls) in 416.767 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
42330150 54.812 0.000 54.916 0.000 {built-in method builtins.isinstance}
46521 36.663 0.001 69.141 0.001 pylint-corpus/venv/lib/python3.11/site-packages/astroid/interpreter/_import/spec.py:336(_get_zipimporters)
1471365/1303788 24.992 0.000 89.879 0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/transforms.py:59(_transform)
2170870 17.717 0.000 39.384 0.000 /usr/lib/python3.11/tokenize.py:433(_tokenize)
2694089/2794 15.479 0.000 126.435 0.045 pylint-corpus/venv/lib/python3.11/site-packages/astroid/transforms.py:106(_visit_generic)
1471365/1385 11.403 0.000 126.860 0.092 pylint-corpus/venv/lib/python3.11/site-packages/astroid/transforms.py:78(_visit)
943457/1151 9.901 0.000 193.783 0.168 pylint-corpus/venv/lib/python3.11/site-packages/pylint/utils/ast_walker.py:72(walk)
7190960 9.836 0.000 9.836 0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/brain/brain_numpy_utils.py:64(name_looks_like_numpy_member)
1992340 8.433 0.000 13.762 0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/brain/brain_builtin_inference.py:174(_builtin_filter_predicate)
2549321 7.740 0.000 7.740 0.000 {method 'match' of 're.Pattern' objects}
1013403 7.578 0.000 15.724 0.000 <frozen posixpath>:71(join)
2357 6.710 0.003 6.711 0.003 {built-in method builtins.compile}
4427955 5.813 0.000 5.813 0.000 {method 'get' of 'dict' objects}
2138556 5.791 0.000 8.817 0.000 <string>:1(<lambda>)
1530664/28481 5.540 0.000 40.758 0.001 pylint-corpus/venv/lib/python3.11/site-packages/astroid/rebuilder.py:472(visit)
552142/197 5.164 0.000 13.760 0.070 pylint-corpus/venv/lib/python3.11/site-packages/pylint/utils/file_state.py:56(_set_state_on_block_lines)
485780/274829 4.348 0.000 13.787 0.000 /usr/lib/python3.11/functools.py:981(__get__)
1000703 4.048 0.000 4.094 0.000 {built-in method posix.stat}
2934432 4.009 0.000 4.016 0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/brain/brain_numpy_utils.py:72(attribute_looks_like_numpy_member)
123173 3.937 0.000 40.614 0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/interpreter/_import/spec.py:128(find_module)
I can't promise anything, but I will look into it. Besides comparing results against this test repository, I wonder whether there is a set of standard repos to use for performance analysis. I was thinking of double-checking any changes against some of the repos used in the primer :thinking: