pylint icon indicating copy to clipboard operation
pylint copied to clipboard

E0401 (import-error) checks perform a lot of repeated stat calls

Open correctmost opened this issue 1 year ago • 1 comments

Bug description

I run pylint on a repo that is mounted via SSHFS, which leads to slow I/O speeds.

While profiling a run, I noticed that the import-error checks perform a lot of repeated stat calls because they check for the presence of various .py, .pyc, .so, .cpython-311-x86_64-linux-gnu.so, etc. files.

Many of these presence checks are repeated, so I'm wondering if it would be possible to improve performance by eliminating repeated checks or caching the results of previous calls.

I have prepared a repo that illustrates the issue. (The example repo contains ~60 files, whereas the repo I noticed the performance issue with contains ~2000 files.)

I noticed that pylint's performance can be improved by adding "missing" __init__.py files to the repo, but I'm hoping pylint itself can be tuned to increase performance even further.

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone --branch import-error-stats https://github.com/correctmost/pylint-corpus.git
cd pylint-corpus

python ./profile_pylint.py
head -n 20 profiler_stats

Analysis

Notice that one of the top results is for posix.stat:

--> 27668    0.185    0.000    0.187    0.000 {built-in method posix.stat}

posix.stat is called by isfile, which is called most often by find_module in astroid:

--> <frozen genericpath>:27(isfile) <-   15128    0.044    1.282  astroid/interpreter/_import/spec.py:129(find_module)

There is evidence of repeated stats from strace:

$ strace -e trace=%%stat python profile_pylint.py 2>&1 | sort | uniq -c | sort -nr | less

1314 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
 904 newfstatat(AT_FDCWD, "pylint-corpus/src/resources/__init__.pyc", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 904 newfstatat(AT_FDCWD, "pylint-corpus/src/resources/__init__.py", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 811 newfstatat(AT_FDCWD, "pylint-corpus/src/resources", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
 710 newfstatat(AT_FDCWD, "pylint-corpus", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
 553 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
 552 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 552 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.cpython-311-x86_64-linux-gnu.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 552 newfstatat(AT_FDCWD, "pylint-corpus/src/__init__.abi3.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.pyc", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.py", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src/__init__.pyi", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src/__init__.pyc", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src/__init__.py", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.cpython-311-x86_64-linux-gnu.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)
 550 newfstatat(AT_FDCWD, "pylint-corpus/src/sites/hierarchy/cat1/subcat1/src.abi3.so", 0x7ffd4b370690, 0) = -1 ENOENT (No such file or directory)

Pylint output

There is no output, just reduced performance

Expected behavior

Improved performance via caching or reduced file-presence checks

Pylint version

pylint 3.0.3
astroid 3.0.2
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]

OS / Environment

Arch Linux

Additional dependencies

No response

correctmost avatar Dec 18 '23 09:12 correctmost

Thank you for analyzing the problem and opening an issue.

Pierre-Sassoulas avatar Dec 18 '23 21:12 Pierre-Sassoulas

Ran the commands as specified, seems like I was able to reproduce it. Here is an excerpt:

135887165 function calls (125297436 primitive calls) in 416.767 seconds
Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 42330150   54.812    0.000   54.916    0.000 {built-in method builtins.isinstance}
    46521   36.663    0.001   69.141    0.001 pylint-corpus/venv/lib/python3.11/site-packages/astroid/interpreter/_import/spec.py:336(_get_zipimporters)
1471365/1303788   24.992    0.000   89.879    0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/transforms.py:59(_transform)
  2170870   17.717    0.000   39.384    0.000 /usr/lib/python3.11/tokenize.py:433(_tokenize)
2694089/2794   15.479    0.000  126.435    0.045 pylint-corpus/venv/lib/python3.11/site-packages/astroid/transforms.py:106(_visit_generic)
1471365/1385   11.403    0.000  126.860    0.092 pylint-corpus/venv/lib/python3.11/site-packages/astroid/transforms.py:78(_visit)
943457/1151    9.901    0.000  193.783    0.168 pylint-corpus/venv/lib/python3.11/site-packages/pylint/utils/ast_walker.py:72(walk)
  7190960    9.836    0.000    9.836    0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/brain/brain_numpy_utils.py:64(name_looks_like_numpy_member)
  1992340    8.433    0.000   13.762    0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/brain/brain_builtin_inference.py:174(_builtin_filter_predicate)
  2549321    7.740    0.000    7.740    0.000 {method 'match' of 're.Pattern' objects}
  1013403    7.578    0.000   15.724    0.000 <frozen posixpath>:71(join)
     2357    6.710    0.003    6.711    0.003 {built-in method builtins.compile}
  4427955    5.813    0.000    5.813    0.000 {method 'get' of 'dict' objects}
  2138556    5.791    0.000    8.817    0.000 <string>:1(<lambda>)
1530664/28481    5.540    0.000   40.758    0.001 pylint-corpus/venv/lib/python3.11/site-packages/astroid/rebuilder.py:472(visit)
552142/197    5.164    0.000   13.760    0.070 pylint-corpus/venv/lib/python3.11/site-packages/pylint/utils/file_state.py:56(_set_state_on_block_lines)
485780/274829    4.348    0.000   13.787    0.000 /usr/lib/python3.11/functools.py:981(__get__)
  1000703    4.048    0.000    4.094    0.000 {built-in method posix.stat}
  2934432    4.009    0.000    4.016    0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/brain/brain_numpy_utils.py:72(attribute_looks_like_numpy_member)
   123173    3.937    0.000   40.614    0.000 pylint-corpus/venv/lib/python3.11/site-packages/astroid/interpreter/_import/spec.py:128(find_module)

Can't promise anything but will take a look into it. Besides comparing results against this test repository itself, I wonder if there are a set of standard repos to use for performance analysis? Was thinking of double checking any changes against some repos used in the primer :thinking:

crazybolillo avatar Mar 29 '24 05:03 crazybolillo