Improve performance by caching find_spec
Certain upstream checkers in pylint, such as import-error, make heavy use of find_spec. This method is IO-intensive, as it looks for files across several search paths in order to return a ModuleSpec.
Since imports repeat themselves across files, it makes sense to cache this method in order to speed up the linting process.
Local testing shows that caching reduces the total number of calls to the find_module methods (used by find_spec) by about 50%. Linting the test repository from the related issue goes from 40 seconds to 37 seconds. This was on an NVMe disk and after warmup, so the timing gains may be bigger on slower file systems like the one mentioned in the referenced issue.
Closes pylint-dev/pylint#9310.
| | Type |
|---|---|
| ✓ | :hammer: Refactoring |
The cache feels dirty, but lru_cache and its siblings force you to cache on the whole argument list, which is impossible here (lists can't be hashed) and feels redundant: if I already have info on the requested module, who cares whether the search paths are different this time? I may be horrendously wrong on that last statement, but it was my thought process.
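For reference, this is the limitation with the stdlib caches: functools.lru_cache builds its key from every argument, so an unhashable argument such as a list makes the call blow up before any caching happens. A generic illustration (the toy `lookup` function is made up, not astroid code):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def lookup(modpath):
    # Pretend this does the expensive filesystem search.
    return "/".join(modpath)

print(lookup(("astroid", "brain")))  # fine: tuples are hashable
lookup(["astroid", "brain"])         # TypeError: unhashable type: 'list'
```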
There may be room for improvement, however I think this can get the ball rolling to discuss changes/improvements.
Here is a brief review of the profiling I did.
Before PR
Timings using hyperfine
Time (mean ± σ): 40.231 s ± 0.429 s [User: 36.848 s, System: 3.355 s]
Range (min … max): 39.611 s … 40.846 s 10 runs
Syscalls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
62.70 5.696665 3 1578640 1260035 newfstatat
21.24 1.929284 2 756383 clock_gettime
4.53 0.411450 5 72581 13450 openat
3.31 0.300422 2 101300 read
1.99 0.180693 3 59132 close
1.89 0.172086 2 69186 5 lseek
1.84 0.166910 2 62868 fstat
1.69 0.153237 2 58172 58153 ioctl
0.31 0.028427 3 8483 getcwd
0.15 0.013854 2 5047 mmap
0.15 0.013600 2 4724 munmap
0.13 0.011465 6 1804 getdents64
0.05 0.004886 4 1149 brk
0.02 0.001471 6 230 write
0.00 0.000408 7 56 mprotect
0.00 0.000221 221 1 rename
0.00 0.000092 18 5 getrandom
0.00 0.000076 7 10 4 readlink
0.00 0.000042 42 1 unlink
0.00 0.000021 3 6 fcntl
0.00 0.000015 7 2 prlimit64
0.00 0.000014 7 2 gettid
0.00 0.000010 10 1 futex
0.00 0.000009 9 1 getpid
0.00 0.000005 1 5 geteuid
0.00 0.000004 0 5 getuid
0.00 0.000002 0 3 getgid
0.00 0.000002 0 3 getegid
0.00 0.000002 1 2 setfsuid
0.00 0.000002 1 2 setfsgid
0.00 0.000001 0 67 rt_sigaction
0.00 0.000001 0 2 1 access
0.00 0.000000 0 2 pread64
0.00 0.000000 0 1 mremap
0.00 0.000000 0 1 execve
0.00 0.000000 0 3 uname
0.00 0.000000 0 1 1 mkdir
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 1 epoll_create1
0.00 0.000000 0 1 rseq
------ ----------- ----------- --------- --------- ----------------
100.00 9.085377 3 2779886 1331649 total
After PR
Timings using hyperfine
Time (mean ± σ): 37.277 s ± 0.131 s [User: 35.208 s, System: 2.043 s]
Range (min … max): 37.022 s … 37.508 s 10 runs
Syscalls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
54.62 2.141208 2 827202 642701 newfstatat
29.37 1.151274 2 569776 clock_gettime
4.47 0.175144 4 43641 13443 openat
2.79 0.109267 2 53100 read
1.77 0.069260 1 40245 5 lseek
1.69 0.066270 2 30199 close
1.63 0.063822 1 33933 fstat
1.34 0.052577 1 29240 29221 ioctl
1.16 0.045629 3 11589 getcwd
0.41 0.016088 8 1804 getdents64
0.31 0.012145 2 5150 mmap
0.28 0.011102 2 4836 munmap
0.12 0.004787 4 1059 brk
0.02 0.000756 3 229 write
0.01 0.000429 6 67 rt_sigaction
0.01 0.000300 5 56 mprotect
0.00 0.000102 10 10 4 readlink
0.00 0.000048 8 6 fcntl
0.00 0.000041 8 5 getrandom
0.00 0.000016 8 2 prlimit64
0.00 0.000008 4 2 gettid
0.00 0.000005 2 2 mremap
0.00 0.000002 0 3 getgid
0.00 0.000002 0 5 geteuid
0.00 0.000002 0 3 getegid
0.00 0.000002 1 2 setfsuid
0.00 0.000002 1 2 setfsgid
0.00 0.000001 0 2 1 access
0.00 0.000001 0 5 getuid
0.00 0.000000 0 2 pread64
0.00 0.000000 0 1 getpid
0.00 0.000000 0 1 execve
0.00 0.000000 0 3 uname
0.00 0.000000 0 1 1 mkdir
0.00 0.000000 0 1 unlink
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 1 epoll_create1
0.00 0.000000 0 1 rseq
------ ----------- ----------- --------- --------- ----------------
100.00 3.920290 2 1652190 685376 total
I also obtained this call graph to find the best place to cache. The issue refers to find_module, but I thought it was better to cache further up the stack in find_spec, because that method itself calls all the available finders. Since I thought it was cool and may be of interest, I am attaching it. You will need to zoom into it.
Also, I used the repository linked in the issue to obtain all the profiling data. I have yet to profile the memory allocated during the process execution; I will try to obtain that, since I want to measure the hit an unbounded cache may create.
Thank you for the analysis of what needs to be cached. Any reason not to use functools.cache or functools.lru_cache instead of using a global?
I really wanted to use something standard; however, the problem I found with the functools caches is that they take every function argument into consideration, while I thought it only made sense to cache on the requested module, because:
- Search paths are lists, so they need further processing to hash them
- I don't think the search paths should be part of the key? If I already know where the package is found, does it matter if you give me different search paths? I am not so sure about this point, but it was my logic.
I tried the approach, but it does not seem to work; the main reason is that find_spec uses lists, so wrapping the function to pass it a tuple in order to make it cacheable fails.
We would also need to make *args hashable, but that one is tricky: tuple hashes are order-sensitive, so even though I don't think order matters for search paths, a reordered search path would produce a different hash and the cache won't be hit as much.
What do you think? Maybe I am missing something.
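To make the ordering concern concrete (plain Python, nothing astroid-specific): cache keys built from tuples are order-sensitive, while frozensets ignore order, which is exactly the trade-off being weighed here.

```python
a = ("/usr/lib/python3.11", "site-packages")
b = ("site-packages", "/usr/lib/python3.11")

print(a == b)                        # False: a tuple key misses when the order changes
print(frozenset(a) == frozenset(b))  # True: a frozenset key ignores the order
```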
I think we could hack something together using lru_cache internals, but the code would probably be more complicated than using our own cache :thinking:
I'm hesitant to add a global, because pylint/astroid can be used on massive codebases and we have had issues with memory constantly increasing in the past. I don't remember off the top of my head all the cache-clearing APIs where this global would need to be added, and would need some time to check it (Jacob probably remembers better than me if it comes to it though). Using a global is not impossible, but it's easy to make a mistake and create a memory leak. So using lru_cache to set a maximum number of stored values would be convenient / reassuring. I suppose another (less elegant than adding two decorators) approach would be to create a new cached subfunction inside the current function with the paths we want to cache specifically. What do you think?
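For context on why a bounded lru_cache is reassuring here (standard functools behaviour, not astroid code): maxsize caps how much the cache can grow, and the wrapper exposes cache_clear()/cache_info(), which is what an existing clear-caches hook could call.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive(key: str) -> str:
    return key.upper()

expensive("astroid")
print(expensive.cache_info())  # CacheInfo(hits=0, misses=1, maxsize=1024, currsize=1)
expensive.cache_clear()        # what a global cache-clearing API would invoke
print(expensive.cache_info())  # currsize is back to 0
```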
I think I was able to find a solution that lets us use lru_cache without getting into too many decorators or subfunctions. I basically wrapped the arguments in a class so we can control its hash and ignore the problematic arguments. The benchmarks still look good as well.
Without cache:
Time (mean ± σ): 58.254 s ± 0.470 s [User: 53.538 s, System: 4.705 s]
Range (min … max): 57.863 s … 59.150 s 10 runs
With cache:
Time (mean ± σ): 52.131 s ± 0.191 s [User: 49.164 s, System: 2.954 s]
Range (min … max): 51.910 s … 52.550 s 10 runs
What do you think?
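A minimal, self-contained sketch of that wrapper idea, modeled on the SpecArgs dataclass that appears in the diff further down (the `lookup`/`_cached_lookup` names here are just placeholders, not astroid's API): the frozen dataclass exposes a hashable key to lru_cache while excluding the raw, unhashable arguments from hashing and equality via field(compare=False, hash=False).

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass(frozen=True)
class SpecArgs:
    # Only `key` participates in __eq__/__hash__; the raw arguments just ride along.
    key: tuple
    modpath: list[str] = field(compare=False, hash=False)
    path: Sequence[str] | None = field(compare=False, hash=False)


@lru_cache(maxsize=1024)
def _cached_lookup(args: SpecArgs) -> str:
    # Stand-in for the expensive find_spec work; only runs on a cache miss.
    return ".".join(args.modpath)


def lookup(modpath: list[str], path: Sequence[str] | None = None) -> str:
    # The unhashable list and the path never touch the cache key.
    return _cached_lookup(SpecArgs(tuple(modpath), modpath, path))


print(lookup(["astroid", "brain"], ["some", "search", "path"]))  # miss
print(lookup(["astroid", "brain"]))                              # hit, path ignored
print(_cached_lookup.cache_info())
```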
Nuked AstroidManager as suggested. Just waiting on your thoughts about exceptions preventing not found packages from being cached.
Thanks, I didn't notice that ImportError is raised when nothing is found. Good point.
However, I still think we should add the path to the cache key even when modules were found.
The following is unexpected, and I think it's realistic given the upstream callers like ast_from_module_name(). Pathing is one of the most sensitive parts of astroid.
>>> from astroid.interpreter._import.spec import find_spec
>>> find_spec(['brain'], ['astroid'])
ModuleSpec(name='brain', type=<ModuleType.PKG_DIRECTORY: 3>, location='astroid/brain', origin=None, submodule_search_locations=None)
>>> find_spec(['brain'], ['pylint'])
ModuleSpec(name='brain', type=<ModuleType.PKG_DIRECTORY: 3>, location='astroid/brain', origin=None, submodule_search_locations=None)
https://github.com/pylint-dev/astroid/blob/7a3b482b9673243d2ccc895672eb1e452f5daa82/astroid/manager.py#L195
For example, the path is part of the cache key for modules here:
https://github.com/pylint-dev/astroid/blob/7a3b482b9673243d2ccc895672eb1e452f5daa82/astroid/manager.py#L301-L305
I made the search path part of the cache key by converting it into a frozenset; however, there are lots of cache misses and the performance benefit gets nullified, so maybe this is harder than I thought :thinking:. This is the diff I tried (compared with the current PR):
diff --git a/astroid/interpreter/_import/spec.py b/astroid/interpreter/_import/spec.py
index c812a612..62bb3e6c 100644
--- a/astroid/interpreter/_import/spec.py
+++ b/astroid/interpreter/_import/spec.py
@@ -428,6 +428,7 @@ def _find_spec_with_path(
@dataclass(frozen=True)
class SpecArgs:
key: tuple
+ path_cache: frozenset | None
modpath: list[str] = field(compare=False, hash=False)
path: Sequence[str] | None = field(compare=False, hash=False)
@@ -449,7 +450,7 @@ def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSp
:return: A module spec, which describes how the module was
found and where.
"""
- return _find_spec(SpecArgs(tuple(modpath), modpath, path))
+ return _find_spec(SpecArgs(tuple(modpath), frozenset(path) if path else None, modpath, path))
Here are the cache stats after running it on the test repo provided in the original issue:
No cache for search path: CacheInfo(hits=39800, misses=34979, maxsize=1024, currsize=1024)
Caching search path: CacheInfo(hits=39755, misses=74619, maxsize=1024, currsize=1024)
Am I missing something? Maybe a better way to cache the path? Otherwise it seems like it may be better to optimize somewhere else.
Also since the main point of this PR (perf) may take a while, I can make a new PR to apply the nuking of the Astroid test class separately.
About the dataclasses, did you consider doing something like this for simplicity?
diff --git a/astroid/interpreter/_import/spec.py b/astroid/interpreter/_import/spec.py
index c812a612..65d7c88d 100644
--- a/astroid/interpreter/_import/spec.py
+++ b/astroid/interpreter/_import/spec.py
@@ -16,7 +16,6 @@ import types
import warnings
import zipimport
from collections.abc import Iterator, Sequence
-from dataclasses import dataclass, field
from functools import lru_cache
from pathlib import Path
from typing import Any, Literal, NamedTuple, Protocol
@@ -425,13 +424,6 @@ def _find_spec_with_path(
raise ImportError(f"No module named {'.'.join(module_parts)}")
-@dataclass(frozen=True)
-class SpecArgs:
- key: tuple
- modpath: list[str] = field(compare=False, hash=False)
- path: Sequence[str] | None = field(compare=False, hash=False)
-
-
def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSpec:
"""Find a spec for the given module.
@@ -449,15 +441,14 @@ def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSp
:return: A module spec, which describes how the module was
found and where.
"""
- return _find_spec(SpecArgs(tuple(modpath), modpath, path))
-
+ return _find_spec(tuple(modpath), tuple(path) if path else None)
@lru_cache(maxsize=1024)
-def _find_spec(args: SpecArgs) -> ModuleSpec:
- _path = args.path or sys.path
+def _find_spec(modpath: tuple[str], path: tuple[str] | None = None) -> ModuleSpec:
+ _path = path or sys.path
# Need a copy for not mutating the argument.
- modpath = args.modpath[:]
+ modpath = list(modpath)
submodule_path = None
module_parts = modpath[:]
@@ -466,7 +457,7 @@ def _find_spec(args: SpecArgs) -> ModuleSpec:
while modpath:
modname = modpath.pop(0)
finder, spec = _find_spec_with_path(
- _path, modname, module_parts, processed, submodule_path or args.path
+ _path, modname, module_parts, processed, submodule_path or path
)
processed.append(modname)
if modpath:
> Am I missing something? Maybe a better way to cache the path? Otherwise it seems like it may be better to optimize somewhere else.
I'm wondering if the issue is in pylint. We're probably doing some unnecessary "is this import a relative import? If so, provide the file path we started from" work. Providing all of those file paths, which get checked first and strike out, is what is adding to the number of calls here.
This diff reduces a ton of calls, but it would need to be checked for correctness:
diff --git a/pylint/checkers/imports.py b/pylint/checkers/imports.py
index ac8962c50..361bd5571 100644
--- a/pylint/checkers/imports.py
+++ b/pylint/checkers/imports.py
@@ -1043,11 +1043,13 @@ class ImportsChecker(DeprecatedMixin, BaseChecker):
module_file = node.root().file
context_name = node.root().name
base = os.path.splitext(os.path.basename(module_file))[0]
-
try:
- importedmodname = astroid.modutils.get_module_part(
- importedmodname, module_file
- )
+ if isinstance(node, nodes.ImportFrom):
+ importedmodname = astroid.modutils.get_module_part(
+ importedmodname, module_file
+ )
+ else:
+ importedmodname = astroid.modutils.get_module_part(importedmodname)
except ImportError:
pass
Yes, it seems like there are lots of calls for the same package with different paths which obliterates the cache. I collected all calls to find_spec with their arguments and grouped them by requested modpath. Here is an excerpt:
** ['jinja2'] 403
- ['pylint-corpus/venv/lib/python3.11/site-packages/scalene']
- ['.', 'pylint-corpus/venv/lib/python3.11/site-packages', 'pylint-corpus', 'pylint-corpus/venv/lib/python3.11/site-packages', 'pylint-corpus', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', 'pylint-corpus/venv/lib/python3.11/site-packages', 'astroid']
- ['pylint-corpus/venv/lib/python3.11/site-packages/jinja2']
- None
- ['pylint-corpus/venv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/diagram']
- ['pylint-corpus/venv/lib/python3.11/site-packages/pip/_vendor/pyparsing/diagram']
- ['pylint-corpus/venv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/diagram']
jinja2 was requested 403 times, and those were the search paths used. I also wonder why jinja2 is looked up at all, since it's not used in the test repo.
I will investigate in pylint.
I followed the suggestion and applied a similar change in pylint. The cache stats improve a lot. Also, it seems like the previous tests were running with a polluted virtualenv, which explains things like looking for jinja2 :sweat_smile:.
Run times decreased in all cases because of the clean virtualenv; however, relatively speaking, the performance increase seems even better than in the first iteration. I also changed the tuples to frozensets, since they appear to be faster by a noticeable margin (according to hyperfine).
Stats
No Cache
Time (mean ± σ): 46.954 s ± 1.127 s [User: 43.222 s, System: 3.700 s]
Range (min … max): 45.499 s … 48.479 s 10 runs
Cache
tuple
Time (mean ± σ): 38.001 s ± 0.359 s [User: 35.533 s, System: 2.452 s]
Range (min … max): 37.388 s … 38.509 s 10 runs
frozenset
Time (mean ± σ): 35.961 s ± 0.758 s [User: 33.585 s, System: 2.337 s]
Range (min … max): 34.297 s … 36.718 s 10 runs
The related PR in pylint that would make this cache work is pylint#9595.
Never mind about the frozenset: when converting it back to a list, the ordering is sometimes lost, so it can't be used, at least not for the order-sensitive modpath. For the search paths I don't think the order matters too much (?), so we can use frozensets there.
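To show the ordering loss being described (generic Python): a set's iteration order depends on element hashes rather than insertion order, so round-tripping the modpath through a frozenset can scramble it.

```python
modpath = ["pkg_resources", "_vendor", "pyparsing", "diagram"]
print(list(frozenset(modpath)))
# e.g. ['pyparsing', 'diagram', 'pkg_resources', '_vendor'] -- insertion order is not preserved
```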
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.78%. Comparing base (7a3b482) to head (82c839e). Report is 11 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2408 +/- ##
==========================================
+ Coverage 92.76% 92.78% +0.02%
==========================================
Files 94 94
Lines 11087 11095 +8
==========================================
+ Hits 10285 10295 +10
+ Misses 802 800 -2
| Flag | Coverage Δ | |
|---|---|---|
| linux | 92.60% <100.00%> (+0.02%) | :arrow_up: |
| pypy | 92.78% <100.00%> (+2.01%) | :arrow_up: |
| windows | 92.69% <100.00%> (+0.32%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
| Files | Coverage Δ | |
|---|---|---|
| astroid/interpreter/_import/spec.py | 97.44% <100.00%> (+0.05%) | :arrow_up: |
| astroid/manager.py | 89.58% <100.00%> (+0.04%) | :arrow_up: |
The potential indeterminacy of the frozenset does scare me a little bit. Just this week on Discord someone was asking for pointers about debugging indeterminate behavior of the cyclic-import check, and I advised them to look into nondeterminacy in astroid. Leaving a frozenset for the path sounds like exactly that kind of behavior.
I'd suggest using tuple for both. But if you want to prepare a new benchmark and discuss it, that's fine too.
> the performance increase seems even better than the first iteration
Right, more than 20% faster is HUGE !
> search paths I don't think the order matters too much (?)
Not sure if I understand this right, but the order of discovery does matter, right? Doesn't it affect https://github.com/pylint-dev/pylint/issues/6535 (the pinned issue in pylint)?
Yeah, I traced what happens to that path argument, and it eventually ends up here, in an order-sensitive for loop in importlib, so I wouldn't want this to become unordered:
https://github.com/python/cpython/blob/978fba58aef347de4a1376e525df2dacc7b2fff3/Lib/importlib/_bootstrap_external.py#L1522
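For anyone who wants to see that order sensitivity outside of astroid, here is a self-contained check using plain importlib: two temporary directories each holding a shadow.py (a made-up module name), where PathFinder resolves whichever directory comes first on the path.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path

with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    Path(a, "shadow.py").write_text("WHO = 'a'\n")
    Path(b, "shadow.py").write_text("WHO = 'b'\n")

    # The first directory on the search path wins, so reordering changes the result.
    print(PathFinder.find_spec("shadow", [a, b]).origin)  # .../<dir a>/shadow.py
    print(PathFinder.find_spec("shadow", [b, a]).origin)  # .../<dir b>/shadow.py
```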
Side note: it seems like our constructor for Finder is actually useless? I can't find anywhere that uses _path. Could be a small gain to nuke it.
https://github.com/pylint-dev/astroid/blob/4a8827d7afe698d74cc1abfdbb6ef10de6e3d89f/astroid/interpreter/_import/spec.py#L85-L86
Oops, I changed the search paths to a tuple as well. Perf is still good; in fact, since most of the time we should be getting no search paths, I ran hyperfine again and there was not much of an impact.
Thanks to everyone as well for the help and reviews :handshake: