
Improve performance by caching find_spec

Open crazybolillo opened this issue 1 year ago • 15 comments

Certain upstream pylint checkers, like import-error, use find_spec heavily. This method is IO intensive, as it searches for files across several search paths in order to return a ModuleSpec.

Since imports may repeat across files, it makes sense to cache this method in order to speed up the linting process.

Local testing shows that caching reduces the total number of calls to find_module methods (used by find_spec) by about 50%. Linting the test repository in the related issue goes from 40 seconds to 37 seconds. This was on an NVMe disk and after warmup, so timing gains may be bigger on slower file systems like the one mentioned in the referenced issue.
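The core idea can be sketched with functools.lru_cache (a minimal illustration, not astroid's actual code; `expensive_search` is a hypothetical stand-in for the real filesystem lookup):

```python
from functools import lru_cache

calls = 0

def expensive_search(modpath_key):
    """Stand-in for the IO-heavy search across the search paths."""
    global calls
    calls += 1
    return f"spec for {'.'.join(modpath_key)}"

@lru_cache(maxsize=1024)
def find_spec_cached(modpath_key: tuple):
    # modpath must be passed as a tuple so the cache key is hashable
    return expensive_search(modpath_key)

# Repeated imports across linted files hit the cache instead of the disk:
for _ in range(5):
    find_spec_cached(("collections", "abc"))
assert calls == 1
```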

Closes pylint-dev/pylint#9310.

Type
:hammer: Refactoring

crazybolillo avatar Apr 05 '24 06:04 crazybolillo

The cache feels dirty, but lru_cache and its siblings force you to cache on every argument, which is impossible here (lists can't be hashed) and feels redundant: if I already have info on the requested module, who cares whether the search paths are different this time? I may be horrendously wrong on that last statement, but that was my thought process.

There may be room for improvement; however, I think this can get the ball rolling on discussing changes/improvements.
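For reference, this is why lru_cache can't be applied directly: arguments become part of the cache key, so a list argument raises immediately (generic illustration, not astroid code):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def lookup(modpath):
    return ".".join(modpath)

lookup(("astroid", "brain"))       # fine: tuples are hashable
try:
    lookup(["astroid", "brain"])   # lists are not
except TypeError as e:
    print(e)                       # unhashable type: 'list'
```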

Here is a brief review of the profiling I did.

Before PR

Timings using hyperfine

  Time (mean ± σ):     40.231 s ±  0.429 s    [User: 36.848 s, System: 3.355 s]
  Range (min … max):   39.611 s … 40.846 s    10 runs

Syscalls

  % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.70    5.696665           3   1578640   1260035 newfstatat
 21.24    1.929284           2    756383           clock_gettime
  4.53    0.411450           5     72581     13450 openat
  3.31    0.300422           2    101300           read
  1.99    0.180693           3     59132           close
  1.89    0.172086           2     69186         5 lseek
  1.84    0.166910           2     62868           fstat
  1.69    0.153237           2     58172     58153 ioctl
  0.31    0.028427           3      8483           getcwd
  0.15    0.013854           2      5047           mmap
  0.15    0.013600           2      4724           munmap
  0.13    0.011465           6      1804           getdents64
  0.05    0.004886           4      1149           brk
  0.02    0.001471           6       230           write
  0.00    0.000408           7        56           mprotect
  0.00    0.000221         221         1           rename
  0.00    0.000092          18         5           getrandom
  0.00    0.000076           7        10         4 readlink
  0.00    0.000042          42         1           unlink
  0.00    0.000021           3         6           fcntl
  0.00    0.000015           7         2           prlimit64
  0.00    0.000014           7         2           gettid
  0.00    0.000010          10         1           futex
  0.00    0.000009           9         1           getpid
  0.00    0.000005           1         5           geteuid
  0.00    0.000004           0         5           getuid
  0.00    0.000002           0         3           getgid
  0.00    0.000002           0         3           getegid
  0.00    0.000002           1         2           setfsuid
  0.00    0.000002           1         2           setfsgid
  0.00    0.000001           0        67           rt_sigaction
  0.00    0.000001           0         2         1 access
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           mremap
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0         1         1 mkdir
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           epoll_create1
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    9.085377           3   2779886   1331649 total

After PR

Timings using hyperfine

  Time (mean ± σ):     37.277 s ±  0.131 s    [User: 35.208 s, System: 2.043 s]
  Range (min … max):   37.022 s … 37.508 s    10 runs

Syscalls

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.62    2.141208           2    827202    642701 newfstatat
 29.37    1.151274           2    569776           clock_gettime
  4.47    0.175144           4     43641     13443 openat
  2.79    0.109267           2     53100           read
  1.77    0.069260           1     40245         5 lseek
  1.69    0.066270           2     30199           close
  1.63    0.063822           1     33933           fstat
  1.34    0.052577           1     29240     29221 ioctl
  1.16    0.045629           3     11589           getcwd
  0.41    0.016088           8      1804           getdents64
  0.31    0.012145           2      5150           mmap
  0.28    0.011102           2      4836           munmap
  0.12    0.004787           4      1059           brk
  0.02    0.000756           3       229           write
  0.01    0.000429           6        67           rt_sigaction
  0.01    0.000300           5        56           mprotect
  0.00    0.000102          10        10         4 readlink
  0.00    0.000048           8         6           fcntl
  0.00    0.000041           8         5           getrandom
  0.00    0.000016           8         2           prlimit64
  0.00    0.000008           4         2           gettid
  0.00    0.000005           2         2           mremap
  0.00    0.000002           0         3           getgid
  0.00    0.000002           0         5           geteuid
  0.00    0.000002           0         3           getegid
  0.00    0.000002           1         2           setfsuid
  0.00    0.000002           1         2           setfsgid
  0.00    0.000001           0         2         1 access
  0.00    0.000001           0         5           getuid
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0         1         1 mkdir
  0.00    0.000000           0         1           unlink
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           epoll_create1
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    3.920290           2   1652190    685376 total

crazybolillo avatar Apr 05 '24 06:04 crazybolillo

I also obtained this call graph to find the best place to cache. The issue refers to find_module, but I thought it was better to cache higher up the stack, in find_spec, because that method itself calls all the available finders. Since I thought it was cool and may be of interest, I am attaching it. You will need to zoom into it.

(attached call-graph image: output)

crazybolillo avatar Apr 05 '24 06:04 crazybolillo

Also, I used the repository linked in the issue to obtain all the profiling data. I have yet to profile the memory allocated during process execution; I will try to obtain that, since I want to measure the hit an unbounded cache may create.

crazybolillo avatar Apr 05 '24 06:04 crazybolillo

Thank you for the analysis of what needs to be cached. Any reason not to use functools.cache or functools.lru_cache instead of using a global?

I really wanted to use something standard; however, the problem I found with the functools caches is that they take all function arguments into account, while I thought it only made sense to cache by the requested module, because:

  • Search paths are lists, so they need further processing to be hashable
  • I don't think search paths should be part of the cache key? If I already know where the package is found, does it matter if you give me different search paths? I am not so sure about this point, but that was my logic.

crazybolillo avatar Apr 15 '24 03:04 crazybolillo

I tried the approach, but it does not seem to work, the main reason being that find_spec uses lists, so wrapping the function to pass it a tuple in order to make it cacheable fails.

We would also need to make *args hashable, and that one is tricky: since I don't think order matters for search paths, a change in search path order would produce different hashes and the cache wouldn't be hit as often.

What do you think? Maybe I am missing something.

I think we could hack something together using lru_cache internals, but the code would probably be more complicated than using our own cache :thinking:

crazybolillo avatar Apr 16 '24 19:04 crazybolillo

I'm hesitant to add a global, because pylint/astroid can be used on massive codebases and we have had issues with memory constantly increasing in the past. I don't remember off the top of my head all the cache-clearing APIs this global would need to be added to, and would need some time to check (Jacob probably remembers better than me, if it comes to it). Using a global is not impossible, but it's easy to make a mistake and create a memory leak. So using lru_cache to set a maximum number of stored values would be convenient / reassuring. I suppose another (less elegant than adding two decorators) approach would be to create a new cached subfunction inside the current function with the paths we want to cache specifically; what do you think?

Pierre-Sassoulas avatar Apr 22 '24 11:04 Pierre-Sassoulas

I think I was able to find a solution that lets us use lru_cache without getting into too many decorators or subfunctions. I basically wrapped the arguments in a class so we can control its hash and ignore problematic arguments. The benchmarks still look good as well.
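The wrapper trick can be shown in isolation (simplified names, not the actual PR code): a frozen dataclass whose hash and equality only consider the hashable key, so unhashable or irrelevant arguments ride along without affecting caching:

```python
from dataclasses import dataclass, field
from functools import lru_cache

@dataclass(frozen=True)
class SpecArgs:
    key: tuple                                       # only field used for hash/eq
    modpath: list = field(compare=False, hash=False)  # excluded from the cache key

@lru_cache(maxsize=1024)
def _find(args: SpecArgs) -> str:
    return ".".join(args.modpath)

first = _find(SpecArgs(("astroid", "brain"), ["astroid", "brain"]))
# Same key -> cache hit, even though the excluded list differs:
second = _find(SpecArgs(("astroid", "brain"), ["ignored"]))
assert first == second == "astroid.brain"
```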

Without cache:

Time (mean ± σ):     58.254 s ±  0.470 s    [User: 53.538 s, System: 4.705 s]
Range (min … max):   57.863 s … 59.150 s    10 runs

With cache:

Time (mean ± σ):     52.131 s ±  0.191 s    [User: 49.164 s, System: 2.954 s]
Range (min … max):   51.910 s … 52.550 s    10 runs

What do you think?

crazybolillo avatar Apr 22 '24 19:04 crazybolillo

Nuked AstroidManager as suggested. Just waiting on your thoughts about exceptions preventing not-found packages from being cached.

crazybolillo avatar Apr 29 '24 03:04 crazybolillo

Thanks, I didn't notice that ImportError is raised when nothing is found. Good point.

However, I still think we should add the path to the cache key even when modules were found.

The following is unexpected, and I think it's realistic given the upstream callers like ast_from_module_name(). Pathing is one of the most sensitive parts of astroid.

>>> from astroid.interpreter._import.spec import find_spec
>>> find_spec(['brain'], ['astroid'])
ModuleSpec(name='brain', type=<ModuleType.PKG_DIRECTORY: 3>, location='astroid/brain', origin=None, submodule_search_locations=None)
>>> find_spec(['brain'], ['pylint'])
ModuleSpec(name='brain', type=<ModuleType.PKG_DIRECTORY: 3>, location='astroid/brain', origin=None, submodule_search_locations=None)

https://github.com/pylint-dev/astroid/blob/7a3b482b9673243d2ccc895672eb1e452f5daa82/astroid/manager.py#L195

jacobtylerwalls avatar Apr 29 '24 11:04 jacobtylerwalls

For example, the path is part of the cache key for modules here:

https://github.com/pylint-dev/astroid/blob/7a3b482b9673243d2ccc895672eb1e452f5daa82/astroid/manager.py#L301-L305

jacobtylerwalls avatar Apr 29 '24 11:04 jacobtylerwalls

I made the search path part of the cache key by converting it into a frozenset; however, there are lots of cache misses and the performance benefit gets nullified, so maybe this is harder than I thought :thinking:. This is the diff I tried (compared with the current PR):

diff --git a/astroid/interpreter/_import/spec.py b/astroid/interpreter/_import/spec.py
index c812a612..62bb3e6c 100644
--- a/astroid/interpreter/_import/spec.py
+++ b/astroid/interpreter/_import/spec.py
@@ -428,6 +428,7 @@ def _find_spec_with_path(
 @dataclass(frozen=True)
 class SpecArgs:
     key: tuple
+    path_cache: frozenset | None
     modpath: list[str] = field(compare=False, hash=False)
     path: Sequence[str] | None = field(compare=False, hash=False)
 
@@ -449,7 +450,7 @@ def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSp
     :return: A module spec, which describes how the module was
              found and where.
     """
-    return _find_spec(SpecArgs(tuple(modpath), modpath, path))
+    return _find_spec(SpecArgs(tuple(modpath), frozenset(path) if path else None, modpath, path))

Here are the cache specs after running it on the test repo provided in the original issue:

No cache for search path: CacheInfo(hits=39800, misses=34979, maxsize=1024, currsize=1024)
Caching search path: CacheInfo(hits=39755, misses=74619, maxsize=1024, currsize=1024)

Am I missing something? Is there maybe a better way to cache the path? Otherwise it seems like it may be better to optimize somewhere else.
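For anyone reproducing these numbers: lru_cache exposes the hit/miss counters directly via cache_info() (generic illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def f(key):
    return key * 2

f("a"); f("a"); f("b")  # second call to f("a") is a hit
print(f.cache_info())   # CacheInfo(hits=1, misses=2, maxsize=1024, currsize=2)
```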

crazybolillo avatar Apr 30 '24 05:04 crazybolillo

Also, since the main point of this PR (perf) may take a while, I can make a new PR to apply the nuking of the astroid test class separately.

crazybolillo avatar Apr 30 '24 05:04 crazybolillo

About the dataclasses, did you consider doing something like this for simplicity?

diff --git a/astroid/interpreter/_import/spec.py b/astroid/interpreter/_import/spec.py
index c812a612..65d7c88d 100644
--- a/astroid/interpreter/_import/spec.py
+++ b/astroid/interpreter/_import/spec.py
@@ -16,7 +16,6 @@ import types
 import warnings
 import zipimport
 from collections.abc import Iterator, Sequence
-from dataclasses import dataclass, field
 from functools import lru_cache
 from pathlib import Path
 from typing import Any, Literal, NamedTuple, Protocol
@@ -425,13 +424,6 @@ def _find_spec_with_path(
     raise ImportError(f"No module named {'.'.join(module_parts)}")
 
 
-@dataclass(frozen=True)
-class SpecArgs:
-    key: tuple
-    modpath: list[str] = field(compare=False, hash=False)
-    path: Sequence[str] | None = field(compare=False, hash=False)
-
-
 def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSpec:
     """Find a spec for the given module.
 
@@ -449,15 +441,14 @@ def find_spec(modpath: list[str], path: Sequence[str] | None = None) -> ModuleSp
     :return: A module spec, which describes how the module was
              found and where.
     """
-    return _find_spec(SpecArgs(tuple(modpath), modpath, path))
-
+    return _find_spec(tuple(modpath), tuple(path) if path else None)
 
 @lru_cache(maxsize=1024)
-def _find_spec(args: SpecArgs) -> ModuleSpec:
-    _path = args.path or sys.path
+def _find_spec(modpath: tuple[str], path: tuple[str] | None = None) -> ModuleSpec:
+    _path = path or sys.path
 
     # Need a copy for not mutating the argument.
-    modpath = args.modpath[:]
+    modpath = list(modpath)
 
     submodule_path = None
     module_parts = modpath[:]
@@ -466,7 +457,7 @@ def _find_spec(args: SpecArgs) -> ModuleSpec:
     while modpath:
         modname = modpath.pop(0)
         finder, spec = _find_spec_with_path(
-            _path, modname, module_parts, processed, submodule_path or args.path
+            _path, modname, module_parts, processed, submodule_path or path
         )
         processed.append(modname)
         if modpath:

jacobtylerwalls avatar Apr 30 '24 12:04 jacobtylerwalls

Am I missing something? Maybe a better way to cache the path? Otherwise it seems like it may be better to optimize somewhere else.

I'm wondering if the issue is in pylint. We're probably doing some unnecessary "is this import a relative import; if so, provide the filepath we started from" work. Providing all those filepaths, which get checked first and strike out, is what adds to the number of calls here.

This diff reduces a ton of calls, but it would need to be checked for correctness:

diff --git a/pylint/checkers/imports.py b/pylint/checkers/imports.py
index ac8962c50..361bd5571 100644
--- a/pylint/checkers/imports.py
+++ b/pylint/checkers/imports.py
@@ -1043,11 +1043,13 @@ class ImportsChecker(DeprecatedMixin, BaseChecker):
         module_file = node.root().file
         context_name = node.root().name
         base = os.path.splitext(os.path.basename(module_file))[0]
-
         try:
-            importedmodname = astroid.modutils.get_module_part(
-                importedmodname, module_file
-            )
+            if isinstance(node, nodes.ImportFrom):
+                importedmodname = astroid.modutils.get_module_part(
+                    importedmodname, module_file
+                )
+            else:
+                importedmodname = astroid.modutils.get_module_part(importedmodname)
         except ImportError:
             pass

jacobtylerwalls avatar Apr 30 '24 12:04 jacobtylerwalls

Yes, it seems like there are lots of calls for the same package with different paths, which obliterates the cache. I collected all calls to find_spec with their arguments and grouped them by requested modpath. Here is an excerpt:

** ['jinja2'] 403
         - ['pylint-corpus/venv/lib/python3.11/site-packages/scalene']
         - ['.', 'pylint-corpus/venv/lib/python3.11/site-packages', 'pylint-corpus', 'pylint-corpus/venv/lib/python3.11/site-packages', 'pylint-corpus', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', 'pylint-corpus/venv/lib/python3.11/site-packages', 'astroid']
         - ['pylint-corpus/venv/lib/python3.11/site-packages/jinja2']
         - None
         - ['pylint-corpus/venv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/diagram']
         - ['pylint-corpus/venv/lib/python3.11/site-packages/pip/_vendor/pyparsing/diagram']
         - ['pylint-corpus/venv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/diagram']

jinja2 was requested 403 times, and those were the search paths used. I also wonder why jinja2 is looked up at all, since it's not used in the test repo.

I will investigate in pylint.

crazybolillo avatar May 01 '24 18:05 crazybolillo

I followed the suggestion and applied a similar change in pylint. Cache stats improve a lot. It also seems like the previous tests were running with a polluted virtualenv, which explains things like looking for jinja2 :sweat_smile:.

Run times decreased in all cases because of the clean virtualenv; however, relatively speaking, the performance increase seems even better than in the first iteration. I also changed tuples to frozensets, since they are apparently faster by a noticeable margin (according to hyperfine).

Stats

No Cache

Time (mean ± σ):     46.954 s ±  1.127 s    [User: 43.222 s, System: 3.700 s] 
Range (min … max):   45.499 s … 48.479 s    10 runs

Cache

tuple

Time (mean ± σ):     38.001 s ±  0.359 s    [User: 35.533 s, System: 2.452 s]
Range (min … max):   37.388 s … 38.509 s    10 runs

frozenset

Time (mean ± σ):     35.961 s ±  0.758 s    [User: 33.585 s, System: 2.337 s]
Range (min … max):   34.297 s … 36.718 s    10 runs

The related PR in pylint that would make this cache work is pylint#9595.

crazybolillo avatar May 04 '24 18:05 crazybolillo

Never mind about the frozenset: when converting it back to a list, ordering is sometimes lost, so it can't be used, at least not for the order-sensitive modpath. For the search paths I don't think the order matters too much (?), so we can use frozensets there.
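The trade-off can be shown directly: tuples treat differently ordered paths as distinct cache keys, while frozensets collapse them, but at the cost of losing the original order (illustrative paths):

```python
a = ("/site-packages", "/src")
b = ("/src", "/site-packages")

# Tuples preserve order, so reordered search paths are two distinct cache keys:
assert a != b

# Frozensets collapse order: both orderings share one cache entry...
assert frozenset(a) == frozenset(b)

# ...but iteration order of a frozenset is arbitrary, so the original
# search order cannot be reliably recovered:
recovered = list(frozenset(a))
assert sorted(recovered) == sorted(a)  # same elements; order not guaranteed
```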

crazybolillo avatar May 04 '24 18:05 crazybolillo

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.78%. Comparing base (7a3b482) to head (82c839e). Report is 11 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2408      +/-   ##
==========================================
+ Coverage   92.76%   92.78%   +0.02%     
==========================================
  Files          94       94              
  Lines       11087    11095       +8     
==========================================
+ Hits        10285    10295      +10     
+ Misses        802      800       -2     
Flag Coverage Δ
linux 92.60% <100.00%> (+0.02%) :arrow_up:
pypy 92.78% <100.00%> (+2.01%) :arrow_up:
windows 92.69% <100.00%> (+0.32%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
astroid/interpreter/_import/spec.py 97.44% <100.00%> (+0.05%) :arrow_up:
astroid/manager.py 89.58% <100.00%> (+0.04%) :arrow_up:

... and 2 files with indirect coverage changes

codecov[bot] avatar May 04 '24 18:05 codecov[bot]

The potential indeterminacy of the frozenset does scare me a little bit. Just this week on Discord someone was asking for pointers on debugging indeterminate behavior of the cyclic-import check, and I advised them to look into nondeterminism in astroid. Leaving a frozenset for the path sounds like exactly that kind of behavior.

I'd suggest using tuple for both. But if you want to prepare a new benchmark and discuss it, that's fine too.

jacobtylerwalls avatar May 04 '24 19:05 jacobtylerwalls

the performance increase seems even better than the first iteration

Right, more than 20% faster is HUGE!

search paths I don't think the order matters too much (?)

Not sure if I understand this right, but the order of discovery does matter, right? Doesn't it affect https://github.com/pylint-dev/pylint/issues/6535 (pinned issue in pylint)?

Pierre-Sassoulas avatar May 04 '24 20:05 Pierre-Sassoulas

Yeah, I traced what happens to that path argument, and it eventually ends up here, in an order-sensitive for loop in importlib, so I wouldn't want this to become unordered:

https://github.com/python/cpython/blob/978fba58aef347de4a1376e525df2dacc7b2fff3/Lib/importlib/_bootstrap_external.py#L1522


Side note: it seems like our constructor for Finder is actually useless? I can't find anywhere that uses _path. Could be a small gain to nuke it.

https://github.com/pylint-dev/astroid/blob/4a8827d7afe698d74cc1abfdbb6ef10de6e3d89f/astroid/interpreter/_import/spec.py#L85-L86

jacobtylerwalls avatar May 04 '24 20:05 jacobtylerwalls

Oops, I changed the search paths to tuples as well. Perf is still good; in fact, since most of the time we should be getting no search paths, I ran hyperfine again and there was not much of an impact.

crazybolillo avatar May 04 '24 20:05 crazybolillo

Thanks to everyone as well for the help and reviews :handshake:

crazybolillo avatar May 04 '24 21:05 crazybolillo