mypy is slow when type checking torch

Open hauntsaninja opened this issue 1 year ago • 4 comments

λ mypy --version          
mypy 1.11.2 (compiled: yes)

λ uv pip show torch       
Using Python 3.11.8 environment at /Users/shantanu/.virtualenvs/openai-wfht
Name: torch
Version: 2.1.0
Location: /Users/shantanu/.virtualenvs/openai-wfht/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: ...

λ time mypy -c 'import torch' --no-incremental
Success: no issues found in 1 source file
mypy -c 'import torch' --no-incremental  33.09s user 2.73s system 98% cpu 36.391 total

λ time mypy -c 'import torch'
Success: no issues found in 1 source file
mypy -c 'import torch'  6.24s user 0.88s system 95% cpu 7.454 total

We use a lot of torch at work; performance is probably the biggest reason folks at work switch to a different type checker.

hauntsaninja avatar Oct 11 '24 04:10 hauntsaninja

If this is accurate, maybe the fscache exception handling is really slowing us down in the mypyc build.

Flamegraphs: mypyc build (native), interpreted build (interpreted)

hauntsaninja avatar Oct 11 '24 04:10 hauntsaninja

mypy -v produces details about processed files, and this seems important:

LOG:  Processing SCC of size 945 (torch.onnx._globals torch._inductor.exc torch._inductor.runtime.hi
nts torch.utils._traceback torch.utils._sympy.functions ... <long output snipped>

Mypy detects an import cycle with 945 modules.

Overall 1380 files were parsed, so 68% of processed files are in this one SCC. I've seen this pattern in other third-party packages as well -- the majority of the implementation is a single SCC.

A potential way to make the SCC smaller would be to process imports lazily in third-party modules (where this is possible, since errors aren't reported). It may be tricky to implement, though; I'll think about it more.
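For illustration, here is a minimal sketch (not mypy's actual implementation) of how strongly connected components of an import graph can be found with Tarjan's algorithm, showing how a cycle collapses many modules into one SCC that must be processed together:

```python
def tarjan_scc(graph):
    """Return the SCCs of a directed graph given as {node: [neighbors]}."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = [0]

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of an SCC; pop the whole component off the stack.
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# Toy import graph: a imports b, b imports c, c imports a (a cycle),
# and d imports a but sits outside the cycle.
imports = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
sccs = tarjan_scc(imports)
print(max(len(s) for s in sccs))  # the a-b-c cycle forms one 3-module SCC
```

In torch's case the analogous cycle spans 945 modules, so the type checker cannot process any of them until it is ready to process all of them.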

JukkaL avatar Oct 11 '24 09:10 JukkaL

Yeah, lazy import resolution could be a massive perf win

hauntsaninja avatar Oct 11 '24 09:10 hauntsaninja

https://github.com/python/mypy/issues/17924 is the issue for tracking lazy resolution

Jukka's times in https://github.com/python/mypy/pull/17920#issuecomment-2406966926 are much better than mine. https://github.com/python/mypy/issues/17948 is the issue for tracking performance improvements in my work environment.

hauntsaninja avatar Oct 15 '24 06:10 hauntsaninja

Performance is now a lot better, but I bet there are still some good opportunities to make this faster. Fresh CPU profiles would be interesting to see.

JukkaL avatar Oct 28 '24 16:10 JukkaL

Here's a new profile for 53134979c !

Install torch, along with a few extra dependencies:

rm -rf torchenv
python -m venv torchenv
uv pip install --python torchenv/bin/python torch matplotlib onnx optree types-redis --exclude-newer 2024-10-29

Then I get the following on Python 3.11:

λ hyperfine -w 1 -M 3 '/tmp/mypy_primer/timer_mypy_53134979c/venv/bin/mypy -c "import torch" --python-executable=torchenv/bin/python --no-incremental'
Benchmark 1: /tmp/mypy_primer/timer_mypy_53134979c/venv/bin/mypy -c "import torch" --python-executable=torchenv/bin/python --no-incremental
  Time (mean ± σ):     27.210 s ±  0.194 s    [User: 25.506 s, System: 1.684 s]
  Range (min … max):   27.052 s … 27.426 s    3 runs

Here's the output of:

py-spy record --native -- /tmp/mypy_primer/timer_mypy_53134979c/venv/bin/python -m mypy -c "import torch" --no-incremental --python-executable torchenv/bin/python

Flamegraph

(I realised py-spy also supports --format speedscope, which can be nicer, but is harder to just link on GitHub)

hauntsaninja avatar Oct 29 '24 01:10 hauntsaninja

@hauntsaninja I've merged some additional optimizations. It would be interesting to see if the numbers have improved.

JukkaL avatar Dec 19 '24 18:12 JukkaL

They have indeed improved!

With this env on Python 3.11:

rm -rf torchenv
python -m venv torchenv
uv pip install --python torchenv/bin/python torch matplotlib onnx optree types-redis --exclude-newer 2024-10-29

Running the following:

export PYTHON="/tmp/mypy_primer/timer_mypy_$COMMIT/venv/bin/python"
$PYTHON -m pip install orjson
$PYTHON -m mypy --version
hyperfine -w 1 -M 5 "$PYTHON -m mypy -c 'import torch' --python-executable torchenv/bin/python"
hyperfine -w 2 -M 5 "$PYTHON -m mypy -c 'import torch' --python-executable torchenv/bin/python --no-incremental"

I get:

Benchmark 1: /tmp/mypy_primer/timer_mypy_eb310343/venv/bin/python -m mypy -c 'import torch' --python-executable torchenv/bin/python
  Time (mean ± σ):      3.151 s ±  0.163 s    [User: 2.593 s, System: 0.556 s]
  Range (min … max):    3.008 s …  3.409 s    5 runs
 
hyperfine -w 1 -M 5   38.61s user 4.73s system 100% cpu 43.331 total
Benchmark 1: /tmp/mypy_primer/timer_mypy_eb310343/venv/bin/python -m mypy -c 'import torch' --python-executable torchenv/bin/python --no-incremental
  Time (mean ± σ):     27.366 s ±  0.579 s    [User: 25.290 s, System: 2.052 s]
  Range (min … max):   26.552 s … 28.128 s    5 runs
mypy 1.15.0+dev.d33cef8396c456d87db16dce3525ebf431f4b57f (compiled: yes)
Benchmark 1: /tmp/mypy_primer/timer_mypy_d33cef83/venv/bin/python -m mypy -c 'import torch' --python-executable torchenv/bin/python
  Time (mean ± σ):      2.473 s ±  0.038 s    [User: 1.966 s, System: 0.505 s]
  Range (min … max):    2.443 s …  2.538 s    5 runs
 
hyperfine -w 1 -M 5   33.29s user 4.25s system 100% cpu 37.536 total
Benchmark 1: /tmp/mypy_primer/timer_mypy_d33cef83/venv/bin/python -m mypy -c 'import torch' --python-executable torchenv/bin/python --no-incremental
  Time (mean ± σ):     25.583 s ±  0.375 s    [User: 23.681 s, System: 1.884 s]
  Range (min … max):   25.091 s … 26.134 s    5 runs

So latest master is 1.27x faster on incremental runs and 1.07x faster on non-incremental runs compared to 1.13.
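The ratios above come straight from the hyperfine means reported in this comment (seconds, old commit over new commit):

```python
# Speedup = old mean time / new mean time, from the benchmark output above.
incremental_old, incremental_new = 3.151, 2.473
full_old, full_new = 27.366, 25.583

print(f"incremental:     {incremental_old / incremental_new:.2f}x faster")  # 1.27x
print(f"non-incremental: {full_old / full_new:.2f}x faster")                # 1.07x
```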

hauntsaninja avatar Dec 20 '24 09:12 hauntsaninja

On the latest master, if you use --fixed-format-cache (recently added by @ilevkivskyi), warm runs (with the cache already generated) are significantly faster than before. #19681 also helped with cache deserialization speed. I hope to work on #17924 to further improve performance in incremental mode.

JukkaL avatar Aug 21 '25 17:08 JukkaL