typeshed
typeshed copied to clipboard
Performance regression in `regr_test.py`
I looks like running regr_test.py on Python 3.12 in CI, rather than Python 3.11, makes the job take 20-30s longer. It used to take between 3m and 3m10s; now it takes between 3m30s and 3m40s. That's actually pretty noticeable on PRs like #11585, #11587 or https://github.com/python/typeshed/pull/11589, where we don't run mypy_primer, meaning that regr_test.py is the slowest job in CI for those PRs.
It seems good generally for us to run this job on the latest version of Python, so I'm not necessarily suggesting that we should go back to running it on Python 3.11. But I'm curious if anybody has any ideas as to why bumping the Python version would make the job slower in CI. (I've experimented on my local fork with changing it back to using 3.11, and it fixes the performance regression -- it was definitely caused by #11544, unfortunately.)
Here's the first run that used Python 3.12: https://github.com/python/typeshed/actions/runs/8206228752/job/22444848542
And here's the one immediately before that, which used Python 3.11: https://github.com/python/typeshed/actions/runs/8206219736/job/22444819614?pr=11524
I see something similar if I try this locally on linux:
$ time python3.11 ./tests/regr_test.py --all --verbosity QUIET
Test completed successfully!
real 1m0,876s
user 5m49,798s
sys 0m24,210s
$ time python3.12 ./tests/regr_test.py --all --verbosity QUIET
Test completed successfully!
real 1m9,412s
user 6m56,667s
sys 0m25,051s
I emptied the mypy cache beforehand each time. Is this some issue with mypy and Python 3.12?
I'm fine with using 3.11 in CI for now, although we should add a comment explaining the situation.
I haven't investigated really yet (and probably won't have time until this evening at the earliest), but here's a few hypotheses I can think of:
- Some stdlib function we're using a lot in the test runs much slower on py312
- Mypy runs much slower on py312
uvruns much slower on py312- Say we have regression tests for our package
types-foo, andtypes-foohas a non-types dependency onbar. We dynamically install those non-types dependencies into a venv before running the regression tests fortypes-foo. Maybebar(or a dependency ofbar) doesn't provide py312 wheels yet, meaning we're doing something silly like building a C extension from source?
It would be good to get to the bottom of things anyway -- if it's mypy, CPython or uv to blame, I'd be interested in letting them know.
On my machine there's about a 10s difference consistently. I didn't think to clear mypy cache so take this with a grain of salt (and I've got other things to do today). But hopefully this offers some pointers.
Some mypy subprocess runs can differ up to 2s, mostly slower on Python 3.12, but not always, so it could average out over the 255 subprocess calls (so I'll refrain from posting my mypy subprocesses timing results for now)
Note that the install command with uv has a clear performance difference though. At least on my machine. (as Alex mentionned, could be caused by the installed library's Python 3.12 version, not necessarily a problem in vu itself)
Python 3.11
Checking uv subprocess commands with perf_counter
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'MarkupSafe', '--no-cache-dir']: 11.079815700002655
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'cryptography>=35.0.0', '--no-cache-dir']: 12.484998500000074
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'numpy', '--no-cache-dir']: 13.562204699999711
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'urllib3>=2', '--no-cache-dir']: 10.540249300000141
Execution time of uv venv \.venv: 0.04944420000174432
Execution time of uv venv \.venv: 0.05479460000060499
Execution time of uv venv \.venv: 0.09991710000031162
Execution time of uv venv \.venv: 0.15117509999981849
Some preliminary cprofile results
regr_test_py311.prof% sort tottime
regr_test_py311.prof% stats 20
Wed Mar 13 16:28:21 2024 regr_test_py311.prof
243748 function calls (238315 primitive calls) in 41.590 seconds
Ordered by: internal time
List reduced from 1107 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
146 33.103 0.227 33.103 0.227 {method 'acquire' of '_thread.lock' objects}
23494 7.498 0.000 7.498 0.000 {built-in method nt.unlink}
2155 0.417 0.000 0.417 0.000 {built-in method nt.rmdir}
2155/17 0.171 0.000 8.302 0.488 shutil.py:598(_rmtree_unsafe)
2155 0.098 0.000 0.098 0.000 {built-in method nt.scandir}
25631 0.062 0.000 0.062 0.000 {method 'stat' of 'nt.DirEntry' objects}
25631 0.049 0.000 0.118 0.000 shutil.py:570(_rmtree_isdir)
479 0.017 0.000 0.018 0.000 {built-in method nt.stat}
56 0.012 0.000 0.012 0.000 {built-in method io.open_code}
3 0.012 0.004 0.012 0.004 {built-in method _imp.create_dynamic}
407/55 0.011 0.000 0.033 0.001 _parser.py:509(_parse)
25647 0.007 0.000 0.007 0.000 {built-in method _stat.S_ISDIR}
11583 0.006 0.000 0.006 0.000 _parser.py:233(__next)
56 0.006 0.000 0.006 0.000 {built-in method marshal.loads}
810 0.005 0.000 0.008 0.000 <frozen importlib._bootstrap_external>:96(_path_join)
678/51 0.005 0.000 0.014 0.000 _compiler.py:37(_compile)
10667 0.005 0.000 0.011 0.000 _parser.py:254(get)
17 0.004 0.000 0.004 0.000 {built-in method nt.mkdir}
259 0.004 0.000 0.005 0.000 _compiler.py:241(_optimize_charset)
87/1 0.004 0.000 41.590 41.590 {built-in method builtins.exec}
Python 3.12
Checking uv subprocess ommands with perf_counter
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'MarkupSafe', '--no-cache-dir']: 16.475093000000925
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'cryptography>=35.0.0', '--no-cache-dir']: 16.486071899998933
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'numpy', '--no-cache-dir']: 15.042250600003172
Execution time of ['uv', 'pip', 'install', 'mypy==1.9.0', 'urllib3>=2', '--no-cache-dir']: 9.922830200001044
Execution time of uv venv \.venv: 0.13397619999886956
Execution time of uv venv \.venv: 0.1747261000018625
Execution time of uv venv \.venv: 0.30994009999994887
Execution time of uv venv \.venv: 0.3153578999990714
Some preliminary cprofile results (look at those primitive calls!!!)
regr_test_py312.prof% strip
regr_test_py312.prof% sort tottime
regr_test_py312.prof% stats 20
Wed Mar 13 16:27:19 2024 regr_test_py312.prof
1377313 function calls (1316539 primitive calls) in 51.433 seconds
Ordered by: internal time
List reduced from 1284 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
120/44 17.097 0.142 3.751 0.085 {method 'get' of '_queue.SimpleQueue' objects}
17229/109 15.126 0.001 0.478 0.004 {built-in method _winapi.CopyFile2}
23494 8.901 0.000 8.901 0.000 {built-in method nt.unlink}
42/2 3.862 0.092 0.000 0.000 {method 'read' of '_io.TextIOWrapper' objects}
17229/14114 1.100 0.000 1.033 0.000 {built-in method nt._path_isdir}
1624/1388 0.605 0.000 0.530 0.000 {built-in method nt.utime}
2155 0.523 0.000 0.523 0.000 {built-in method nt.rmdir}
1673/1265 0.400 0.000 0.355 0.000 {built-in method nt.mkdir}
39702 0.369 0.000 0.866 0.000 <frozen ntpath>:107(join)
88133 0.264 0.000 0.390 0.000 <frozen ntpath>:179(splitroot)
1624/1357 0.251 0.000 0.243 0.000 {built-in method nt.chmod}
1624/2 0.235 0.000 0.334 0.167 shutil.py:479(_copytree)
63/1 0.196 0.003 41.568 41.568 threading.py:995(_bootstrap)
3859/3688 0.187 0.000 0.210 0.000 {built-in method nt.scandir}
17229/101 0.182 0.000 1.182 0.012 shutil.py:427(copy2)
2155/17 0.153 0.000 9.699 0.571 shutil.py:607(_rmtree_unsafe)
25 0.142 0.006 0.142 0.006 {built-in method _winapi.WaitForSingleObject}
171649/171303 0.116 0.000 0.150 0.000 {built-in method nt.fspath}
1652/1497 0.113 0.000 0.110 0.000 {built-in method nt._path_exists}
225912 0.100 0.000 0.101 0.000 {built-in method builtins.isinstance}