Improving representative benchmarks for typing ecosystem
Due to the current lack of representative macrobenchmarks, it is very difficult to decide whether complex accelerators for some parts of typing are worth implementing in the future. Hence, I'm trying to upstream some benchmarks into pyperformance.
IMO, there are three main areas:
- Performance of static type checkers implemented in Python (e.g. mypy). (Fixed by #102)
- Performance of programs using types at runtime (e.g. pydantic, attrs, etc.).
- Runtime overhead of typed code vs fully untyped code.
For case 2, I plan to use one of pydantic's benchmarks here: https://github.com/samuelcolvin/pydantic/tree/master/benchmarks, installed without compiled binaries.
Case 3 is very tricky because there are so many ways to use typing. I don't know how often people use certain features, whether they type-hint inside tight loops, etc. So I'm struggling to find a good benchmark. One idea that may work: grab one of the existing pyperformance benchmarks, fully type-hint it, then compare the performance delta.
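For illustration, a minimal sketch (my own, not an existing benchmark) of what such a comparison could look like using pyperf, the harness pyperformance benchmarks are built on. Since annotations are evaluated once at definition time, the per-call numbers would likely be near-identical; the interesting deltas would come from import/definition time and from runtime uses of typing:

```python
# Hypothetical sketch: timing an annotated vs. unannotated function with pyperf.
import pyperf

def untyped_fib(n):
    return n if n < 2 else untyped_fib(n - 1) + untyped_fib(n - 2)

def typed_fib(n: int) -> int:
    return n if n < 2 else typed_fib(n - 1) + typed_fib(n - 2)

runner = pyperf.Runner()
runner.bench_func("untyped fib(12)", untyped_fib, 12)
runner.bench_func("typed fib(12)", typed_fib, 12)
```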
CC @JelleZijlstra, I would greatly appreciate hearing your opinion on this (especially for case 3). Maybe I can post this on typing-sig too if I need more help.
Afterword: all three cases benefit from general CPython optimizations, but usually only case 3 benefits greatly from typing-module-only optimizations (case 1 may not improve much, if at all, depending on the implementation).
Here are some thoughts:
- mypy itself is actually also a good example of a fully typed Python codebase that's friendly to benchmarking. Comparing its performance with all types stripped out (but still in pure Python mode, no mypyc) could be interesting.
- I think there's a lot of variation in how people use typing. For example, my company's codebase very heavily uses NewTypes as annotations but barely uses generics, whereas mypy doesn't use NewTypes internally but uses a lot of generics.
- Here are some aspects of static typing that could plausibly affect runtime performance (a couple of illustrative sketches follow this list):
  - Import time cost (in both speed and memory) of evaluating lots of annotations. This was a major part of the motivation for PEP 563 and PEP 649. The latter is still under consideration by the SC, so benchmarks could help inform a decision.
  - Instantiation of generic classes. I remember this was especially an issue early on but I think Ivan (?) made some fixes later in the 3.x series. (So `class X(Generic[T]): ...` made `X()` slow.) This would be a good benchmark to add.
  - `cast()` and `NewType()`, two of the few parts of typing you'd actually execute at runtime. These are identity functions though, so there's not too much to optimize other than implementing them in C. I guess we could implement `typing.cast` in C now that we did the same for `NewType.__call__`. `cast()` is fairly common in mypy's codebase.
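On the first point, a rough sketch (my own, assuming Python 3.9+ for the built-in generic syntax) of how the definition-time cost of eagerly evaluated annotations could be measured against PEP 563-style deferred annotations:

```python
# Hypothetical micro-benchmark: cost of evaluating annotations at function
# definition time, eagerly vs. deferred via `from __future__ import annotations`.
import timeit

SRC = "def f(a: int, b: list[dict[str, int]]) -> dict[str, int]: ...\n"
eager = compile(SRC, "<eager>", "exec")
deferred = compile("from __future__ import annotations\n" + SRC, "<deferred>", "exec")

# With the future import, annotations are stored as strings and never evaluated.
print("eager:   ", timeit.timeit(lambda: exec(eager, {}), number=50_000))
print("deferred:", timeit.timeit(lambda: exec(deferred, {}), number=50_000))
```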
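And on the second point, a minimal sketch of a generic-instantiation micro-benchmark (again my own, not from this thread):

```python
# Hypothetical micro-benchmark: instantiating a plain class vs. a Generic subclass.
import timeit
from typing import Generic, TypeVar

T = TypeVar("T")

class Plain:
    pass

class Gen(Generic[T]):
    pass

GenInt = Gen[int]  # build the parameterized alias once; calling it wraps Gen()

print("plain:        ", timeit.timeit(Plain, number=1_000_000))
print("generic:      ", timeit.timeit(Gen, number=1_000_000))
# Calling the alias goes through typing's __call__ wrapper,
# which also sets __orig_class__ on the new instance.
print("parameterized:", timeit.timeit(GenInt, number=1_000_000))
```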
Thanks! I realized a few holes in my own ideas and your comments gave me a lot of food for thought.
> mypy itself is actually also a good example of a fully typed Python codebase that's friendly to benchmarking. Comparing its performance with all types stripped out (but still in pure Python mode, no mypyc) could be interesting.
I contemplated stripping all annotations from mypy, but I'm unsure of how to get rid of the non-annotation stuff too (like `cast`, `NewType`, `Protocol`, etc.). My goal is to bench a clean vanilla program vs. fully type-hinted code (annotations plus the other typing constructs that won't improve much due to PEP 563/649), so that we can have a good guide on how much slower a piece of code with thorough typing will run.
> I'm unsure of how to get rid of the non-annotation stuff too (like `cast`, `NewType`, `Protocol`, etc.).
I haven't tried this, but it may be feasible to do this with something like a LibCST codemod: `cast(a, b)` gets replaced with `b`, for example.
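As a rough illustration of that idea (my own sketch, assuming `cast` is imported by its bare name rather than used as `typing.cast`):

```python
# Hypothetical LibCST transformer that rewrites `cast(T, expr)` to `expr`.
import libcst as cst

class StripCast(cst.CSTTransformer):
    def leave_Call(
        self, original_node: cst.Call, updated_node: cst.Call
    ) -> cst.BaseExpression:
        func = updated_node.func
        if (
            isinstance(func, cst.Name)
            and func.value == "cast"
            and len(updated_node.args) == 2
        ):
            # Keep only the second argument: the runtime value.
            return updated_node.args[1].value
        return updated_node

source = "from typing import cast\nx = cast(int, get_value())\n"
print(cst.parse_module(source).visit(StripCast()).code)
# x = cast(int, get_value())  ->  x = get_value()
# (removing the now-unused import would be a separate pass)
```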
One area you wouldn't be able to get rid of would be `isinstance` checks on `@runtime_checkable` protocols. That's another good area to benchmark.
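For example, a quick sketch of what such a benchmark could measure (names are illustrative): a structural protocol check vs. an ordinary class check.

```python
# Hypothetical micro-benchmark: isinstance() against a runtime_checkable
# Protocol (a structural check) vs. against a concrete class.
import timeit
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsClose(Protocol):
    def close(self) -> None: ...

class File:
    def close(self) -> None:
        pass

f = File()
print("protocol:", timeit.timeit(lambda: isinstance(f, SupportsClose), number=100_000))
print("class:   ", timeit.timeit(lambda: isinstance(f, File), number=100_000))
```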
> so that we can have a good guide on how much slower a piece of code with thorough typing will run
I'm not sure that's really a realistic idea. At runtime, typing mostly does nothing, so the performance effect of adding types is really going to depend on which pieces of typing you use.
FYI, I've merged my PR that allows running benchmarks that aren't part of pyperformance. I also have a PR up against the Pyston benchmarks repo to allow them to be run using pyperformance: https://github.com/pyston/python-macrobenchmarks/pull/3.