[ty] Speed up ty-walltime benchmarks
Summary
Reduce the time-to-completion for the walltime benchmarks from ~15 min to ~9 min (this should get even faster once the Rust caching kicks in) by:
- Increasing the sharding from 2 to 4 jobs
- Manually selecting the tests for each shard based on their walltime
- Using a Depot runner to build the benchmarks. This reduces our CodSpeed cost and gives faster queue and build times (from ~6 min to 3 min on a 4-core machine). We could use a larger runner, but I don't think that's necessary once the third-party dependencies are cached.
I also had to rename the benchmarks because CodSpeed seems to struggle if benchmarks from different groups run on different shards. But I think that's for the better anyway:
Remove the small, medium and large groups because projects that used to be very fast to type check now take longer and would have to be moved into another group so that we can update the iteration counts.
This PR also updates the iteration counts for colour_science and pandas (from 3 to 2, equivalent to moving them to large) and pydantic (from 1 to 3, moving it from large to medium).
The downside is that we lose our historical data, but this is a better long-term setup.
| Benchmark | Time/iter | Iters | Total | Shard |
|---|---|---|---|---|
| colour_science | 1.46 min | 2 | ~2.9 min | 1 |
| pandas | 1.04 min | 2 | ~2.1 min | 2 |
| tanjun | 2.5s | 6 | ~15s | 2 |
| altair | 5.1s | 6 | ~31s | 2 |
| static_frame | 20s | 3 | ~1 min | 3 |
| sympy | 51s | 2 | ~1.7 min | 3 |
| pydantic | 10.6s | 6 | ~64s | 4 |
| multithreaded | 1.4s | 24 | ~34s | 4 |
| freqtrade | 8s | 6 | ~48s | 4 |
| Shard | Benchmarks | Total Time |
|---|---|---|
| 1 | colour_science | ~2.9 min |
| 2 | pandas, tanjun, altair | ~2.9 min |
| 3 | static_frame, sympy | ~2.7 min |
| 4 | pydantic, multithreaded, freqtrade | ~2.4 min |
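
For anyone who wants to re-check the balance after changing an iteration count, here is a minimal sketch that recomputes the per-shard totals from the per-benchmark numbers above. It is not part of the PR: the `BENCHMARKS` list and the `shard_totals` helper are hypothetical names, and the minute values are converted to seconds by hand. Since the shards run in parallel, the slowest shard is roughly what bounds the benchmark phase.

```python
from collections import defaultdict

# (benchmark, seconds per iteration, iterations, shard), copied from the table.
# Assumption: minute values converted by hand (1.46 min -> 87.6 s, 1.04 min -> 62.4 s).
BENCHMARKS = [
    ("colour_science", 87.6, 2, 1),
    ("pandas", 62.4, 2, 2),
    ("tanjun", 2.5, 6, 2),
    ("altair", 5.1, 6, 2),
    ("static_frame", 20.0, 3, 3),
    ("sympy", 51.0, 2, 3),
    ("pydantic", 10.6, 6, 4),
    ("multithreaded", 1.4, 24, 4),
    ("freqtrade", 8.0, 6, 4),
]


def shard_totals(benchmarks):
    """Sum (seconds per iteration * iterations) for each shard, in seconds."""
    totals = defaultdict(float)
    for _name, secs_per_iter, iters, shard in benchmarks:
        totals[shard] += secs_per_iter * iters
    return dict(totals)


totals = shard_totals(BENCHMARKS)
# The shard with the largest total is the one the others end up waiting for.
slowest_shard = max(totals, key=totals.get)

for shard in sorted(totals):
    print(f"shard {shard}: {totals[shard] / 60:.1f} min")
print(f"slowest shard: {slowest_shard}")
```

The results should land within about 0.1 min of the shard table, which sums the already-rounded per-benchmark totals.
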
TLDR: The walltime benchmarks now often complete before the ty-instrumented benchmarks.