Add an additional GC-related benchmark
Given this performance regression in Python 3.14, it would be nice to have a benchmark that would have shown it more clearly before the final release. I have a new GC-specific benchmark, but it doesn't exercise the behavior that triggers this regression.
We should consider adding a new benchmark. The key features would be:
- creating a bunch of net new container objects, in order to trigger many, potentially young generation, collections
- creating tuple objects that can be untracked by the GC, in order to avoid slowing down full collections
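To make the second point concrete, here is a small sketch using `gc.is_tracked()`. The helper name `tracking_report` is made up for illustration, and exactly when a tuple of ints becomes untracked (at creation, or only after a collection) differs between CPython versions, so the sketch reports that state rather than assuming it.

```python
import gc


def tracking_report(n=3):
    """Return gc.is_tracked() results before and after a full collection.

    `tracking_report` is a hypothetical helper, not part of any benchmark.
    """
    key = (n, n)        # built at run time; contains only ints
    nested = (n, [n])   # contains a list, so it has to stay tracked
    objects = {
        "int": n,
        "list": [n],
        "tuple_of_ints": key,
        "tuple_with_list": nested,
    }
    before = {name: gc.is_tracked(obj) for name, obj in objects.items()}
    gc.collect()  # a full collection gives the GC a chance to untrack tuples
    after = {name: gc.is_tracked(obj) for name, obj in objects.items()}
    return before, after


if __name__ == "__main__":
    before, after = tracking_report()
    for name in before:
        print(f"{name:16} before={before[name]} after={after[name]}")
```

Comparing the `tuple_of_ints` row across interpreter versions shows whether such tuples still count as tracked containers at that point.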
Ideally, we would avoid a micro or synthetic benchmark and instead find a real application that shows this regression. For example, near the 3.13 release we found a GC regression that showed up when Sphinx built the Python docs. Something like that would be good.
It seems that @pgdr has such an application: https://github.com/pgdr/regressionquery. However, I'm not sure whether it's suitable for pyperformance.
Yes, I have a real-world application for this, called seglines, that computes segmented least squares (or segmented regression).
However, I have now spent 8 hours trying to reproduce the Python 3.14 slowdown with this application, without success, and I think I'll give up.
I have a very simple test case that shows the slowdown (with pyperf) that I can contribute, but it's not a "realistic" or "real world application" (directly, at least).
The benchmark would be the following:
"""The background for this benchmark is that the garbage collection in Python 3.14
had a performance regression, see
https://github.com/python/cpython/issues/139951.
"""
import pyperf


def test(N):
    d = {}
    for i in range(N):
        d[(i, i)] = i


if __name__ == "__main__":
    runner = pyperf.Runner()
    N = 1_000_000
    runner.metadata["description"] = "Dict-Tuple GC"
    runner.bench_func("dict_tuple_gc", test, N)
And running it:
Python 3.13.3
dict_tuple_gc: Mean +- std dev: 219 ms +- 41 ms
Python 3.13.9
dict_tuple_gc: Mean +- std dev: 215 ms +- 42 ms
Python 3.14.0
dict_tuple_gc: Mean +- std dev: 844 ms +- 31 ms
Python 3.15.0a1+
dict_tuple_gc: Mean +- std dev: 258 ms +- 15 ms
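One rough way to check how much of such a difference comes from the collector is to time the same loop with automatic collection on and off. This is only a sketch using `time.perf_counter()` rather than pyperf; `fill_dict` and `time_with_gc` are names made up here, and the timings will be noisy.

```python
import gc
import time


def fill_dict(n):
    """Same workload as the benchmark: n dict entries keyed by (i, i)."""
    d = {}
    for i in range(n):
        d[(i, i)] = i
    return d


def time_with_gc(n, enabled):
    """Time fill_dict(n) with automatic GC enabled or disabled.

    Hypothetical helper; restores the previous GC state afterwards.
    """
    was_enabled = gc.isenabled()
    gc.collect()  # start each measurement from a clean state
    if enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        fill_dict(n)
        return time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
        else:
            gc.disable()


if __name__ == "__main__":
    n = 200_000  # smaller than the benchmark's N, to keep the run short
    print("gc enabled: ", time_with_gc(n, True))
    print("gc disabled:", time_with_gc(n, False))
```

If the two numbers are close on one version and far apart on another, the gap is likely time spent in collections rather than in dict or tuple creation.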
If it has even the slightest chance of being merged, I can make the PR and we/you can discuss there.
A benchmark based on pickletools might be good. It seems to show this regression.
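As a rough idea of what a pickletools-based workload could look like: the sketch below pickles a list and walks it with `pickletools.genops()`, which yields one `(opcode, arg, pos)` tuple per opcode and so allocates many short-lived tuples. The choice of `genops()`, the input data, and the name `count_opcodes` are assumptions made for illustration; the comment above does not say which pickletools function was meant.

```python
import pickle
import pickletools


def count_opcodes(n):
    """Pickle n small tuples and count the opcodes in the resulting stream.

    Hypothetical workload; the input shape is chosen only for illustration.
    """
    data = pickle.dumps(
        [(i, str(i)) for i in range(n)],
        protocol=pickle.HIGHEST_PROTOCOL,
    )
    # genops() lazily yields an (opcode, arg, pos) tuple for each opcode.
    return sum(1 for _ in pickletools.genops(data))


if __name__ == "__main__":
    print(count_opcodes(10_000))
```

Wrapping `count_opcodes` in `runner.bench_func(...)` would turn it into a pyperf benchmark in the same way as the dict-tuple test.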
@nascheme PR added