pyperformance Add additional GC related benchmark

Given this performance regression in Python 3.14, it would be nice to have a benchmark that would have shown it clearly before the final release. I have a new GC-specific benchmark, but it doesn't exercise the behavior that would expose this regression.

We should consider adding a new benchmark. The key features would be:

- creating a bunch of net new container objects, in order to trigger many collections, potentially of the young generation
- creating tuple objects that can be untracked by the GC, in order to avoid slowing down full collections
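The two features above can be observed from Python with the standard `gc` module. The following is a minimal sketch, not a proposed benchmark: the helper names `count_collections`, `make_tuple_keys` and `tracking_report` are made up for illustration. It counts how many collections a workload triggers via `gc.callbacks`, and checks with `gc.is_tracked` whether a tuple is still tracked after a full collection. When a tuple of ints gets untracked, and how many collections run, may differ between CPython versions, so the printed values should be read as observations rather than guarantees.

```python
import gc


def count_collections(workload):
    """Run workload() and count how many GC passes started, per generation."""
    counts = {}

    def on_gc(phase, info):
        # The callback fires with phase "start" and "stop"; count starts only.
        if phase == "start":
            gen = info["generation"]
            counts[gen] = counts.get(gen, 0) + 1

    gc.callbacks.append(on_gc)
    try:
        workload()
    finally:
        # Always unregister so later code is not affected.
        gc.callbacks.remove(on_gc)
    return counts


def make_tuple_keys(n=100_000):
    """Allocate n new 2-tuples of ints and keep them alive as dict keys."""
    d = {}
    for i in range(n):
        d[(i, i)] = i
    return d


def tracking_report():
    """Report GC tracking state of two tuples before/after a full collection."""
    ints_only = tuple([1, 2])   # built at runtime, holds only ints
    holds_list = ([], 1)        # holds a list, so it must stay tracked
    before = gc.is_tracked(ints_only)
    gc.collect()
    return {
        "ints_only_before": before,
        "ints_only_after": gc.is_tracked(ints_only),
        "holds_list_after": gc.is_tracked(holds_list),
    }


if __name__ == "__main__":
    print(count_collections(make_tuple_keys))
    print(tracking_report())
```

Comparing the output across interpreter versions gives a rough idea of whether a candidate workload stresses the collector in the intended way.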

Ideally we would avoid a micro or synthetic benchmark and instead find some kind of real application that shows this regression. E.g. near the 3.13 release, we found a GC regression shown by Sphinx building the Python docs. Something like that would be good.

Oct 28 '25 18:10 nascheme

It seems that @pgdr has such an application: https://github.com/pgdr/regressionquery. However, I'm not sure whether it's suitable for pyperformance purposes.

Oct 28 '25 19:10 sergey-miryanov

Yes, I have a real-world application for this, called seglines, that computes segmented least squares (or segmented regression).

However, I have now spent 8 hours trying to reproduce the Python 3.14 slowdown with this application, without success, and I am giving up on that.

I have a very simple test case that shows the slowdown (with pyperf) that I can contribute, but it's not a "realistic" or "real world application" (directly, at least).

The benchmark would be the following:

"""The background for this benchmark is that the garbage collection in Python 3.14
had a performance regression, see
https://github.com/python/cpython/issues/139951.
"""
import pyperf

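def test(N):
    # Fill a dict with N freshly allocated 2-tuples of ints as keys.
    # Every key is a new tuple object, so the loop allocates N containers.
    d = {}
    for i in range(N):
        d[(i, i)] = i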

if __name__ == "__main__":
    runner = pyperf.Runner()
    N = 1_000_000
    runner.metadata["description"] = "Dict-Tuple GC"
    runner.bench_func("dict_tuple_gc", test, N)

And running it:

Python 3.13.3
dict_tuple_gc: Mean +- std dev: 219 ms +- 41 ms

Python 3.13.9
dict_tuple_gc: Mean +- std dev: 215 ms +- 42 ms

Python 3.14.0
dict_tuple_gc: Mean +- std dev: 844 ms +- 31 ms

Python 3.15.0a1+
dict_tuple_gc: Mean +- std dev: 258 ms +- 15 ms
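To put the means above in relative terms, here is a small sketch that divides each mean by the 3.13.3 mean. The numbers are copied from the runs above; the helper name `slowdown` is only illustrative. The standard deviations are ignored, so the ratios are rough.

```python
# Mean timings in milliseconds, copied from the pyperf output above.
means_ms = {
    "3.13.3": 219,
    "3.13.9": 215,
    "3.14.0": 844,
    "3.15.0a1+": 258,
}


def slowdown(means, baseline):
    """Return each version's mean divided by the baseline version's mean."""
    base = means[baseline]
    return {version: mean / base for version, mean in means.items()}


if __name__ == "__main__":
    for version, ratio in slowdown(means_ms, "3.13.3").items():
        print(f"{version}: {ratio:.2f}x")
```

By this measure 3.14.0 is a little under 4 times slower than 3.13.3 on this test, while 3.15.0a1+ is within about 20% of it.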

If it has even the slightest chance of being merged, I can open a PR and we can discuss it there.

Oct 28 '25 21:10 pgdr

A benchmark based on pickletools might be good. It seems to show this regression.
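As a rough idea of what a pickletools-based workload could look like, here is a sketch using only the standard library. It is not the benchmark from the PR; the payload shape, the sizes and the helper names `make_pickle`, `count_opcodes` and `optimize_roundtrip` are assumptions chosen for illustration. Whether this particular workload reproduces the regression has not been verified here.

```python
import pickle
import pickletools
import time


def make_pickle(n=50_000):
    """Build a list of n small tuples and return it with its pickle."""
    data = [(i, str(i)) for i in range(n)]
    return data, pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)


def count_opcodes(payload):
    """Walk the pickle with pickletools.genops and count its opcodes.

    genops yields one (opcode, arg, pos) tuple per opcode, so this loop
    allocates many short-lived tuples.
    """
    return sum(1 for _ in pickletools.genops(payload))


def optimize_roundtrip(payload):
    """Optimize the pickle with pickletools.optimize and load the result."""
    return pickle.loads(pickletools.optimize(payload))


if __name__ == "__main__":
    data, payload = make_pickle()
    start = time.perf_counter()
    n_ops = count_opcodes(payload)
    restored = optimize_roundtrip(payload)
    elapsed = time.perf_counter() - start
    print(f"{n_ops} opcodes, round-trip ok: {restored == data}, {elapsed:.3f} s")
```

For a real pyperformance benchmark, the timing loop would be replaced by `pyperf.Runner().bench_func(...)`, as in the dict/tuple test above.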

Oct 29 '25 16:10 nascheme

@nascheme PR added

Nov 03 '25 18:11 pgdr