
Add additional GC-related benchmark

Open · nascheme opened this issue · 4 comments

Given this performance regression in Python 3.14, it would be nice to have a benchmark that would have shown it more clearly before the final release. I have a new GC-specific benchmark, but it doesn't exercise the patterns that would expose this regression.

We should consider adding a new benchmark. The key features would be:

  • creating many net-new container objects, in order to trigger frequent (potentially young-generation) collections
  • creating tuple objects that can be untracked by the GC, in order to avoid slowing down full collections (see the untracking sketch below)
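
For context on the second bullet, a minimal CPython-specific sketch (not part of any proposed benchmark): new tuples start out tracked by the GC and are only untracked during a collection, once the collector sees they contain nothing but immutables.

import gc

x = 1
t = (x, 2)               # built at run time, so it starts out GC-tracked
print(gc.is_tracked(t))  # expected: True on CPython
gc.collect()             # a collection untracks tuples holding only immutables
print(gc.is_tracked(t))  # expected: False on CPython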

Ideally, we would avoid a micro or synthetic benchmark and instead find some kind of real application that exhibits this regression. For example, near the 3.13 release we found a GC regression exposed by Sphinx building the Python docs. Something like that would be good.

— nascheme, Oct 28 '25 18:10

It seems that @pgdr has such an application: https://github.com/pgdr/regressionquery. However, I'm not sure whether it's suitable for pyperformance purposes.

— sergey-miryanov, Oct 28 '25 19:10

Yes, I have a real-world application for this, called seglines, which computes segmented least squares (also known as segmented regression).

However, I have now spent eight hours trying to reproduce the Python 3.14 slowdown with this application, without success, and I think I give up.

I have a very simple test case (using pyperf) that shows the slowdown and that I can contribute, but it isn't a "realistic" or "real-world" application (at least not directly).

The benchmark would be the following:

"""The background for this benchmark is that the garbage collection in Python 3.14
had a performance regression, see
https://github.com/python/cpython/issues/139951.
"""
import pyperf

def test(N):
    d = {}
    for i in range(N):
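        # each (i, i) key is a freshly allocated tuple; CPython tracks
        # new tuples until a GC pass finds they hold only immutables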
        d[(i, i)] = i

if __name__ == "__main__":
    runner = pyperf.Runner()
    N = 1_000_000
    runner.metadata["description"] = "Dict-Tuple GC"
    runner.bench_func("dict_tuple_gc", test, N)

And running it:

Python 3.13.3
dict_tuple_gc: Mean +- std dev: 219 ms +- 41 ms

Python 3.13.9
dict_tuple_gc: Mean +- std dev: 215 ms +- 42 ms

Python 3.14.0
dict_tuple_gc: Mean +- std dev: 844 ms +- 31 ms

Python 3.15.0a1+
dict_tuple_gc: Mean +- std dev: 258 ms +- 15 ms
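
As a hedged usage note (assuming the script above is saved under the hypothetical name bench_dict_tuple_gc.py), pyperf can compare such runs across interpreters directly:

python3.13 bench_dict_tuple_gc.py -o py313.json
python3.14 bench_dict_tuple_gc.py -o py314.json
python -m pyperf compare_to py313.json py314.json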

If it has even the slightest chance of being merged, I can open the PR and we can discuss the details there.

— pgdr, Oct 28 '25 21:10

A benchmark based on pickletools might be good. It seems to show this regression.
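
For illustration only, a minimal sketch of what such a benchmark might look like (the workload, names, and sizes here are guesses, not a vetted design; whether this actually exhibits the regression would need measuring):

import io
import pickle
import pickletools

import pyperf

def bench_pickletools_dis(payload):
    # disassembling a pickle stream allocates many short-lived
    # tuples and other small containers while decoding opcodes
    pickletools.dis(payload, out=io.StringIO())

if __name__ == "__main__":
    runner = pyperf.Runner()
    data = [(i, str(i)) for i in range(10_000)]
    payload = pickle.dumps(data)
    runner.bench_func("pickletools_dis", bench_pickletools_dis, payload)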

— nascheme, Oct 29 '25 16:10

@nascheme PR added

— pgdr, Nov 03 '25 18:11