
gh-143445: Optimize deepcopy for 1.04x speedup

heikkitoivonen opened this issue

Gains according to pyperformance:

deepcopy:
Mean +- std dev: 411 us +- 2 us -> 396 us +- 3 us: 1.04x faster
Significant (t=28.94)

deepcopy_reduce:
Mean +- std dev: 4.38 us +- 0.05 us -> 4.23 us +- 0.04 us: 1.04x faster
Significant (t=20.05)

heikkitoivonen avatar Jan 05 '26 20:01 heikkitoivonen

I am not too worried about the additional memory usage. How long could args be? I suspect typical cases have fewer than 5 items, with maybe some extreme cases under 100. The change is small, and the same substitution has been used in other Python packages for performance reasons as well.

The performance of generator expressions could change in the future (free-threaded build, JIT), but I suspect list comprehensions will stay a bit faster for some time (no hard arguments here, just a feeling).
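For context, the optimization under discussion amounts to swapping a generator expression for a list comprehension where `copy._reconstruct` deep-copies the constructor arguments. A minimal sketch, with both variants side by side; `reconstruct_before` and `reconstruct_after` are illustrative names, not the actual CPython code:

```python
from copy import deepcopy

def reconstruct_before(func, args, memo):
    # Old form (sketch): a generator expression is splatted into the
    # call; copies are produced lazily, but the genexpr object itself
    # stays alive until func returns.
    copied = (deepcopy(arg, memo) for arg in args)
    return func(*copied)

def reconstruct_after(func, args, memo):
    # New form (sketch): a list comprehension makes all copies up
    # front; slightly more memory for long arg tuples, but cheaper to
    # build and to splat.
    copied = [deepcopy(arg, memo) for arg in args]
    return func(*copied)
```

Both variants produce identical results; only allocation behavior and speed differ.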

Here is a microbenchmark of the relevant part:

import copy
import timeit

args = (1, [], {})
memo = {}

def f(*args):
    return None

def l():
    # list comprehension: all copies are made up front, then splatted
    z = [copy.deepcopy(a, memo) for a in args]
    return f(*z)

def h():
    # generator expression: copies are produced lazily during the splat
    z = (copy.deepcopy(a, memo) for a in args)
    return f(*z)

t1 = timeit.timeit('l()', globals=globals())
t2 = timeit.timeit('h()', globals=globals())

print(f'{t1=:.3f} {t2=:.3f} ratio {t1/t2:.3f}')

On my system the list comprehension is about 25-30% faster (both normal and FT build).

Update: the microbenchmark above does not clear the memo between calls. With memo=None instead (so each deepcopy call starts with a fresh memo), the performance gain is in the 10-12% range.
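A corrected variant of the microbenchmark, under the assumption that the update above means passing None as the memo argument (deepcopy then allocates a fresh memo dict per call):

```python
import copy
import timeit

args = (1, [], {})

def f(*args):
    return None

def listcomp():
    # memo=None: deepcopy creates a fresh memo dict on every call
    return f(*[copy.deepcopy(a, None) for a in args])

def genexpr():
    return f(*(copy.deepcopy(a, None) for a in args))

t1 = timeit.timeit(listcomp, number=100_000)
t2 = timeit.timeit(genexpr, number=100_000)
print(f'{t1=:.3f} {t2=:.3f} ratio {t2/t1:.3f}')
```

The exact ratio will vary by machine and build; the point is only that the fresh-memo case shrinks the gap relative to the shared-memo benchmark.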

eendebakpt avatar Jan 05 '26 22:01 eendebakpt

This will increase the memory consumption of deepcopy and may not necessarily be desirable. Do you have some numbers for that as well?

Memory consumption seems mixed.

On my computer the genexpr uses 200 bytes. The splat in the function call runs the generator to the end, so all of the items it returns stay alive for the whole function call. And even after the generator is fully consumed, it still occupies those 200 bytes until we return from _reconstruct.

Whereas with the list, if the number of args is small, creating a list can actually take less space than the generator. On my computer a list starts at 56 bytes, grows to 186 bytes at 16 items, and then to 248 at 17. We also pay for keeping the copies of all the args alive from the creation of the list until the end of _reconstruct.

In the worst case there are many args and they are large, in which case the new code uses more memory (a large list takes more space than a genexpr), and for a longer period of time (the copies stay alive in the list).
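The sizes quoted above are CPython- and platform-specific, but the shape of the trade-off can be checked with sys.getsizeof (the exact byte counts below are not asserted, only the relationships):

```python
import sys

# A generator expression has a fixed footprint regardless of how many
# items it will yield: it holds a suspended frame, not the items.
gen_small = (x for x in range(1))
gen_large = (x for x in range(10_000))
print(sys.getsizeof(gen_small), sys.getsizeof(gen_large))

# A list's size grows with its length (plus over-allocation when it
# is built incrementally, as a list comprehension does).
for n in (0, 16, 17):
    built = [x for x in range(n)]
    print(n, sys.getsizeof(built))
```

Note that getsizeof measures only the container itself; in deepcopy the copied elements are alive in either variant for the duration of the constructor call, so the container overhead is what actually differs.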

heikkitoivonen avatar Jan 06 '26 00:01 heikkitoivonen

I compiled with --enable-experimental-jit=yes and see a similar effect (just a quick run for now; results for the first test are not stable according to pyperformance).

deepcopy_jit_baseline.json
==========================

Performance version: 1.13.0
Python version: 3.15.0a3+ (64-bit) revision ef2d3ea5bba
Report on macOS-14.6.1-arm64-arm-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2026-01-05 22:53:05.117887
End date: 2026-01-05 22:53:53.928093

deepcopy_jit_optimized.json
===========================

Performance version: 1.13.0
Python version: 3.15.0a3+ (64-bit) revision ef2d3ea5bba
Report on macOS-14.6.1-arm64-arm-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2026-01-05 22:54:20.384958
End date: 2026-01-05 22:55:08.471377

### deepcopy ###
Mean +- std dev: 722 us +- 8 us -> 702 us +- 10 us: 1.03x faster
Significant (t=11.77)

### deepcopy_memo ###
Mean +- std dev: 71.7 us +- 1.1 us -> 71.8 us +- 1.1 us: 1.00x slower
Not significant

### deepcopy_reduce ###
Mean +- std dev: 5.09 us +- 0.06 us -> 4.94 us +- 0.04 us: 1.03x faster
Significant (t=15.51)

heikkitoivonen avatar Jan 06 '26 07:01 heikkitoivonen

Merged, thanks for your nice optimization!

vstinner avatar Jan 08 '26 15:01 vstinner