cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-135552: Clear weakrefs to types in GC after garbage finalization not before

Open sergey-miryanov opened this issue 5 months ago • 15 comments

While trying to find the root cause of the linked issue, I observed that the type cache was holding an obsolete reference. Because the type cache holds borrowed references, this leads to a use-after-free error.

I'm resetting the cache in this PR. But I suspect that this may affect overall performance.

  • Issue: gh-135552

sergey-miryanov avatar Jun 19 '25 19:06 sergey-miryanov

@ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look.

sergey-miryanov avatar Jun 19 '25 19:06 sergey-miryanov

Would you mind adding a benchmark for this commit by comment?

Zheaoli avatar Jun 20 '25 06:06 Zheaoli

@sobolevn Thanks for the review! I will do soon. However, as with all simple solutions, this one doesn't seem to be the right one. I'm digging further.

@Zheaoli Yes, I'm planning to do perf benchmarks.

sergey-miryanov avatar Jun 20 '25 10:06 sergey-miryanov

However, as with all simple solutions, this one doesn't seem to be the right one. I'm digging further.

Yeah, as I said, this seems to be a problem with the GIL-ful garbage collector, not the object model. Something like this would also cause problems on the free-threaded build, but FT is working perfectly.

ZeroIntensity avatar Jun 20 '25 13:06 ZeroIntensity

@ZeroIntensity The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the STORE_ATTR (->PyObject_SetAttr->tp_setattro->type_update_dict->type_modified_unlocked) can't reset the cache for subclasses. As a result, the Node type cache contains obsolete entries.

I have the following code, and I believe the problem is common to both:

class XXX:
    def __init__(self):
        self.x = weakref.ref(self)
        self.z = self

    def __del__(self):
        assert self.x() == self, (self.x(), self.z, self)

XXX()
XXX()

➜ .\python.bat -X faulthandler x.py 
Exception ignored while calling deallocator <function XXX.__del__ at 0x000001E0085F5010>:
Traceback (most recent call last):
  File "D:\Sources\_pythonish\cpython\main\x.py", line 29, in __del__
    assert self.x() == self, (self.x(), self.z, self)
AssertionError: (None, <__main__.XXX object at 0x000001E00838B760>, <__main__.XXX object at 0x000001E00838B760>)
Exception ignored while calling deallocator <function XXX.__del__ at 0x000001E0085F5010>:
Traceback (most recent call last):
  File "D:\Sources\_pythonish\cpython\main\x.py", line 29, in __del__
    assert self.x() == self, (self.x(), self.z, self)
AssertionError: (None, <__main__.XXX object at 0x000001E00838B8C0>, <__main__.XXX object at 0x000001E00838B8C0>)

As you can see, the weakref is None, but self has not been destroyed yet. I believe the garbage collector has a mechanism to handle weakrefs, but it runs too early, before _Py_Finalize (where tp_finalize is called).

sergey-miryanov avatar Jun 20 '25 15:06 sergey-miryanov

The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the STORE_ATTR (->PyObject_SetAttr->tp_setattro->type_update_dict->type_modified_unlocked) can't reset the cache for subclasses. As a result, the Node type cache contains obsolete entries.

Ah, and there's no type cache under free-threading. Interesting!

ZeroIntensity avatar Jun 20 '25 15:06 ZeroIntensity

This fails on gc.collect and is not tied to interpreter finalization.

import gc

def x():

    class BaseNode:
        next = None


    class Node(BaseNode):
        def __init__(self) -> None:
            if BaseNode.next is None:
                BaseNode.next = self
            else:
                BaseNode.next.next = self

        def __del__(self) -> None:
            print(BaseNode.__subclasses__())
            BaseNode.next = BaseNode.next.next


    Node()
    Node()

x()
gc.collect()

➜ .\python.bat -X faulthandler x.py 
Running Debug|x64 interpreter...
[]
[]
Windows fatal exception: access violation

Current thread 0x000127b0 (most recent call first):
  Garbage-collecting
  File "D:\Sources\_pythonish\cpython\main\x.py", line 18 in __del__
  File "D:\Sources\_pythonish\cpython\main\x.py", line 25 in <module>

Current thread's C stack trace (most recent call first):
  <cannot get C stack on this system>

The call to gc.collect without arguments do a full collection and handle_weakrefs is called in this case. handle_weakrefs clears weakrefs and calls any callbacks if they exist. And moves some refs to the reachable list.

sergey-miryanov avatar Jun 21 '25 19:06 sergey-miryanov

In this variant I have crash even with proposed fix:

import gc

def x():
    class BaseNode:
        next = None

    class Node(BaseNode):
        def __init__(self) -> None:
            if BaseNode.next is None:
                BaseNode.next = self
            else:
                BaseNode.next.next = self

        def __del__(self) -> None:
            BaseNode.next.next
            BaseNode.next = self
            BaseNode.next.next

    Node()
    Node()

x()
print("Before GC")
gc.collect()
print("No segfault")

efimov-mikhail avatar Jun 22 '25 14:06 efimov-mikhail

@efimov-mikhail Yeah, thanks! This is great example that proposed solution is wrong.

sergey-miryanov avatar Jun 22 '25 18:06 sergey-miryanov

All commit authors signed the Contributor License Agreement.

CLA signed

python-cla-bot[bot] avatar Jun 22 '25 19:06 python-cla-bot[bot]

On my local machine I have following results:

Geometric mean - 1.01x slower
All benchmarks:
===============

+--------------------------+----------+------------------------+
| Benchmark                | ref      | fix                    |
+==========================+==========+========================+
| async_generators         | 421 ms   | 427 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_tcp              | 769 ms   | 741 ms: 1.04x faster   |
+--------------------------+----------+------------------------+
| asyncio_tcp_ssl          | 1.92 sec | 1.88 sec: 1.02x faster |
+--------------------------+----------+------------------------+
| chaos                    | 79.7 ms  | 81.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| coverage                 | 86.3 ms  | 85.0 ms: 1.02x faster  |
+--------------------------+----------+------------------------+
| deepcopy                 | 317 us   | 323 us: 1.02x slower   |
+--------------------------+----------+------------------------+
| deepcopy_reduce          | 3.34 us  | 3.42 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| deepcopy_memo            | 36.3 us  | 36.7 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| deltablue                | 4.71 ms  | 4.81 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| django_template          | 44.7 ms  | 45.8 ms: 1.03x slower  |
+--------------------------+----------+------------------------+
| docutils                 | 2.46 sec | 2.48 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| fannkuch                 | 429 ms   | 435 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| genshi_text              | 28.5 ms  | 29.9 ms: 1.05x slower  |
+--------------------------+----------+------------------------+
| genshi_xml               | 61.8 ms  | 63.7 ms: 1.03x slower  |
+--------------------------+----------+------------------------+
| logging_simple           | 12.1 us  | 12.2 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| meteor_contest           | 101 ms   | 103 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| nqueens                  | 103 ms   | 106 ms: 1.03x slower   |
+--------------------------+----------+------------------------+
| pickle                   | 11.3 us  | 11.2 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| pickle_dict              | 25.8 us  | 25.3 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| pickle_pure_python       | 411 us   | 415 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| pprint_safe_repr         | 1.03 sec | 1.03 sec: 1.00x slower |
+--------------------------+----------+------------------------+
| pprint_pformat           | 2.09 sec | 2.11 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| pyflate                  | 486 ms   | 489 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| regex_compile            | 144 ms   | 145 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| regex_effbot             | 2.00 ms  | 2.02 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| richards                 | 59.1 ms  | 60.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| richards_super           | 67.5 ms  | 69.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| scimark_fft              | 312 ms   | 316 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| scimark_sor              | 148 ms   | 149 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| spectral_norm            | 115 ms   | 116 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sqlglot_normalize        | 129 ms   | 132 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| sqlglot_optimize         | 60.0 ms  | 61.0 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| sqlglot_parse            | 1.53 ms  | 1.54 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| sqlite_synth             | 2.33 us  | 2.38 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| sympy_expand             | 495 ms   | 507 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| sympy_integrate          | 19.0 ms  | 19.7 ms: 1.04x slower  |
+--------------------------+----------+------------------------+
| sympy_sum                | 143 ms   | 145 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sympy_str                | 285 ms   | 289 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| telco                    | 8.19 ms  | 8.11 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 184 us   | 186 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| unpickle                 | 14.3 us  | 14.1 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 302 us   | 306 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_parse          | 137 ms   | 133 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 108 ms   | 106 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| xml_etree_generate       | 107 ms   | 109 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_process        | 76.4 ms  | 77.7 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.01x slower           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (37): 2to3, comprehensions, bench_mp_pool, bench_thread_pool, coroutines, crypto_pyaes, dask, dulwich_log, float, create_gc_cycles, gc_traversal, generators, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, mako, mdp, nbody, pathlib, pickle_list, pidigits, python_startup, python_startup_no_site, raytrace, regex_dna, regex_v8, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_transpile, tomli_loads, unpack_sequence, unpickle_list

CPU

CPU

11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz

Base speed:	3.91 GHz
Sockets:	1
Cores:	6
Logical processors:	12
Virtualization:	Enabled
L1 cache:	480 KB
L2 cache:	3.0 MB
L3 cache:	12.0 MB

Utilization	3%
Speed	2.38 GHz
Up time	11:08:47:59
Processes	470
Threads	7904
Handles	253279

sergey-miryanov avatar Jun 23 '25 07:06 sergey-miryanov

@Zheaoli @ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look, it is ready to review.

sergey-miryanov avatar Jun 23 '25 07:06 sergey-miryanov

Updated results:

Geometric mean - 1.00x faster
All benchmarks:
===============

+--------------------------+----------+------------------------+
| Benchmark                | ref      | fix2                   |
+==========================+==========+========================+
| 2to3                     | 341 ms   | 345 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_tcp              | 769 ms   | 745 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| asyncio_tcp_ssl          | 1.92 sec | 1.89 sec: 1.01x faster |
+--------------------------+----------+------------------------+
| chaos                    | 79.7 ms  | 79.0 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| coroutines               | 30.5 ms  | 31.0 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| dulwich_log              | 85.9 ms  | 85.4 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| fannkuch                 | 429 ms   | 434 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| gc_traversal             | 2.91 ms  | 2.93 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| generators               | 42.8 ms  | 43.1 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| mdp                      | 1.36 sec | 1.37 sec: 1.00x slower |
+--------------------------+----------+------------------------+
| meteor_contest           | 101 ms   | 100 ms: 1.00x faster   |
+--------------------------+----------+------------------------+
| pickle                   | 11.3 us  | 11.2 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| pickle_dict              | 25.8 us  | 25.4 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| pprint_pformat           | 2.09 sec | 2.10 sec: 1.00x slower |
+--------------------------+----------+------------------------+
| scimark_fft              | 312 ms   | 314 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| scimark_lu               | 128 ms   | 129 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| scimark_sor              | 148 ms   | 149 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| spectral_norm            | 115 ms   | 117 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sqlglot_parse            | 1.53 ms  | 1.54 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| sqlite_synth             | 2.33 us  | 2.35 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 184 us   | 185 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| unpickle                 | 14.3 us  | 14.2 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| unpickle_list            | 4.09 us  | 4.13 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 302 us   | 300 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| xml_etree_parse          | 137 ms   | 133 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 108 ms   | 107 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| xml_etree_process        | 76.4 ms  | 77.4 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.00x faster           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (56): async_generators, comprehensions, bench_mp_pool, bench_thread_pool, coverage, crypto_pyaes, dask, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, float, create_gc_cycles, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle_list, pickle_pure_python, pidigits, pprint_safe_repr, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, regex_v8, richards, richards_super, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_normalize, sqlglot_optimize, sqlglot_transpile, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tomli_loads, unpack_sequence, xml_etree_generate

sergey-miryanov avatar Jun 23 '25 10:06 sergey-miryanov

I'm still investigating this bug. As of now, I think the key problem is that handle_weakrefs() gets called before finalize_garbage(). The tp_subclasses dict contains weakref objects. When handle_weakrefs() gets called, those all get cleared. Then, finalizers like __del__ methods that mutate things do not correct invalidate the type cache. I'm not sure of the fix yet. I'm not keen on the idea of making a separate pass to handle the types.

nascheme avatar Jun 28 '25 01:06 nascheme

Thanks!

I'm still investigating this bug. As of now, I think the key problem is that handle_weakrefs() gets called before finalize_garbage(). The tp_subclasses dict contains weakref objects. When handle_weakrefs() gets called, those all get cleared. Then, finalizers like __del__ methods that mutate things do not correct invalidate the type cache.

Yeah, we understand problem the same way. This is a reason why I split handling types and other objects.

I'm not sure of the fix yet. I'm not keen on the idea of making a separate pass to handle the types.

I believe you have much more experience with this than I do (one of the articles I read about Python GC was yours from early 00). So, if you have another way to fix this then it will great opportunity to learn something new :)

sergey-miryanov avatar Jun 28 '25 03:06 sergey-miryanov

Closing in favor of https://github.com/python/cpython/pull/136189

sergey-miryanov avatar Jul 03 '25 16:07 sergey-miryanov