qpid-dispatch
qpid-dispatch copied to clipboard
Make lsan suppressions more specific and fix revealed leaks
I did not finish with making this leak free yet. It is quite difficult for me to predict what happens as I make (pretty significant :( changes all around... Currently, my biggest problem is that after I allow Python to collect the circular object subgraph including IoAdapter, the collection is happening too late in the shutdown, where I cannot no longer schedule actions:
14: ==30770==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a00003a8f0 at pc 0x7f321b4e7dae bp 0x7ffd94518bd0 sp 0x7ffd94518bc8
14: READ of size 8 at 0x61a00003a8f0 thread T0
14: #0 0x7f321b4e7dad in qdr_action_enqueue ../src/router_core/router_core.c:399
14: #1 0x7f321b5043db in qdr_core_unsubscribe ../src/router_core/route_tables.c:173
14: #2 0x7f321b453b0c in IoAdapter_dealloc ../src/python_embedded.c:737
14: #3 0x7f321ac861ad in list_dealloc (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x1371ad)
14: #4 0x7f321ac9e59a in dict_dealloc (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x14f59a)
14: #5 0x7f321acb8138 in subtype_clear (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x169138)
14: #6 0x7f321ad115d0 in collect.constprop.0 (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x1c25d0)
14: #7 0x7f321ad72670 in collect_with_callback.constprop.0 (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x223670)
14: #8 0x7f321ad7275d in PyGC_Collect (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x22375d)
14: #9 0x7f321ad68d0b in Py_FinalizeEx (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x219d0b)
14: #10 0x7f321b454cc6 in qd_python_finalize ../src/python_embedded.c:73
14: #11 0x7f321b3fb0f1 in qd_dispatch_free ../src/dispatch.c:388
14: #12 0x402625 in main_process ../router/src/main.c:117
14: #13 0x403f4b in main ../router/src/main.c:367
14: #14 0x7f321a047cbc in __libc_start_main (/nix/store/q53f5birhik4dxg3q3r2g5f324n7r5mc-glibc-2.31-74/lib/libc.so.6+0x23cbc)
14: #15 0x402419 in _start (/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/router/qdrouterd+0x402419)
I could not figure out what is keeping the object subgraph alive so that it does not get collected sooner, when I want it to be destroyed.
Also, the suppressions need to be tried on all supported platforms, because each Python version, etc. can leak differently. And as the Jira shows, suppressing all leaky traces that include Python is not a good solution, because it suppresses too much (incl. the IoAdapter leak; admittedly not super-serious, but this is a matter of principle! `)
I'll revisit this next weekend and try to make some more sense of ^^^
edit: it would probably make the most sense to do this in two stages, first rewrite suppression file (so that CI stays green), then address suppressions one by one.
Codecov Report
Merging #1032 (3f360b4) into master (ce7c5a0) will increase coverage by
0.06%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #1032 +/- ##
==========================================
+ Coverage 82.41% 82.47% +0.06%
==========================================
Files 111 111
Lines 27315 27331 +16
==========================================
+ Hits 22512 22542 +30
+ Misses 4803 4789 -14
Impacted Files | Coverage Δ | |
---|---|---|
src/dispatch.c | 83.80% <100.00%> (+0.07%) |
:arrow_up: |
src/python_embedded.c | 68.28% <100.00%> (-0.15%) |
:arrow_down: |
src/router_core/modules/mobile_sync/mobile.c | 92.21% <100.00%> (+0.03%) |
:arrow_up: |
src/router_core/router_core.c | 86.20% <100.00%> (ø) |
|
src/router_node.c | 94.12% <100.00%> (-0.11%) |
:arrow_down: |
src/router_pynode.c | 88.39% <100.00%> (+0.42%) |
:arrow_up: |
src/server.c | 86.67% <100.00%> (ø) |
|
tests/core_timer_test.c | 90.47% <100.00%> (+0.09%) |
:arrow_up: |
src/router_core/forwarder.c | 92.64% <0.00%> (-0.40%) |
:arrow_down: |
... and 6 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update ce7c5a0...3f360b4. Read the comment docs.
Superseded by #1048 and other PRs still waiting to be proposed. I'll leave this open for a while longer, until the other PRs get processed.
Codecov Report
Merging #1032 (979481a) into main (b919a63) will increase coverage by
0.00%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## main #1032 +/- ##
=======================================
Coverage 84.22% 84.23%
=======================================
Files 111 111
Lines 27569 27580 +11
=======================================
+ Hits 23220 23232 +12
+ Misses 4349 4348 -1
Impacted Files | Coverage Δ | |
---|---|---|
src/dispatch.c | 83.80% <100.00%> (+0.07%) |
:arrow_up: |
src/python_embedded.c | 68.28% <100.00%> (-0.15%) |
:arrow_down: |
src/router_core/modules/mobile_sync/mobile.c | 92.21% <100.00%> (+0.03%) |
:arrow_up: |
src/router_core/router_core.c | 85.19% <100.00%> (-0.67%) |
:arrow_down: |
src/router_node.c | 92.94% <100.00%> (+0.19%) |
:arrow_up: |
src/router_pynode.c | 88.23% <100.00%> (+0.27%) |
:arrow_up: |
...c/router_core/modules/test_hooks/core_test_hooks.c | 92.03% <0.00%> (-1.28%) |
:arrow_down: |
src/router_core/connections.c | 89.31% <0.00%> (-0.30%) |
:arrow_down: |
... and 8 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update b919a63...979481a. Read the comment docs.