qpid-dispatch icon indicating copy to clipboard operation
qpid-dispatch copied to clipboard

Make lsan suppressions more specific and fix revealed leaks

Open jiridanek opened this issue 4 years ago • 4 comments

I did not finish with making this leak free yet. It is quite difficult for me to predict what happens as I make (pretty significant :( changes all around... Currently, my biggest problem is that after I allow Python to collect the circular object subgraph including IoAdapter, the collection is happening too late in the shutdown, where I cannot no longer schedule actions:

14: ==30770==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a00003a8f0 at pc 0x7f321b4e7dae bp 0x7ffd94518bd0 sp 0x7ffd94518bc8
14: READ of size 8 at 0x61a00003a8f0 thread T0
14:     #0 0x7f321b4e7dad in qdr_action_enqueue ../src/router_core/router_core.c:399
14:     #1 0x7f321b5043db in qdr_core_unsubscribe ../src/router_core/route_tables.c:173
14:     #2 0x7f321b453b0c in IoAdapter_dealloc ../src/python_embedded.c:737
14:     #3 0x7f321ac861ad in list_dealloc (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x1371ad)
14:     #4 0x7f321ac9e59a in dict_dealloc (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x14f59a)
14:     #5 0x7f321acb8138 in subtype_clear (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x169138)
14:     #6 0x7f321ad115d0 in collect.constprop.0 (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x1c25d0)
14:     #7 0x7f321ad72670 in collect_with_callback.constprop.0 (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x223670)
14:     #8 0x7f321ad7275d in PyGC_Collect (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x22375d)
14:     #9 0x7f321ad68d0b in Py_FinalizeEx (/nix/store/0yhk4sk4x9s9hsrf3p1skbfy1pwd1rbf-python3-3.8.5/lib/libpython3.8.so.1.0+0x219d0b)
14:     #10 0x7f321b454cc6 in qd_python_finalize ../src/python_embedded.c:73
14:     #11 0x7f321b3fb0f1 in qd_dispatch_free ../src/dispatch.c:388
14:     #12 0x402625 in main_process ../router/src/main.c:117
14:     #13 0x403f4b in main ../router/src/main.c:367
14:     #14 0x7f321a047cbc in __libc_start_main (/nix/store/q53f5birhik4dxg3q3r2g5f324n7r5mc-glibc-2.31-74/lib/libc.so.6+0x23cbc)
14:     #15 0x402419 in _start (/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/router/qdrouterd+0x402419)

I could not figure out what is keeping the object subgraph alive so that it does not get collected sooner, when I want it to be destroyed.

Also, the suppressions need to be tried on all supported platforms, because each Python version, etc. can leak differently. And as the Jira shows, suppressing all leaky traces that include Python is not a good solution, because it suppresses too much (incl. the IoAdapter leak; admittedly not super-serious, but this is a matter of principle! `)

jiridanek avatar Feb 15 '21 09:02 jiridanek

I'll revisit this next weekend and try to make some more sense of ^^^

edit: it would probably make the most sense to do this in two stages, first rewrite suppression file (so that CI stays green), then address suppressions one by one.

jiridanek avatar Feb 15 '21 09:02 jiridanek

Codecov Report

Merging #1032 (3f360b4) into master (ce7c5a0) will increase coverage by 0.06%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1032      +/-   ##
==========================================
+ Coverage   82.41%   82.47%   +0.06%     
==========================================
  Files         111      111              
  Lines       27315    27331      +16     
==========================================
+ Hits        22512    22542      +30     
+ Misses       4803     4789      -14     
Impacted Files Coverage Δ
src/dispatch.c 83.80% <100.00%> (+0.07%) :arrow_up:
src/python_embedded.c 68.28% <100.00%> (-0.15%) :arrow_down:
src/router_core/modules/mobile_sync/mobile.c 92.21% <100.00%> (+0.03%) :arrow_up:
src/router_core/router_core.c 86.20% <100.00%> (ø)
src/router_node.c 94.12% <100.00%> (-0.11%) :arrow_down:
src/router_pynode.c 88.39% <100.00%> (+0.42%) :arrow_up:
src/server.c 86.67% <100.00%> (ø)
tests/core_timer_test.c 90.47% <100.00%> (+0.09%) :arrow_up:
src/router_core/forwarder.c 92.64% <0.00%> (-0.40%) :arrow_down:
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ce7c5a0...3f360b4. Read the comment docs.

codecov-io avatar Feb 15 '21 11:02 codecov-io

Superseded by #1048 and other PRs still waiting to be proposed. I'll leave this open for a while longer, until the other PRs get processed.

jiridanek avatar Feb 20 '21 16:02 jiridanek

Codecov Report

Merging #1032 (979481a) into main (b919a63) will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1032   +/-   ##
=======================================
  Coverage   84.22%   84.23%           
=======================================
  Files         111      111           
  Lines       27569    27580   +11     
=======================================
+ Hits        23220    23232   +12     
+ Misses       4349     4348    -1     
Impacted Files Coverage Δ
src/dispatch.c 83.80% <100.00%> (+0.07%) :arrow_up:
src/python_embedded.c 68.28% <100.00%> (-0.15%) :arrow_down:
src/router_core/modules/mobile_sync/mobile.c 92.21% <100.00%> (+0.03%) :arrow_up:
src/router_core/router_core.c 85.19% <100.00%> (-0.67%) :arrow_down:
src/router_node.c 92.94% <100.00%> (+0.19%) :arrow_up:
src/router_pynode.c 88.23% <100.00%> (+0.27%) :arrow_up:
...c/router_core/modules/test_hooks/core_test_hooks.c 92.03% <0.00%> (-1.28%) :arrow_down:
src/router_core/connections.c 89.31% <0.00%> (-0.30%) :arrow_down:
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b919a63...979481a. Read the comment docs.

codecov-commenter avatar Apr 20 '21 11:04 codecov-commenter