qpid-dispatch icon indicating copy to clipboard operation
qpid-dispatch copied to clipboard

DISPATCH-848, DISPATCH-1962 Fix leak of IoAdapter_init

Open jiridanek opened this issue 4 years ago • 16 comments

This PR depends on PRs to fix leaks in qdr_subscription_t (https://github.com/apache/qpid-dispatch/pull/1051) and qdr_core_t (https://github.com/apache/qpid-dispatch/pull/1049). The former is still WIP, so I won't carry the commit here, because it would make review harder.

The original problem is

9: Direct leak of 56 byte(s) in 1 object(s) allocated from:
9:     #0 0x7f78a3606e8f in __interceptor_malloc (/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
9:     #1 0x7f78a2d64afb in qd_malloc ../include/qpid/dispatch/ctools.h:229
9:     #2 0x7f78a2d657da in qdr_core_subscribe ../src/router_core/route_tables.c:149
9:     #3 0x7f78a2c83072 in IoAdapter_init ../src/python_embedded.c:711
9:     #4 0x7f78a2353a6c in type_call (/nix/store/r85nxfnwiv45nbmf5yb60jj8ajim4m7w-python3-3.8.5/lib/libpython3.8.so.1.0+0x165a6c)

The solution consists of

  1. implementing support for GC in IoAdapter so that Python can free the cycle described in the Jira.
  2. Fixing Decreffing so that Python does free
  3. Calling PyGC_Collect() at an opportune place where finalizers can be called without accessing previously already freed memory.

jiridanek avatar Feb 20 '21 22:02 jiridanek

Codecov Report

Merging #1052 (78fb8d1) into master (844f995) will increase coverage by 0.14%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1052      +/-   ##
==========================================
+ Coverage   82.48%   82.63%   +0.14%     
==========================================
  Files         111      111              
  Lines       27425    27439      +14     
==========================================
+ Hits        22621    22673      +52     
+ Misses       4804     4766      -38     
Impacted Files Coverage Δ
src/dispatch.c 83.80% <100.00%> (+0.07%) :arrow_up:
src/python_embedded.c 69.55% <100.00%> (+1.12%) :arrow_up:
src/router_node.c 94.03% <100.00%> (ø)
src/router_pynode.c 88.34% <100.00%> (+0.37%) :arrow_up:
tests/core_timer_test.c 90.47% <100.00%> (+0.09%) :arrow_up:
src/router_core/transfer.c 91.81% <0.00%> (-0.44%) :arrow_down:
src/router_core/delivery.c 93.12% <0.00%> (-0.39%) :arrow_down:
src/router_core/connections.c 88.61% <0.00%> (-0.20%) :arrow_down:
src/message.c 87.07% <0.00%> (-0.08%) :arrow_down:
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 844f995...78fb8d1. Read the comment docs.

codecov-io avatar Feb 20 '21 23:02 codecov-io

Here's a GHA run with PR #1051 merged in. There are some leaks from Proton then, but the IoAdapter leak is fixed. https://github.com/jiridanek/qpid-dispatch/actions/runs/585158010

jiridanek avatar Feb 21 '21 10:02 jiridanek

not sure what to think about this

https://travis-ci.com/github/apache/qpid-dispatch/jobs/498893659#L4652

24: =================================================================
24: ==13469==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0xffff7f0441d8 in thread T0
24:     #0 0x48f30c in free (/home/travis/build/apache/qpid-dispatch/build/router/qdrouterd+0x48f30c)
24:     #1 0xffff7ed81524  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x23a524)
24:     #2 0xffff7eddbd70  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x294d70)
24:     #3 0xffff7edc0808  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x279808)
24:     #4 0xffff7edc2a58  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x27ba58)
24:     #5 0xffff7eda918c  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x26218c)
24:     #6 0xffff7eda4a28  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x25da28)
24:     #7 0xffff7ebc79d4  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x809d4)
24:     #8 0xffff7eca6d88 in _PyGC_CollectNoFail (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x15fd88)
24:     #9 0xffff7ece42e8 in PyImport_Cleanup (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x19d2e8)
24:     #10 0xffff7ecd17e8 in Py_FinalizeEx (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x18a7e8)
24:     #11 0xffff7f509654 in qd_dispatch_free /home/travis/build/apache/qpid-dispatch/src/dispatch.c:377:5
24:     #12 0x4bd0a4 in main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:119:5
24:     #13 0x4bbe24 in main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369:9
24:     #14 0xffff7e8f308c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x2408c)
24:     #15 0x4214a8 in _start (/home/travis/build/apache/qpid-dispatch/build/router/qdrouterd+0x4214a8)
24: 
24: Address 0xffff7f0441d8 is a wild pointer.
24: SUMMARY: AddressSanitizer: bad-free (/home/travis/build/apache/qpid-dispatch/build/router/qdrouterd+0x48f30c) in free
24: ==13469==ABORTING

jiridanek avatar Apr 15 '21 22:04 jiridanek

That AArch64 crash happened under TSan too. Only issue somewhat related I can find is https://bugs.python.org/issue42697, which suggests that this could possibly be Python issue, as well as PR issue

24: FATAL: ThreadSanitizer CHECK failed: ../../../../gcc-10.2.0/libsanitizer/sanitizer_common/sanitizer_allocator_secondary.h:301 "((IsAligned(p, page_size_))) != (0)" (0x0, 0x0)
24:     #0 <null> <null> (libtsan.so.0+0x9b2e2)
24:     #1 <null> <null> (libtsan.so.0+0xb914a)
24:     #2 <null> <null> (libtsan.so.0+0x85a7c)
24:     #3 <null> <null> (libtsan.so.0+0x84dd5)
24:     #4 free <null> (libtsan.so.0+0x38be4)
24:     #5 tupledealloc <null> (libpython3.8.so.1.0+0x160c07)
24:     #6 code_dealloc <null> (libpython3.8.so.1.0+0x1300c3)
24:     #7 func_dealloc <null> (libpython3.8.so.1.0+0x136a2c)
24:     #8 free_keys_object <null> (libpython3.8.so.1.0+0x14f664)
24:     #9 type_clear <null> (libpython3.8.so.1.0+0x1f17d5)
24:     #10 collect.constprop.0 <null> (libpython3.8.so.1.0+0x1c26b1)
24:     #11 _PyGC_CollectNoFail <null> (libpython3.8.so.1.0+0x2237d1)
24:     #12 PyImport_Cleanup <null> (libpython3.8.so.1.0+0x212792)
24:     #13 Py_FinalizeEx <null> (libpython3.8.so.1.0+0x219d10)
24:     #14 qd_python_finalize ../src/python_embedded.c:71 (libqpid-dispatch.so+0x90827)
24:     #15 qd_dispatch_free ../src/dispatch.c:377 (libqpid-dispatch.so+0x713b8)
24:     #16 main_process ../router/src/main.c:119 (qdrouterd+0x40263a)
24:     #17 main ../router/src/main.c:369 (qdrouterd+0x4032ae)
24:     #18 __libc_start_main <null> (libc.so.6+0x23cbc)
24:     #19 _start <null> (qdrouterd+0x4023a9)

24: FAILED (errors=1)
24/74 Test #24: system_tests_policy ...............................***Failed   48.86 sec

jiridanek avatar Apr 21 '21 06:04 jiridanek

Codecov Report

Merging #1052 (fe13407) into main (8147b62) will increase coverage by 0.01%. The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1052      +/-   ##
==========================================
+ Coverage   84.86%   84.88%   +0.01%     
==========================================
  Files         116      116              
  Lines       28644    28635       -9     
==========================================
- Hits        24310    24307       -3     
+ Misses       4334     4328       -6     
Impacted Files Coverage Δ
src/python_embedded.c 69.70% <90.47%> (+1.27%) :arrow_up:
src/dispatch.c 87.03% <100.00%> (+0.06%) :arrow_up:
src/router_core/router_core.c 85.62% <100.00%> (-1.43%) :arrow_down:
src/router_node.c 93.96% <100.00%> (+0.30%) :arrow_up:
src/router_pynode.c 88.34% <100.00%> (+0.49%) :arrow_up:
src/server.c 87.40% <100.00%> (+0.02%) :arrow_up:
src/router_core/delivery.c 93.71% <0.00%> (-0.74%) :arrow_down:
src/router_core/transfer.c 94.29% <0.00%> (-0.43%) :arrow_down:
src/adaptors/tcp_adaptor.c 77.10% <0.00%> (-0.32%) :arrow_down:
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8147b62...fe13407. Read the comment docs.

codecov-commenter avatar Apr 21 '21 16:04 codecov-commenter

The Python crash happens on Python 3.8.2 (on Travis in Focal), on Python 3.8.5 (what I had), but not on Python 3.8.9 (version I upgraded my machine to). I am hoping that upgrading Travis will help. Ubuntu Focal does not have 3rd party releases for Python 3.8, so I am going for Python 3.9. There seems to be small incompatibility in tests due to how Python's exception hierarchy changed (??? !!! ???).

jiridanek avatar Apr 21 '21 17:04 jiridanek

test 68
      Start 68: system_tests_http2
68: Test command: /usr/bin/python3.9 "/home/travis/build/apache/qpid-dispatch/build/tests/run.py" "-m" "unittest" "-v" "system_tests_http2"
68: Test timeout computed to be: 600
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19770 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19774 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19779 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19783 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19787 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19791 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19796 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestEdgeInteriorRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 558, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20816: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestEdgeToEdgeViaInteriorRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 644, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20817: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestInteriorEdgeRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 600, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20818: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestOneEdgeRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 373, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20819: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestOneInteriorRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 411, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20820: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestOneStandaloneRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 290, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20821: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestTwoRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 449, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20822: [Errno 111] Connection refused
68: 
68: ----------------------------------------------------------------------
68: Ran 0 tests in 422.274s
68: 
68: FAILED (errors=7)
68/74 Test #68: system_tests_http2 ................................***Failed  428.01 sec

jiridanek avatar Apr 21 '21 20:04 jiridanek

https://travis-ci.com/github/apache/qpid-dispatch/jobs/500355035#L4493

45: ======================================================================
45: ERROR: tearDownClass (system_tests_topology.TopologyTests)
45: ----------------------------------------------------------------------
45: Traceback (most recent call last):
45:   File "/Users/travis/build/apache/qpid-dispatch/tests/system_test.py", line 876, in tearDownClass
45:     cls.tester.teardown()
45:   File "/Users/travis/build/apache/qpid-dispatch/tests/system_test.py", line 819, in teardown
45:     raise RuntimeError("Errors during teardown: \n\n%s" % "\n\n".join([str(e) for e in errors]))
45: RuntimeError: Errors during teardown: 
45: 
45: Process 6605 error: exit code -6, expected -1
45: qdrouterd -c D.conf -I /Users/travis/build/apache/qpid-dispatch/python
45: /Users/travis/build/apache/qpid-dispatch/build/tests/system_test.dir/system_tests_topology/TopologyTests/setUpClass/D-11.cmd
45: >>>>
45: AddressSanitizer:DEADLYSIGNAL
45: =================================================================
45: ==6605==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00010c0d4dfc bp 0x70000925fcf0 sp 0x70000925fc80 T2)
45: ==6605==The signal is caused by a READ memory access.
45: ==6605==Hint: address points to the zero page.
45:     #0 0x10c0d4dfb in qd_connection_invoke_deferred server.c:1506
45:     #1 0x10be818f0 in qd_connection_manager_delete_connector connection_manager.c:1060
45:     #2 0x10e44fef4 in ffi_call_unix64 (libffi.7.dylib:x86_64+0x4ef4)
45:     #3 0x70000925fe4f  (<unknown module>)
45: 
45: ==6605==Register values:
45: rax = 0x00001c3c000097d1  rbx = 0x000061e00004be88  rcx = 0x0000100000000000  rdx = 0x0000000000000000  
45: rdi = 0x000061e00004bd10  rsi = 0xffffffff00000000  rbp = 0x000070000925fcf0  rsp = 0x000070000925fc80  
45:  r8 = 0x00000000000020a0   r9 = 0x000060a000000498  r10 = 0x000000000000594c  r11 = 0x0000000000000206  
45: r12 = 0x00001c3c000097a5  r13 = 0x00001c22000151ea  r14 = 0x000061e00004bd10  r15 = 0x000061e00004bd28  
45: AddressSanitizer can not provide additional info.
45: SUMMARY: AddressSanitizer: SEGV server.c:1506 in qd_connection_invoke_deferred
45: Thread T2 created by T0 here:
45:     #0 0x10c915add in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56add)
45:     #1 0x10bf22e1d in sys_thread threading.c:181
45:     #2 0x10c0d372f in qd_server_run server.c:1485
45:     #3 0x10bd9996e in main_process main.c:115
45:     #4 0x10bd9827b in main main.c:369
45:     #5 0x7fff63c853d4 in start (libdyld.dylib:x86_64+0x163d4)
45: 
45: ==6605==ABORTING
45: <<<<
45: 
45: ----------------------------------------------------------------------
45: Ran 3 tests in 153.471s
45: 
45: FAILED (errors=2)
45/69 Test #45: system_tests_topology .............................***Failed  153.64 sec

jiridanek avatar Apr 22 '21 08:04 jiridanek

https://issues.apache.org/jira/browse/DISPATCH-1917, already known issue https://travis-ci.com/github/apache/qpid-dispatch/jobs/500384328#L6074

45: ==================
45: WARNING: ThreadSanitizer: data race (pid=17221)
45:   Read of size 8 at 0x7b4400000a00 by thread T3 (mutexes: write M5, write M265, write M0):
45:     #0 strlen <null> (libtsan.so.0+0x32abf)
45:     #1 PyUnicode_FromString <null> (libpython3.9.so.1.0+0x1b291c)
45:     #2 qd_entity_refresh_connector /home/travis/build/apache/qpid-dispatch/src/connection_manager.c:869 (libqpid-dispatch.so+0x70f66)
45:     #3 <null> <null> (libffi.so.7+0x6ff4)
45:     #4 qdr_forward_on_message /home/travis/build/apache/qpid-dispatch/src/router_core/forwarder.c:336 (libqpid-dispatch.so+0xba5bf)
45:     #5 qdr_general_handler /home/travis/build/apache/qpid-dispatch/src/router_core/router_core.c:928 (libqpid-dispatch.so+0xc3209)
45:     #6 qd_timer_visit /home/travis/build/apache/qpid-dispatch/src/timer.c:205 (libqpid-dispatch.so+0xf1267)
45:     #7 handle /home/travis/build/apache/qpid-dispatch/src/server.c:1006 (libqpid-dispatch.so+0xebc5e)
45:     #8 thread_run /home/travis/build/apache/qpid-dispatch/src/server.c:1120 (libqpid-dispatch.so+0xedaf7)
45:     #9 _thread_init /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:172 (libqpid-dispatch.so+0x96006)
45:     #10 <null> <null> (libtsan.so.0+0x2d1af)
45: 
45:   Previous write of size 8 at 0x7b4400000a00 by thread T2:
45:     #0 strcpy <null> (libtsan.so.0+0x45931)
45:     #1 strcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:90 (libqpid-dispatch.so+0xe60b6)
45:     #2 AMQP_opened_handler /home/travis/build/apache/qpid-dispatch/src/router_node.c:1388 (libqpid-dispatch.so+0xe60b6)
45:     #3 AMQP_outbound_opened_handler /home/travis/build/apache/qpid-dispatch/src/router_node.c:1549 (libqpid-dispatch.so+0xe6e07)
45:     #4 notify_opened /home/travis/build/apache/qpid-dispatch/src/container.c:277 (libqpid-dispatch.so+0x72a73)
45:     #5 policy_notify_opened /home/travis/build/apache/qpid-dispatch/src/container.c:288 (libqpid-dispatch.so+0x72a73)
45:     #6 qd_policy_amqp_open_connector /home/travis/build/apache/qpid-dispatch/src/policy.c:1373 (libqpid-dispatch.so+0x91b3e)
45:     #7 qd_container_handle_event /home/travis/build/apache/qpid-dispatch/src/container.c:515 (libqpid-dispatch.so+0x7559c)
45:     #8 handle /home/travis/build/apache/qpid-dispatch/src/server.c:1095 (libqpid-dispatch.so+0xebb91)
45:     #9 thread_run /home/travis/build/apache/qpid-dispatch/src/server.c:1120 (libqpid-dispatch.so+0xeda7c)
45:     #10 _thread_init /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:172 (libqpid-dispatch.so+0x96006)
45:     #11 <null> <null> (libtsan.so.0+0x2d1af)
45: 
45:   Location is heap block of size 300 at 0x7b4400000a00 allocated by main thread:
45:     #0 malloc <null> (libtsan.so.0+0x30343)
45:     #1 qd_server_connector /home/travis/build/apache/qpid-dispatch/src/server.c:1658 (libqpid-dispatch.so+0xefb17)
45:     #2 qd_dispatch_configure_connector /home/travis/build/apache/qpid-dispatch/src/connection_manager.c:879 (libqpid-dispatch.so+0x70fc1)
45:     #3 <null> <null> (libffi.so.7+0x6ff4)
45:     #4 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:97 (qdrouterd+0x2bb2)
45:     #5 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Mutex M5 (0x7b1000000300) created at:
45:     #0 pthread_mutex_init <null> (libtsan.so.0+0x4a636)
45:     #1 sys_mutex /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:43 (libqpid-dispatch.so+0x9606f)
45:     #2 qd_python_initialize /home/travis/build/apache/qpid-dispatch/src/python_embedded.c:54 (libqpid-dispatch.so+0x98a67)
45:     #3 qd_dispatch /home/travis/build/apache/qpid-dispatch/src/dispatch.c:111 (libqpid-dispatch.so+0x77409)
45:     #4 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:92 (qdrouterd+0x2b39)
45:     #5 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Mutex M265 (0x7b1000002c40) created at:
45:     #0 pthread_mutex_init <null> (libtsan.so.0+0x4a636)
45:     #1 sys_mutex /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:43 (libqpid-dispatch.so+0x9606f)
45:     #2 qd_router /home/travis/build/apache/qpid-dispatch/src/router_node.c:1631 (libqpid-dispatch.so+0xe6ffa)
45:     #3 qd_dispatch_prepare /home/travis/build/apache/qpid-dispatch/src/dispatch.c:335 (libqpid-dispatch.so+0x77003)
45:     #4 <null> <null> (libffi.so.7+0x6ff4)
45:     #5 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:97 (qdrouterd+0x2bb2)
45:     #6 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Mutex M0 (0x7b1000000200) created at:
45:     #0 pthread_mutex_init <null> (libtsan.so.0+0x4a636)
45:     #1 sys_mutex /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:43 (libqpid-dispatch.so+0x9606f)
45:     #2 qd_entity_cache_initialize /home/travis/build/apache/qpid-dispatch/src/entity_cache.c:55 (libqpid-dispatch.so+0x7878a)
45:     #3 qd_dispatch /home/travis/build/apache/qpid-dispatch/src/dispatch.c:88 (libqpid-dispatch.so+0x772be)
45:     #4 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:92 (qdrouterd+0x2b39)
45:     #5 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Thread T3 (tid=17225, running) created by main thread at:
45:     #0 pthread_create <null> (libtsan.so.0+0x5ea99)
45:     #1 sys_thread /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:181 (libqpid-dispatch.so+0x965c2)
45:     #2 qd_server_run /home/travis/build/apache/qpid-dispatch/src/server.c:1485 (libqpid-dispatch.so+0xee56a)
45:     #3 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:115 (qdrouterd+0x2c27)
45:     #4 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Thread T2 (tid=17224, running) created by main thread at:
45:     #0 pthread_create <null> (libtsan.so.0+0x5ea99)
45:     #1 sys_thread /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:181 (libqpid-dispatch.so+0x965c2)
45:     #2 qd_server_run /home/travis/build/apache/qpid-dispatch/src/server.c:1485 (libqpid-dispatch.so+0xee56a)
45:     #3 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:115 (qdrouterd+0x2c27)
45:     #4 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45: SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libtsan.so.0+0x32abf) in __interceptor_strlen

jiridanek avatar Apr 22 '21 09:04 jiridanek

@kgiusti No, there are CI failures (crash somewhere in the depths of Python interpreter). I suspect it could be related to https://issues.apache.org/jira/browse/DISPATCH-2144, but I have no proof, could be just wild casting of blame for something I did wrong here.

There is too much undone to resolve for 1.18, I think. Also, the IoAdapter leak is long standing, and it is small and constant amount of memory, so it should not be a significant issue.

jiridanek avatar Nov 01 '21 15:11 jiridanek

For some reason, this is not triggering a TravisCI jobs. I'll mark it as Ready for review, hoping that will do the trick.

In fact, however, this is not ready for review, yet. I intend to

  • [ ] Run this with debug build of cpython interpreter
  • [ ] Fix any issues that show in TravisCI

jiridanek avatar Jan 22 '22 13:01 jiridanek

Few more issues for me to investigate,

21: ::OneRouterTest::test_17_multiframe_presettled FAILED
21: 
21: =================================== FAILURES ===================================
21: _________________ OneRouterTest.test_17_multiframe_presettled __________________
21: 
21: self = <system_tests_one_router.OneRouterTest testMethod=test_17_multiframe_presettled>
21: 
21:     def test_17_multiframe_presettled(self):
21:         test = MultiframePresettledTest(self.address)
21: >       test.run()
21: 
21: /home/jdanek/repos/qpid/qpid-dispatch/tests/system_tests_one_router.py:542: 
21: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21: /home/jdanek/repos/qpid/qpid-dispatch/tests/system_tests_one_router.py:2884: in run
21:     Container(self).run()
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_reactor.py:180: in run
21:     while self.process():
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_reactor.py:243: in process
21:     event.dispatch(handler)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:165: in dispatch
21:     self.dispatch(h, type)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:165: in dispatch
21:     self.dispatch(h, type)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:162: in dispatch
21:     _dispatch(handler, type.method, self)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:123: in _dispatch
21:     m(*args)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_handlers.py:255: in on_delivery
21:     event.message = recv_msg(dlv)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_handlers.py:140: in recv_msg
21:     msg.decode(delivery.link.recv(delivery.pending))
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_message.py:499: in decode
21:     self._check(pn_message_decode(self._msg, data))
21: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21: 
21: self = Message(priority=4), err = -6
21: 
21:     def _check(self, err):
21:         if err < 0:
21:             exc = EXCEPTIONS.get(err, MessageException)
21: >           raise exc("[%s]: %s" % (err, pn_error_text(pn_message_error(self._msg))))
21: E           proton._exceptions.MessageException: [-6]: data error: (null)
21: 
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_message.py:80: MessageException
21: =========================== short test summary info ============================
21: FAILED ::OneRouterTest::test_17_multiframe_presettled - proton._exceptions.Me...
21: !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
21: =================== 1 failed, 24 passed in 77.12s (0:01:17) ====================
    Test #21: system_tests_one_router ..........***Failed   78.27 sec

and https://github.com/jiridanek/qpid-dispatch/runs/4911640230?check_suite_focus=true#step:27:5140

37/38 Test #75: c_unittests .............................***Failed    0.46 sec
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
=================================================================
==3586==ERROR: AddressSanitizer: attempting double-free on 0x60600000b120 in thread T4:
    #0 0x7ffa7bf58627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x935f37 in router_core_thread /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/router_core_thread.c:236
    #2 0x7ffa7a877a86 in start_thread (/lib64/libc.so.6+0x8da86)
    #3 0x7ffa7a8fb8d3 in __GI___clone (/lib64/libc.so.6+0x1118d3)

0x60600000b120 is located 0 bytes inside of 56-byte region [0x60600000b120,0x60600000b158)
freed by thread T4 here:
    #0 0x7ffa7bf58627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x938cb7 in qdr_subscribe_CT /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/route_tables.c:675
    #2 0x935f37 in router_core_thread /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/router_core_thread.c:236
    #3 0x7ffa7a877a86 in start_thread (/lib64/libc.so.6+0x8da86)

previously allocated by thread T3 here:
    #0 0x7ffa7bf5891f in __interceptor_malloc (/lib64/libasan.so.6+0xae91f)
    #1 0x93ed3d in qd_malloc /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/include/qpid/dispatch/ctools.h:234
    #2 0x93ed3d in qdr_core_subscribe /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/route_tables.c:147
    #3 0x86e872 in IoAdapter_init /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/python_embedded.c:718
    #4 0x7ffa7b81ce22 in type_call (/lib64/libpython3.10.so.1.0+0x11ae22)

Thread T4 created by T3 here:
    #0 0x7ffa7bf00866 in pthread_create (/lib64/libasan.so.6+0x56866)
    #1 0x86d6a5 in sys_thread /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/posix/threading.c:181
    #2 0x91be02 in qdr_core /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/router_core.c:124
    #3 0x99f402 in qd_router_setup_late /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_node.c:2127
    #4 0x7ffa76707c03 in ffi_call_unix64 (/lib64/libffi.so.6+0x6c03)
    #5 0x7ffa760fc98f  (<unknown module>)

Thread T3 created by T0 here:
    #0 0x7ffa7bf00866 in pthread_create (/lib64/libasan.so.6+0x56866)
    #1 0x7ffa7b534698 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib64/libstdc++.so.6+0xd9698)
    #2 0x519c4d in doctest::Context::run() /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/tests/c_unittests/doctest.h:6656
    #3 0x45b88c in main /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/tests/c_unittests/doctest.h:6741
    #4 0x7ffa7a81755f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

SUMMARY: AddressSanitizer: double-free (/lib64/libasan.so.6+0xae627) in free
==3586==ABORTING

jiridanek avatar Jan 23 '22 09:01 jiridanek

One more, it looks like they are exceedingly rare, but quite devastating

36: E           Process 163489 error: exit code 1, expected 0
36: E           qdrouterd -c INT.B.conf -I /home/jdanek/repos/qpid/qpid-dispatch/python
36: E           /home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug-asan/tests/system_test.dir/system_tests_interior_sync_up/RouterTest/setUpClass/INT.B-2.cmd
36: E           >>>>
36: E           AddressSanitizer:DEADLYSIGNAL
36: E           =================================================================
36: E           ==163489==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f631709656b bp 0x7fff5541b8a0 sp 0x7fff5541b870 T0)
36: E           ==163489==The signal is caused by a READ memory access.
36: E           ==163489==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
36: E               #0 0x7f631709656b in PyVectorcall_Function.lto_priv.5 (/lib64/libpython3.10d.so.1.0+0x12f56b)
36: E               #1 0x7f63170966e1 in _PyObject_VectorcallTstate.lto_priv.5 (/lib64/libpython3.10d.so.1.0+0x12f6e1)
36: E               #2 0x7f6317097df8 in _PyObject_CallFunctionVa (/lib64/libpython3.10d.so.1.0+0x130df8)
36: E               #3 0x7f6317097f49 in PyObject_CallFunction (/lib64/libpython3.10d.so.1.0+0x130f49)
36: E               #4 0x578623 in qd_io_rx_handler /home/jdanek/repos/qpid/qpid-dispatch/src/python_embedded.c:669
36: E               #5 0x612162 in qdr_forward_on_message /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/forwarder.c:335
36: E               #6 0x64c6cb in qdr_general_handler /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:952
36: E               #7 0x6383fe in qdr_core_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:272
36: E               #8 0x6d28e2 in qd_router_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_node.c:2165
36: E               #9 0x4ffbdd in qd_dispatch_free /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:375
36: E               #10 0x7067a9 in main_process /home/jdanek/repos/qpid/qpid-dispatch/router/src/main.c:119
36: E               #11 0x708607 in main /home/jdanek/repos/qpid/qpid-dispatch/router/src/main.c:369
36: E               #12 0x7f631639255f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
36: E               #13 0x7f631639260b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
36: E               #14 0x42c2e4 in _start (/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug-asan/router/qdrouterd+0x42c2e4)
36: E           
36: E           AddressSanitizer can not provide additional info.
36: E           SUMMARY: AddressSanitizer: SEGV (/lib64/libpython3.10d.so.1.0+0x12f56b) in PyVectorcall_Function.lto_priv.5
36: E           ==163489==ABORTING
36: E           <<<<
36: 
36: /home/jdanek/repos/qpid/qpid-dispatch/tests/system_test.py:788: RuntimeError
36: =========================== short test summary info ============================
36: ERROR ::RouterTest::test_interior_sync_up - RuntimeError: Errors during teard...
36: !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
36: ========================= 1 passed, 1 error in 39.23s ==========================
    Test #36: system_tests_interior_sync_up ....***Failed   40.18 sec

jiridanek avatar Jan 23 '22 10:01 jiridanek

This really is crazy. Consider what may happen when both the subscribe and unsubscribe actions get discarded:

==218518==ERROR: AddressSanitizer: attempting double-free on 0x6060000227c0 in thread T3:
    #0 0x14d6627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x6f32e6 in qdr_unsubscribe_CT /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/route_tables.c:691
    #2 0x6c8fe7 in qdr_core_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:270
    #3 0x7634f3 in qd_router_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_node.c:2165
    #4 0x5906bd in qd_dispatch_free /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:375
    #5 0x869b8d in QDR::deinitialize(bool) const /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/./helpers.hpp:265
    #6 0x858299 in check_amqp_listener_startup_log_message(qd_server_config_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:58
    #7 0x85cd25 in operator() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:129
    #8 0x863eef in __invoke_impl<void, DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:61
    #9 0x863cd7 in __invoke<DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:96
    #10 0x8639de in _M_invoke<0> /usr/include/c++/11/bits/std_thread.h:253
    #11 0x863817 in operator() /usr/include/c++/11/bits/std_thread.h:260
    #12 0x863200 in _M_run /usr/include/c++/11/bits/std_thread.h:211
    #13 0x7f0a01eac5c3 in execute_native_thread_routine (/lib64/libstdc++.so.6+0xd95c3)
    #14 0x440d10670a86 in start_thread (/lib64/libc.so.6+0x8da86)
    #15 0x440d106f48d3 in __GI___clone (/lib64/libc.so.6+0x1118d3)

0x6060000227c0 is located 0 bytes inside of 56-byte region [0x6060000227c0,0x6060000227f8)
freed by thread T3 here:
    #0 0x14d6627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x6f26a9 in qdr_subscribe_CT /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/route_tables.c:675
    #2 0x6c8fe7 in qdr_core_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:270
    #3 0x7634f3 in qd_router_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_node.c:2165
    #4 0x5906bd in qd_dispatch_free /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:375
    #5 0x869b8d in QDR::deinitialize(bool) const /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/./helpers.hpp:265
    #6 0x858299 in check_amqp_listener_startup_log_message(qd_server_config_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:58
    #7 0x85cd25 in operator() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:129
    #8 0x863eef in __invoke_impl<void, DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:61
    #9 0x863cd7 in __invoke<DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:96
    #10 0x8639de in _M_invoke<0> /usr/include/c++/11/bits/std_thread.h:253
    #11 0x863817 in operator() /usr/include/c++/11/bits/std_thread.h:260
    #12 0x863200 in _M_run /usr/include/c++/11/bits/std_thread.h:211
    #13 0x7f0a01eac5c3 in execute_native_thread_routine (/lib64/libstdc++.so.6+0xd95c3)

previously allocated by thread T3 here:
    #0 0x14d691f in __interceptor_malloc (/lib64/libasan.so.6+0xae91f)
    #1 0x6e68c3 in qd_malloc /home/jdanek/repos/qpid/qpid-dispatch/include/qpid/dispatch/ctools.h:234
    #2 0x6e752c in qdr_core_subscribe /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/route_tables.c:147
    #3 0x609db3 in IoAdapter_init /home/jdanek/repos/qpid/qpid-dispatch/src/python_embedded.c:726
    #4 0x716250fdc15a in type_call (/lib64/libpython3.10d.so.1.0+0x1c415a)
    #5 0x716250f48139 in _PyObject_MakeTpCall (/lib64/libpython3.10d.so.1.0+0x130139)
    #6 0x7162510b3aed in _PyObject_VectorcallTstate.lto_priv.26 (/lib64/libpython3.10d.so.1.0+0x29baed)
    #7 0x7162510b3b69 in PyObject_Vectorcall.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29bb69)
    #8 0x7162510ceb04 in call_function (/lib64/libpython3.10d.so.1.0+0x2b6b04)
    #9 0x7162510c83dd in _PyEval_EvalFrameDefault (/lib64/libpython3.10d.so.1.0+0x2b03dd)
    #10 0x7162510b3e1d in _PyEval_EvalFrame.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29be1d)
    #11 0x7162510cc8b5 in _PyEval_Vector (/lib64/libpython3.10d.so.1.0+0x2b48b5)
    #12 0x716250f487df in _PyFunction_Vectorcall (/lib64/libpython3.10d.so.1.0+0x1307df)
    #13 0x7162510b3b09 in _PyObject_VectorcallTstate.lto_priv.26 (/lib64/libpython3.10d.so.1.0+0x29bb09)
    #14 0x7162510b3b69 in PyObject_Vectorcall.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29bb69)
    #15 0x7162510ceb04 in call_function (/lib64/libpython3.10d.so.1.0+0x2b6b04)
    #16 0x7162510c8208 in _PyEval_EvalFrameDefault (/lib64/libpython3.10d.so.1.0+0x2b0208)
    #17 0x7162510b3e1d in _PyEval_EvalFrame.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29be1d)
    #18 0x7162510cc8b5 in _PyEval_Vector (/lib64/libpython3.10d.so.1.0+0x2b48b5)
    #19 0x716250f487df in _PyFunction_Vectorcall (/lib64/libpython3.10d.so.1.0+0x1307df)
    #20 0x716250f47737 in _PyObject_VectorcallTstate.lto_priv.5 (/lib64/libpython3.10d.so.1.0+0x12f737)
    #21 0x716250f48dd1 in _PyObject_CallFunctionVa (/lib64/libpython3.10d.so.1.0+0x130dd1)
    #22 0x716250f48f49 in PyObject_CallFunction (/lib64/libpython3.10d.so.1.0+0x130f49)
    #23 0x58deb7 in qd_dispatch_load_config /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:130
    #24 0x8689d4 in QDR::initialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/./helpers.hpp:215
    #25 0x857d9d in check_amqp_listener_startup_log_message(qd_server_config_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:44
    #26 0x85cd25 in operator() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:129
    #27 0x863eef in __invoke_impl<void, DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:61
    #28 0x863cd7 in __invoke<DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:96
    #29 0x8639de in _M_invoke<0> /usr/include/c++/11/bits/std_thread.h:253

Thread T3 created by T0 here:
    #0 0x147e866 in pthread_create (/lib64/libasan.so.6+0x56866)
    #1 0x7f0a01eac698 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib64/libstdc++.so.6+0xd9698)
    #2 0x85cfad in DOCTEST_ANON_FUNC_28 /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:122
    #3 0x7fe4a6 in doctest::Context::run() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/doctest.h:6656
    #4 0x801ba8 in main /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/doctest.h:6741
    #5 0x440d1061055f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

SUMMARY: AddressSanitizer: double-free (/lib64/libasan.so.6+0xae627) in free
==218518==ABORTING

jiridanek avatar Jan 23 '22 15:01 jiridanek

Created spinoff PRs https://github.com/apache/qpid-dispatch/pull/1495 and https://github.com/apache/qpid-dispatch/pull/1496

jiridanek avatar Jan 25 '22 15:01 jiridanek

Marking as draft and dropping it from 1.19 milestone. The PR fixes the leaks, but causes occasional shutdown crashes, which is not acceptable.

  • [ ] don't forget to put the following system_test.py addition to skupper-router
        super(Http2Server, self).__init__(self.args, name=name, expect=expect)
        if wait:
            try:
                self.wait_ready()
            except Exception:
                self.teardown()
                raise

jiridanek avatar Feb 17 '22 01:02 jiridanek