qpid-dispatch icon indicating copy to clipboard operation
qpid-dispatch copied to clipboard

WIP: unittest for starting router and adding autoLink

Open jiridanek opened this issue 4 years ago • 15 comments

This is a quick work-in-progress attempt that I've created.

First, majority of functions in the router require that the memory pools are initialized. I've decided to just start the whole router to get that. It may be possible to initialize less of the router, which I will explore later.

Being able to initialize less will be also good for timed unittests, essentially small benchmarks, around for example, measuring how long adding autoLinks takes, and so on.

Starting and stopping router has one small caveat. The way I do it now, if I shut down the router too soon after starting it, things break (I get leaks and/or segfaults). So I had to add a 500ms wait. There has to be a way to detect that startup is done, but I haven't found it yet.

I need to redirect logging in the test. I think it should not be difficult to accomplish, but I did not look into it yet. Looking at the logs is going to be the way to detect problems, because the router code often logs thing right away, and does not return error information in return codes. So this has to be done well, because it will be used often.

Regarding the specific tests, I was thinking about the following issues

  • "Adding new config address, autolinks and link routes become slower as more get added"
  • "Deliveries with modified state are changed when relayed to a broker"

This is what the code in the PR prints. The first is from trying to start the broker twice, and the last is from adding a linkRoute. There is no separation between, but it is mostly three times the same thing. Not something that should be printed during regular runs of the test, it's there for now just to show something.

70: Test command: /nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/bin/python "/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/run.py" "/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests"
70: Test timeout computed to be: 600
70: Container Name: 0
70: Container Initialized
70: Node Type Registered - router
70: Node of type 'router' installed as default node
70: Router started in Endpoint mode
70: Version: 1.15.0-SNAPSHOT
70: Policy Initialized
70: Core module present but disabled: edge_router
70: Core module present but disabled: core_test_hooks
70: Core module present but disabled: edge_addr_tracking
70: Core module present but disabled: address_lookup_server
70: Core module enabled: address_lookup_client
70: Stuck delivery detection: Scan interval: 30 seconds, Delivery age threshold: 10 seconds
70: Core module enabled: stuck_delivery_detection
70: Core module present but disabled: mobile_sync
70: Streaming link scrubber: Scan interval: 30 seconds, max free pool: 128 links
70: Core module enabled: streaming_link_scruber
70: Router Core thread running. 0/0
70: Core action 'subscribe'
70: In-process subscription M/$management
70: Core action 'subscribe'
70: In-process subscription L/$management
70: Default node removed
70: Router Core thread exited
70: Finalizing core module: streaming_link_scruber
70: Finalizing core module: stuck_delivery_detection
70: Finalizing core module: address_lookup_client
70: [doctest] doctest version is "2.3.6"
70: [doctest] run with "--help" for options
70: Container Name: 0
70: Container Initialized
70: Node of type 'router' installed as default node
70: Router started in Endpoint mode
70: Version: 1.15.0-SNAPSHOT
70: Policy Initialized
70: Core module present but disabled: edge_router
70: Core module present but disabled: core_test_hooks
70: Core module present but disabled: edge_addr_tracking
70: Core module present but disabled: address_lookup_server
70: Core module enabled: address_lookup_client
70: Stuck delivery detection: Scan interval: 30 seconds, Delivery age threshold: 10 seconds
70: Core module enabled: stuck_delivery_detection
70: Core module present but disabled: mobile_sync
70: Streaming link scrubber: Scan interval: 30 seconds, max free pool: 128 links
70: Core module enabled: streaming_link_scruber
70: Router Core thread running. 0/0
70: Core action 'subscribe'
70: In-process subscription M/$management
70: Core action 'subscribe'
70: In-process subscription L/$management
70: Default node removed
70: Router Core thread exited
70: Finalizing core module: streaming_link_scruber
70: Finalizing core module: stuck_delivery_detection
70: Finalizing core module: address_lookup_client
70: Container Name: 0
70: Container Initialized
70: Node of type 'router' installed as default node
70: Router started in Endpoint mode
70: Version: 1.15.0-SNAPSHOT
70: Policy Initialized
70: Core module present but disabled: edge_router
70: Core module present but disabled: core_test_hooks
70: Core module present but disabled: edge_addr_tracking
70: Core module present but disabled: address_lookup_server
70: Core module enabled: address_lookup_client
70: Stuck delivery detection: Scan interval: 30 seconds, Delivery age threshold: 10 seconds
70: Core module enabled: stuck_delivery_detection
70: Core module present but disabled: mobile_sync
70: Streaming link scrubber: Scan interval: 30 seconds, max free pool: 128 links
70: Core module enabled: streaming_link_scruber
70: Router Core thread running. 0/0
70: Core action 'subscribe'
70: In-process subscription M/$management
70: Core action 'subscribe'
70: In-process subscription L/$management
70: Error performing CREATE of org.apache.qpid.dispatch.router.config.autoLink: Body of request must be a map
70: Error configuring linkRoute: Body of request must be a map
70: ===============================================================================
70: [doctest] test cases:      5 |      5 passed |      0 failed |      0 skipped
70: [doctest] assertions:     36 |     36 passed |      0 failed |
70: [doctest] Status: SUCCESS!

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   1.81 sec
Process finished with exit code 0

jiridanek avatar Oct 15 '20 13:10 jiridanek

And @nicob87, what do you thing?

jiridanek avatar Oct 15 '20 13:10 jiridanek

Some progress, I can give it a map at last

69: Error performing CREATE of org.apache.qpid.dispatch.router.config.autoLink: addr and direction fields are mandatory
69: Error configuring linkRoute: addr and direction fields are mandatory

jiridanek avatar Oct 19 '20 08:10 jiridanek

Its working, its working... more or less. The test to add autoLink through management message really adds an autoLink, and can do it without any network communication! And I cannot reduce scope of the test any further, because then I'd hit static functions and types that are defined in .c files, so I cannot see inside. More work needed, but it starts to look hopeful.

jiridanek avatar Oct 20 '20 18:10 jiridanek

Faster router startup using BetterRouterStartupLatch to wait for the core thread. It is a bit convoluted, because C function pointers do not work with C++ capturing lambdas (C/C++ does not have dleegates like C# does), but it seems to work much better than the 500ms sleep.

New test that tries to send message through router. There is apparently something crucial missing, because I get leaks. I have to yet figure out what I've actually accompished through calling qdr_link_deliver.

alloc.c: Items of type 'qd_bitmask_t' remain allocated at shutdown: 1 (SUPPRESSED)
Leak: 2020-10-22 22:22:16.795683 +0200 type: qd_bitmask_t address: 0x611000043790
/nix/store/zhbhxp4jgalycwffildy2bwcgmrsmayk-gcc-9.2.0-lib/lib/libasan.so.5(+0x6cc51) [0x7f937b3c0c51]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qd_alloc+0xf5d) [0x7f937abe396a]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(new_qd_bitmask_t+0x26) [0x7f937abecfcf]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qd_bitmask+0x10) [0x7f937abed2ce]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests() [0x4fa981]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests() [0x4fe213]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests() [0x4fe047]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests() [0x4fddb3]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests() [0x4fdc71]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/tests/c_unittests/c_unittests() [0x4fdb1b]

alloc.c: Items of type 'qdr_delivery_ref_t' remain allocated at shutdown: 1 (SUPPRESSED)
Leak: 2020-10-22 22:22:16.795976 +0200 type: qdr_delivery_ref_t address: 0x61100004aa90
/nix/store/zhbhxp4jgalycwffildy2bwcgmrsmayk-gcc-9.2.0-lib/lib/libasan.so.5(+0x6cc51) [0x7f937b3c0c51]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qd_alloc+0x2ccb) [0x7f937abe56d8]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(new_qdr_delivery_ref_t+0x26) [0x7f937ad4e537]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qdr_add_delivery_ref_CT+0x16) [0x7f937ad66020]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qdr_delivery_push_CT+0x22f) [0x7f937ad11163]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qdr_delivery_release_CT+0x238) [0x7f937ad0564e]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(+0x57d84a) [0x7f937ad8e84a]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(+0x581ffb) [0x7f937ad92ffb]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(router_core_thread+0x1009) [0x7f937ad6f247]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(+0x48230d) [0x7f937ac9330d]

alloc.c: Items of type 'qdr_link_work_t' remain allocated at shutdown: 1 (SUPPRESSED)
Leak: 2020-10-22 22:22:16.796097 +0200 type: qdr_link_work_t address: 0x612000035f50
/nix/store/zhbhxp4jgalycwffildy2bwcgmrsmayk-gcc-9.2.0-lib/lib/libasan.so.5(+0x6cc51) [0x7f937b3c0c51]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qd_alloc+0x2ccb) [0x7f937abe56d8]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(new_qdr_link_work_t+0x26) [0x7f937ad4f773]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(qdr_link_issue_credit_CT+0x8ce) [0x7f937ad952bb]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(+0x57db19) [0x7f937ad8eb19]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(+0x581ffb) [0x7f937ad92ffb]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(router_core_thread+0x1009) [0x7f937ad6f247]
/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug/src/libqpid-dispatch.so(+0x48230d) [0x7f937ac9330d]
/nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/libpthread.so.0(+0x7edd) [0x7f937b33aedd]
/nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/libc.so.6(clone+0x3f) [0x7f9379629aaf]

ERROR: Aborted due to unexpected alloc pool leak of type 'qdr_link_work_t'
Process finished with exit code 1

jiridanek avatar Oct 22 '20 20:10 jiridanek

And @nicob87, what do you thing?

I really like the idea!

nicob87 avatar Oct 22 '20 21:10 nicob87

@nicob87 Idea is probably good, but the implementation may not be ideal, yet.

Setup for this kind of a test takes about 80 ms, because it starts most of the router, and initializes Python. That may be too slow to initialize new router for every single test. But the point of all this is to have isolated tests.

I am not sure how much of C++ features I should be using. It can simplify things, but it is a different language than C, with all the problems that come from that. The uniqe_ptr with deleters looks really nice, if it can be wrapped in something less verbose than what's in the code fragment below. Creating fake messages would also be much easier with a C++ wrapper over the C API.

Doing some of the setup is lengthy and it somewhat duplicates the production code. Meaning it would complicate keeping tests in sync with production code. See for example what it looks like if I want to initialize qdr_link_t myself,

https://github.com/apache/qpid-dispatch/pull/880/files#diff-da66f2598384728900547ac702d364f6df8b8253fb86d607bfc6be369494e5dfR225-R228

Doing it through qdr_create_link_CT would of course make things simpler; maybe that is the right way to go.

Some things cannot be even initialized directly, because there are opaque C structs. In a different type of a test (where I'd include specific C files I need, and not the whole dispatch so library, I could include the needed C files... This can be solved by using the _private.h convention more, so it's probably not a problem.

jiridanek avatar Oct 26 '20 10:10 jiridanek

It looks like it's necessary to use qdr_connection_opened to create a new qdr_connection_t, qdr_create_link_CT to get a qdr_link_t, and so on. These functions set things in the qdr_router_t struct and it is pointless trying to duplicate all the side-effects in the test code.

I'm not sure what the next step in the test should be, but hopefully I'll make some sense of it later.

If there is to be multiple tests like this, I'll need some helper methods to create the links, etc; instead of using the all those methods with 10+ arguments directly. I am still not sure that what I am now trying is the correct level on which to write such a test.

jiridanek avatar Dec 14 '20 09:12 jiridanek

and with that, c_unittests has passed on this branch, without failures and leaks

jiridanek avatar Dec 14 '20 14:12 jiridanek

Here's a microbenchmark for TCP Echo server latency with and without Dispatch

Debug build:

----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BM_TCPEchoServerLatency1QDR            0.313 ms        0.032 ms        10000
BM_TCPEchoServerLatencyWithoutQDR      0.029 ms        0.018 ms        35970

that's on top of e00f88adef Fernando Giorgetti Date:   Fri Jan 29 03:51:31 2021 -0300 DISPATCH-1870 - gRPC system test (#944)

Debug build:

----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BM_TCPEchoServerLatency1QDR            0.295 ms        0.031 ms        10000
BM_TCPEchoServerLatencyWithoutQDR      0.030 ms        0.018 ms        32067

Release build:

----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BM_TCPEchoServerLatency1QDR            0.335 ms        0.031 ms        10000
BM_TCPEchoServerLatencyWithoutQDR      0.027 ms        0.017 ms        35902

on top of current master, commit a0451227c Ganesh Murthy Date:   Thu Feb 25 09:58:48 2021 -0500 DISPATCH-1973: Do not record the http request without a record key.

jiridanek avatar Feb 27 '21 19:02 jiridanek

Note to self: probably something I introduced here. I never saw this on master

https://github.com/apache/qpid-dispatch/runs/1995248602?check_suite_focus=true#step:9:755

19: Router solo output file:
19: >>>>
19: ../src/parse.c:497:66: runtime error: left shift of 128 by 56 places cannot be represented in type 'long int'
19: 
19: =================================================================
19: ==2208==ERROR: LeakSanitizer: detected memory leaks
19: 
19: Direct leak of 14 byte(s) in 1 object(s) allocated from:
19:     #0 0x7fd6c33383dd in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x963dd)
19:     #1 0x7fd6c2ce39c0 in py_string_2_c ../src/python_utils.c:35
19:     #2 0x7fd6c2ce39c0 in py_string_2_c ../src/python_utils.c:25
19:     #3 0x7fd6c2b2a0c1 in qd_entity_get_string ../src/entity.c:49
19:     #4 0x7fd6c2b27114 in qd_dispatch_configure_router ../src/dispatch.c:217
19:     #5 0x7fd6bd80bff4  (/lib/x86_64-linux-gnu/libffi.so.7+0x6ff4)
19:     #6 0x7fffb0aa8b7f  ([stack]+0x1eb7f)
19: 
19: -----------------------------------------------------
19: Suppressions used:
19:   count      bytes template
19:       1         56 qdr_core_subscribe
19:     589     901772 *libpython*
19: -----------------------------------------------------
19: 
19: SUMMARY: AddressSanitizer: 14 byte(s) leaked in 1 allocation(s).

jiridanek avatar Feb 27 '21 21:02 jiridanek

See the 7 other warnings above this one, they come from doctest itself ;(

https://travis-ci.com/github/apache/qpid-dispatch/jobs/498901258#L9490

74: WARNING: ThreadSanitizer: data race (pid=18096)
74:   Write of size 8 at 0x7b6400088018 by thread T9 (mutexes: write M1482):
74:     #0 qdr_connection_activate_CT /home/travis/build/apache/qpid-dispatch/src/router_core/connections.c:703 (libqpid-dispatch.so+0xa738b)
74:     #1 qdr_connection_activate_CT /home/travis/build/apache/qpid-dispatch/src/router_core/connections.c:700 (libqpid-dispatch.so+0xa738b)
74:     #2 qdr_connection_enqueue_work_CT /home/travis/build/apache/qpid-dispatch/src/router_core/connections.c:719 (libqpid-dispatch.so+0xa75d3)
74:     #3 qdr_create_link_CT /home/travis/build/apache/qpid-dispatch/src/router_core/connections.c:1163 (libqpid-dispatch.so+0xa7ccf)
74:     #4 operator() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/test_waypoint_undeliverable.cpp:291 (c_unittests+0x43790)
74:     #5 __invoke_impl<void, _DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:60 (c_unittests+0x4415b)
74:     #6 __invoke<_DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:95 (c_unittests+0x4415b)
74:     #7 _M_invoke<0> /usr/include/c++/9/thread:244 (c_unittests+0x4415b)
74:     #8 operator() /usr/include/c++/9/thread:251 (c_unittests+0x4415b)
74:     #9 _M_run /usr/include/c++/9/thread:195 (c_unittests+0x4415b)
74:     #10 <null> <null> (libstdc++.so.6+0xd6d83)
74: 
74:   Previous read of size 8 at 0x7b6400088018 by thread T10:
74:     #0 qdr_activate_connections_CT /home/travis/build/apache/qpid-dispatch/src/router_core/router_core_thread.c:86 (libqpid-dispatch.so+0xc8416)
74:     #1 router_core_thread /home/travis/build/apache/qpid-dispatch/src/router_core/router_core_thread.c:248 (libqpid-dispatch.so+0xc8416)
74:     #2 _thread_init /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:174 (libqpid-dispatch.so+0x95026)
74:     #3 <null> <null> (libtsan.so.0+0x2d1af)
74: 
74:   Location is heap block of size 1160 at 0x7b6400087f00 allocated by thread T9:
74:     #0 calloc <null> (libtsan.so.0+0x305ca)
74:     #1 qd_malloc /home/travis/build/apache/qpid-dispatch/include/qpid/dispatch/ctools.h:229 (libqpid-dispatch.so+0xc27d8)
74:     #2 qdr_core /home/travis/build/apache/qpid-dispatch/src/router_core/router_core.c:71 (libqpid-dispatch.so+0xc27d8)
74:     #3 qd_router_setup_late /home/travis/build/apache/qpid-dispatch/src/router_node.c:2111 (libqpid-dispatch.so+0xe601c)
74:     #4 <null> <null> (libffi.so.7+0x6ff4)
74:     #5 QDR::start() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/helpers.hpp:132 (c_unittests+0x3c414)
74:     #6 operator() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/test_waypoint_undeliverable.cpp:189 (c_unittests+0x4283a)
74:     #7 __invoke_impl<void, _DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:60 (c_unittests+0x4415b)
74:     #8 __invoke<_DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:95 (c_unittests+0x4415b)
74:     #9 _M_invoke<0> /usr/include/c++/9/thread:244 (c_unittests+0x4415b)
74:     #10 operator() /usr/include/c++/9/thread:251 (c_unittests+0x4415b)
74:     #11 _M_run /usr/include/c++/9/thread:195 (c_unittests+0x4415b)
74:     #12 <null> <null> (libstdc++.so.6+0xd6d83)
74: 
74:   Mutex M1482 (0x7f6d17cbd520) created at:
74:     #0 pthread_mutex_lock <null> (libtsan.so.0+0x5271c)
74:     #1 __gthread_mutex_lock /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749 (c_unittests+0x4288c)
74:     #2 std::mutex::lock() /usr/include/c++/9/bits/std_mutex.h:100 (c_unittests+0x4288c)
74:     #3 BetterRouterStartupLatch::wait_for_qdr(qd_dispatch_t*) /home/travis/build/apache/qpid-dispatch/tests/c_unittests/./helpers.hpp:91 (c_unittests+0x4288c)
74:     #4 QDR::wait() const /home/travis/build/apache/qpid-dispatch/tests/c_unittests/./helpers.hpp:147 (c_unittests+0x4288c)
74:     #5 operator() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/test_waypoint_undeliverable.cpp:190 (c_unittests+0x4288c)
74:     #6 __invoke_impl<void, _DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:60 (c_unittests+0x4415b)
74:     #7 __invoke<_DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:95 (c_unittests+0x4415b)
74:     #8 _M_invoke<0> /usr/include/c++/9/thread:244 (c_unittests+0x4415b)
74:     #9 operator() /usr/include/c++/9/thread:251 (c_unittests+0x4415b)
74:     #10 _M_run /usr/include/c++/9/thread:195 (c_unittests+0x4415b)
74:     #11 <null> <null> (libstdc++.so.6+0xd6d83)
74: 
74:   Thread T9 (tid=18106, running) created by main thread at:
74:     #0 pthread_create <null> (libtsan.so.0+0x5ea99)
74:     #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xd7048)
74:     #2 doctest::Context::run() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/doctest.h:6486 (c_unittests+0x30ce8)
74:     #3 main /home/travis/build/apache/qpid-dispatch/tests/c_unittests/doctest.h:6571 (c_unittests+0x126d9)
74: 
74:   Thread T10 (tid=18107, running) created by thread T9 at:
74:     #0 pthread_create <null> (libtsan.so.0+0x5ea99)
74:     #1 sys_thread /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:183 (libqpid-dispatch.so+0x955e2)
74:     #2 qdr_core /home/travis/build/apache/qpid-dispatch/src/router_core/router_core.c:122 (libqpid-dispatch.so+0xc2ade)
74:     #3 qd_router_setup_late /home/travis/build/apache/qpid-dispatch/src/router_node.c:2111 (libqpid-dispatch.so+0xe601c)
74:     #4 <null> <null> (libffi.so.7+0x6ff4)
74:     #5 QDR::start() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/helpers.hpp:132 (c_unittests+0x3c414)
74:     #6 operator() /home/travis/build/apache/qpid-dispatch/tests/c_unittests/test_waypoint_undeliverable.cpp:189 (c_unittests+0x4283a)
74:     #7 __invoke_impl<void, _DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:60 (c_unittests+0x4415b)
74:     #8 __invoke<_DOCTEST_ANON_FUNC_2()::<lambda()> > /usr/include/c++/9/bits/invoke.h:95 (c_unittests+0x4415b)
74:     #9 _M_invoke<0> /usr/include/c++/9/thread:244 (c_unittests+0x4415b)
74:     #10 operator() /usr/include/c++/9/thread:251 (c_unittests+0x4415b)
74:     #11 _M_run /usr/include/c++/9/thread:195 (c_unittests+0x4415b)
74:     #12 <null> <null> (libstdc++.so.6+0xd6d83)
74: 
74: SUMMARY: ThreadSanitizer: data race /home/travis/build/apache/qpid-dispatch/src/router_core/connections.c:703 in qdr_connection_activate_CT

jiridanek avatar Apr 15 '21 22:04 jiridanek

Codecov Report

Merging #880 (d8ee68e) into main (7ef0a0f) will decrease coverage by 3.76%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #880      +/-   ##
==========================================
- Coverage   84.29%   80.53%   -3.77%     
==========================================
  Files         112      113       +1     
  Lines       27694    29017    +1323     
==========================================
+ Hits        23346    23369      +23     
- Misses       4348     5648    +1300     
Impacted Files Coverage Δ
src/alloc_pool.c 93.72% <100.00%> (+0.04%) :arrow_up:
src/connection_manager.c 90.14% <100.00%> (+0.01%) :arrow_up:
src/dispatch.c 86.66% <100.00%> (+2.93%) :arrow_up:
src/entity_cache.c 100.00% <100.00%> (ø)
src/log.c 91.29% <100.00%> (+0.02%) :arrow_up:
src/amqp.c 0.00% <0.00%> (-100.00%) :arrow_down:
src/router_core/router_core.c 86.26% <0.00%> (-1.14%) :arrow_down:
src/router_core/transfer.c 93.49% <0.00%> (-0.66%) :arrow_down:
src/adaptors/tcp_adaptor.c 70.17% <0.00%> (-0.59%) :arrow_down:
src/router_core/delivery.c 93.14% <0.00%> (-0.19%) :arrow_down:
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7ef0a0f...d8ee68e. Read the comment docs.

codecov-commenter avatar Apr 17 '21 07:04 codecov-commenter

flamegraph

jiridanek avatar May 07 '21 22:05 jiridanek

I tried writing a test for shutdown crash with connected websocket in both Python unittest and C++ doctest. The Python unittest runs cca 600 ms start to finish, the C++ doctest about 400 ms. I did not investigate deeper, but I guess this is due to extra instance of Python being started (to run the tests) and due to the facts the system-tests framework does extra work that involves I/O (trying for open socket, checking if router started up, ...)

There is very little that running things in-process can improve on what the Python tests do here. The little things will likely add up, but not enough to e.g. warrant rewriting tests from unittest to doctest. The wins have to be found elsewhere:

  • ability to avoid bringing up and shutting down python and alloc pool repeatedly; is it possible to run realistic scenarios while keeping python around? Possibly doing some checks to ensure there are not undesirable Python objects surviving from one test to the next one. It would need experimentation to see if this is doable and actual measurements to see if this is worth it. Alloc pool is actually very fast to start up. ** given that the test crashes in ci on segv in lws_get_vhost_port, means I don't properly wait for the listener to go up; so maybe the c++ doctest will run somewhat slower when I add the proper check/wait (not sure how to do this yet)
  • write integration tests in doctest, instead of testing things end-to-end (leave that for the system-tests). Is it possible to do useful testing involving multiple components without starting the entire router?

jiridanek avatar May 09 '21 10:05 jiridanek

That Python test I added last time (the one using asyncio websocket lib) can actually be added upstream now, since Python 3.6 supports async. I looked a bit into simplifying multithreaded programming in C++, but there does not seem to be much. There are the regular primitives in C++11 std lib (https://freecontent.manning.com/synchronizing-concurrent-operations-in-c/), everything else has to be built on top; there are some async/await features coming in C++20 or maybe in the version after that.

As an off-topic, there is a person building immutable tree-based lists and other datastructures for C++, this is quite interesting, but it would be hard to use from C https://github.com/arximboldi/immer

jiridanek avatar May 19 '21 10:05 jiridanek