[BUG]: Crash of RoutingManagerd version 3.4.10
vSomeip Version
v.3.4.10
Boost Version
1.78.0
Environment
Target: Test bench running automated tests
OS: Embedded Linux
Describe the bug
During testing we observed several crashes of routingmanagerd (around 6 in total):
- routingmanagerd core dumped with SIGSEGV, Segmentation fault
- routingmanagerd core dumped with SIGABRT, Aborted
Details are in the provided back-traces.
Reproduction Steps
Several hundred (300-400) loop-test iterations on the target, running various applications.
Expected behaviour
routingmanagerd should not crash.
Logs and Screenshots
No response
The coredump back-traces:
- core.routingmanagerd.850.b00df2d309b24a858643df5f2ac79195.3013.11.1717956167.log
- core.da00_io01.682.42f424ddae96409e80ba1b59908db2d4.3013.11.1718020067.log
- core.0100_io01.735.d97c5634e6204b199f818e6a3e141a02.3013.6.1718011439.log
I am very interested in reproducing this. Could you provide more details about the "Reproduction Steps", especially which applications were used and what exactly the test loops look like?
@akhzarj can you give some indications on how we could reproduce it?
Hi @duartenfonseca, @lutzbichler
We have already found the root cause. It is related to dangling pointers in
implementation/endpoints/src/tcp_client_endpoint_impl.cpp
caused by the dual behavior of strand::dispatch():
https://www.boost.org/doc/libs/1_80_0/doc/html/boost_asio/reference/strand/dispatch.html
dispatch() may run the passed function immediately, in which case references to the caller's local variables are still valid. When the strand is busy, however, the function is scheduled and executes only after dispatch() and its calling function have returned, so the references it holds to those local variables are left dangling. To reproduce the crash, the relevant strands need to be stressed until they become busy.
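To illustrate the two code paths, here is a minimal sketch that uses only the standard library. `toy_strand`, `set_busy()`, `drain()` and `send_safe()` are hypothetical names invented for this example. They are a simplified stand-in for the run-now-or-run-later behavior, not the Boost.Asio API and not the vsomeip code. The handler captures the local by value, which is the kind of change described in this comment.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a strand (NOT the Boost.Asio API):
// dispatch() runs the handler inline when idle, otherwise queues it.
class toy_strand {
public:
    void set_busy(bool busy) { busy_ = busy; }

    void dispatch(std::function<void()> handler) {
        if (busy_) {
            // Deferred path: the handler runs after the caller has returned.
            pending_.push(std::move(handler));
        } else {
            // Immediate path: the handler runs inside dispatch().
            handler();
        }
    }

    // Marks the strand idle and runs everything that was deferred.
    void drain() {
        busy_ = false;
        while (!pending_.empty()) {
            auto handler = std::move(pending_.front());
            pending_.pop();
            handler();
        }
    }

private:
    bool busy_ = false;
    std::queue<std::function<void()>> pending_;
};

// Safe pattern: the handler owns a copy of the local `payload`, so it stays
// valid even if the handler only runs after send_safe() has returned.
// Capturing [&payload] instead would dangle on the deferred path.
// `out` is captured by reference because the caller keeps it alive.
void send_safe(toy_strand& strand, std::string& out) {
    std::string payload = "hello";
    strand.dispatch([payload, &out] { out = payload; });
}
```

With the strand marked busy, `send_safe()` returns before the handler runs, and the handler still sees a valid `payload` once `drain()` is called. With a by-reference capture the same sequence would read a destroyed local, which may only show up under load.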
The fix is to remove the references in:
- https://github.com/COVESA/vsomeip/blob/0b83e24d16e1611958194e9b727136522f46556b/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L272
- https://github.com/COVESA/vsomeip/blob/0b83e24d16e1611958194e9b727136522f46556b/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L773
- https://github.com/COVESA/vsomeip/blob/0b83e24d16e1611958194e9b727136522f46556b/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L801
- https://github.com/COVESA/vsomeip/blob/0b83e24d16e1611958194e9b727136522f46556b/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L951
Notes:
- In the last one, the fix additionally replaces the lambda with std::bind() because of lambda immutability; alternatively, the lambda can be made mutable.
- This has not been checked against the latest version of vsomeip, but you can easily check whether any new or updated strand::dispatch() calls have the same issue.
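The lambda immutability point in the first note can be sketched as follows. This is an illustration under assumptions, not the actual vsomeip change: `consume()`, `make_with_mutable_lambda()` and `make_with_bind()` are hypothetical names. A value captured by copy is const inside a lambda body unless the lambda is declared `mutable`, whereas `std::bind` passes its stored copy to the target as a non-const lvalue.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical callee that needs a non-const reference to its buffer.
void consume(std::vector<int>& buffer, int& total) {
    for (int value : buffer) {
        total += value;
    }
    buffer.clear();
}

// Option 1: capture by value and mark the lambda mutable.
// Without `mutable`, `buffer` is const inside the body and the call
// to consume() does not compile.
std::function<void()> make_with_mutable_lambda(std::vector<int> buffer, int& total) {
    return [buffer, &total]() mutable { consume(buffer, total); };
}

// Option 2: std::bind stores its own copy of `buffer` and passes that copy
// as a non-const lvalue; std::ref keeps `total` as a reference.
std::function<void()> make_with_bind(std::vector<int> buffer, int& total) {
    return std::bind(consume, buffer, std::ref(total));
}
```

Both handlers own their copy of `buffer`, so neither depends on the lifetime of the function that created them. The caller still has to keep `total` alive, because it is deliberately held by reference.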
Hi @akhzarj, if I remember correctly, this is the same issue we discussed some time ago in the monthly meeting. The fix is not yet in master, but I asked @kheaactua to create a PR with it. Can you have a look at #774 and update it? It seems it does not contain all the changes. Thanks! :)
hmm, it looks like I am missing: https://github.com/COVESA/vsomeip/blob/0b83e24d16e1611958194e9b727136522f46556b/implementation/endpoints/src/tcp_client_endpoint_impl.cpp#L801
I'll add that now.
Hi @fcmonteiro, yes, you remember correctly, and PR #774 contains the fix that we have, as of the last update from @kheaactua.