hazelcast-cpp-client
Test executable crashes when running ReliableTopicTest
The test executable occasionally crashes on GitHub Actions (GA) while executing the tests under the suite ReliableTopicTest.
Here is a run that crashed: https://github.com/hazelcast/hazelcast-cpp-client/runs/2559526252?check_suite_focus=true
Can you also attach the stack trace and log here in case the GA build log is lost?
This crash happens when the thread spawned for the continuation handler in ClientInvocation::invoke outlives the client object and tries to access executor_ to schedule the next continuation handler. Because the thread is created freely, it is not managed by a thread pool or by the client, so it can still be running after the client object is destroyed.
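To make the lifetime problem concrete, here is a minimal standard-library sketch of a free thread that wakes up after its owner may already be gone. It is not the client's code: `toy_executor`, `toy_client` and `run_late_continuation` are hypothetical names, and the `std::weak_ptr` guard is only one possible way to avoid touching a destroyed executor, not necessarily what the actual fix does.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for the client's executor (not the real hazelcast type).
struct toy_executor {
    std::vector<int> submitted;
    void submit(int job) { submitted.push_back(job); }
};

// Hypothetical client that owns the executor.
struct toy_client {
    std::shared_ptr<toy_executor> executor_ = std::make_shared<toy_executor>();
};

// Starts a free thread that tries to schedule its "next continuation" only
// after the main thread has (optionally) destroyed the client.
// Returns true if the free thread could still reach the executor.
bool run_late_continuation(bool destroy_client_first) {
    auto client = std::make_unique<toy_client>();
    // The free thread only observes the executor; it does not own it.
    std::weak_ptr<toy_executor> weak_exec = client->executor_;

    std::promise<void> go;
    std::future<void> start = go.get_future();
    std::promise<bool> scheduled;

    std::thread free_thread([weak_exec, &start, &scheduled] {
        start.wait(); // simulates a continuation that runs late
        if (auto exec = weak_exec.lock()) {
            exec->submit(1); // executor is still alive, safe to use
            scheduled.set_value(true);
        } else {
            scheduled.set_value(false); // owner is gone, skip scheduling
        }
    });

    if (destroy_client_first) {
        client.reset(); // the client and its executor are destroyed here
    }
    // The free thread is still parked, so nothing else holds the executor.
    assert(weak_exec.expired() == destroy_client_first);

    go.set_value();
    bool result = scheduled.get_future().get();
    free_thread.join();
    return result;
}
```

With a raw pointer or reference instead of the `weak_ptr`, the `destroy_client_first == true` path would be a use-after-free, which is the shape of the crash described above.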
When I look at the code, reliable_topic::on_shutdown cancels the runner, and we have code that waits for the executor threads to finish at this line during shutdown. Hence, I was expecting the active threads to finish gracefully before the client destructor destructs the objects. The general logic is to stop all outstanding threads on client shutdown, and only then is the client destructed. Did you check if any outstanding such thread lives following the shutdown?
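The shutdown logic described here (wait for every managed thread, then destruct) can be sketched as follows. This is only an illustration under stated assumptions: `tracking_executor` and `run_jobs_then_shutdown` are hypothetical names, and a real executor would use a thread pool rather than one thread per job.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical executor that keeps a handle to every thread it starts,
// so that shutdown() can wait for all of them.
class tracking_executor {
public:
    void submit(std::function<void()> job) {
        workers_.emplace_back(std::move(job)); // one thread per job, for brevity
    }

    // Blocks until every submitted job has finished.
    void shutdown() {
        for (auto& worker : workers_) {
            if (worker.joinable()) {
                worker.join();
            }
        }
        workers_.clear();
    }

    ~tracking_executor() { shutdown(); }

private:
    std::vector<std::thread> workers_;
};

// Submits job_count jobs, shuts the executor down, and reports how many
// jobs had completed by the time shutdown() returned.
int run_jobs_then_shutdown(int job_count) {
    std::atomic<int> finished{0};
    tracking_executor executor;
    for (int i = 0; i < job_count; ++i) {
        executor.submit([&finished] { finished.fetch_add(1); });
    }
    executor.shutdown();
    // Every tracked job is done once shutdown() has returned.
    assert(finished.load() == job_count);
    return finished.load();
}
```

The guarantee only covers threads the executor knows about. A thread started outside of `submit` is invisible to `shutdown()` and can keep running afterwards.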
Hence, I was expecting the active threads to finish gracefully before the client destructor destructs the objects.
Yes, but the thread I mentioned is not managed by the client. Notice that the continuation handler is not bound to a specific executor, so a new, independent thread is created for it. After executing its continuation handler (which can happen long after the client object is gone), it wants to submit the job for the next handler, which is this. To do that, it needs to access executor_, which has already been destroyed. It doesn't matter whether you cancelled the message runner or joined all the other threads: as long as this is a free thread, it can outlive everything and try to access the destroyed executor for the next handler, all of which happens within the Boost future library.
Did you check if any outstanding such thread lives following the shutdown?
Yes, the thread is listed as Thread 1 (Thread 0x7fd926ffd700 (LWP 135427)): in the above log. It's left over from the previous test, ReliableTopicTest.testConfig.
OK, I see, I looked at the wrong completion. Yes, I remember that I had to do this because a user thread was getting stuck waiting for an invocation response (future.get) in one of the tests (it may be the invocation_should_not_block_indefinitely_during_client_shutdown test) when the client is shut down and there is no other thread to notify the user thread. Normally, I would expect the other completion to be effective, but there was a problem and it was not working. I just tried the test with the lines you mention commented out, and it seems to pass on macOS, but the failure may be random or Linux-specific, so I need to test further. I would like to remove those lines if possible; we just need to make sure that the user does not get stuck on future.get.
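The requirement that a user never stays blocked on future.get during shutdown can be illustrated with a small standard-library sketch. It assumes a hypothetical `pending_invocations` registry that fails every outstanding promise when the client shuts down. It uses `std::future` rather than Boost futures and does not reproduce the client's actual invocation code.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical registry of invocations that are still waiting for a response.
class pending_invocations {
public:
    std::future<int> add() {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace_back();
        return pending_.back().get_future();
    }

    // Completes every outstanding invocation with an error so that no
    // caller stays blocked in future.get() after shutdown.
    void fail_all(const std::string& reason) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& promise : pending_) {
            promise.set_exception(
                std::make_exception_ptr(std::runtime_error(reason)));
        }
        pending_.clear();
    }

private:
    std::mutex mutex_;
    std::vector<std::promise<int>> pending_;
};

// A user thread waits for a response that never arrives; the shutdown
// path wakes it up with an exception instead of leaving it stuck.
std::string wait_during_shutdown() {
    pending_invocations registry;
    std::future<int> response = registry.add();
    std::string outcome;

    std::thread user_thread([&response, &outcome] {
        try {
            response.get();
            outcome = "completed";
        } catch (const std::runtime_error& e) {
            outcome = e.what();
        }
    });

    registry.fail_all("client is shutting down");
    user_thread.join(); // returns because get() was unblocked
    assert(!outcome.empty());
    return outcome;
}
```

Because the shutdown path completes the promise itself, no extra notifier thread is needed, and nothing is left running that could outlive the client.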
Related to #852
Solved with PR #1071