connectedhomeip
connectedhomeip copied to clipboard
Shutdown while there are live exchanges will crash
Problem
During shutdown, we shut down the exchange manager, then the session manager.
Session manager shutdown evicts all sessions.
If there are exchanges waiting on a response timer, this will try to cancel the response timer, which will crash because we already shut down the exchange manager and that code is doing:
System::Layer * lSystemLayer = mExchangeMgr->GetSessionManager()->SystemLayer();
so we null-deref mExchangeMgr->GetSessionManager()
.
Stack:
2022-07-18T16:08:12.0943970Z 2022-07-18 16:08:12.092 INFO ThreadSanitizer:DEADLYSIGNAL
2022-07-18T16:08:12.1054250Z 2022-07-18 16:08:12.093 INFO [1m[31m==49711==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000018 (pc 0x00010e546d22 bp 0x7ffee20750e0 sp 0x7ffee20750c0 T352348)
2022-07-18T16:08:12.1141960Z 2022-07-18 16:08:12.093 INFO [1m[0m==49711==The signal is caused by a READ memory access.
2022-07-18T16:08:12.1226540Z 2022-07-18 16:08:12.093 INFO ==49711==Hint: address points to the zero page.
2022-07-18T16:08:17.2819920Z 2022-07-18 16:08:17.279 INFO #0 chip::SessionManager::SystemLayer() SessionManager.h:185 (chip-tool:x86_64+0x1009bed22)
2022-07-18T16:08:17.2821210Z 2022-07-18 16:08:17.279 INFO #1 chip::Messaging::ExchangeContext::CancelResponseTimer() ExchangeContext.cpp:428 (chip-tool:x86_64+0x100aaee32)
2022-07-18T16:08:17.2821910Z 2022-07-18 16:08:17.280 INFO #2 chip::Messaging::ExchangeContext::OnSessionReleased() ExchangeContext.cpp:392 (chip-tool:x86_64+0x100aafef5)
2022-07-18T16:08:17.2822650Z 2022-07-18 16:08:17.280 INFO #3 chip::SessionHolderWithDelegate::SessionReleased() SessionHolder.h:98 (chip-tool:x86_64+0x100b6a151)
2022-07-18T16:08:17.2823320Z 2022-07-18 16:08:17.280 INFO #4 chip::Transport::Session::NotifySessionReleased() Session.h:120 (chip-tool:x86_64+0x1000fd6c9)
2022-07-18T16:08:17.2838820Z 2022-07-18 16:08:17.280 INFO #5 chip::Transport::SecureSession::MarkForEviction() SecureSession.cpp:149 (chip-tool:x86_64+0x100abce9a)
2022-07-18T16:08:17.2840730Z 2022-07-18 16:08:17.280 INFO #6 auto chip::SessionManager::Shutdown()::$_0::operator()<chip::Transport::SecureSession*>(chip::Transport::SecureSession*) const SessionManager.cpp:119 (chip-tool:x86_64+0x100accfb1)
2022-07-18T16:08:17.2842710Z 2022-07-18 16:08:17.280 INFO #7 chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::$_0>::Call(void*, void*) Pool.h:126 (chip-tool:x86_64+0x100accf15)
2022-07-18T16:08:17.2843950Z 2022-07-18 16:08:17.280 INFO #8 chip::internal::HeapObjectList::ForEachNode(void*, chip::Loop (*)(void*, void*)) Pool.cpp:127 (chip-tool:x86_64+0x1009e1cca)
2022-07-18T16:08:17.2846490Z 2022-07-18 16:08:17.280 INFO #9 chip::Loop chip::HeapObjectPool<chip::Transport::SecureSession>::ForEachActiveObject<chip::SessionManager::Shutdown()::$_0>(chip::SessionManager::Shutdown()::$_0&&) Pool.h:396 (chip-tool:x86_64+0x100acce5d)
2022-07-18T16:08:17.2847970Z 2022-07-18 16:08:17.280 INFO #10 chip::Loop chip::Transport::SecureSessionTable::ForEachSession<chip::SessionManager::Shutdown()::$_0>(chip::SessionManager::Shutdown()::$_0&&) SecureSessionTable.h:84 (chip-tool:x86_64+0x100ac34b9)
2022-07-18T16:08:17.2849040Z 2022-07-18 16:08:17.280 INFO #11 chip::SessionManager::Shutdown() SessionManager.cpp:118 (chip-tool:x86_64+0x100ac2e1e)
2022-07-18T16:08:17.2850200Z 2022-07-18 16:08:17.280 INFO #12 chip::Controller::DeviceControllerSystemState::Shutdown() CHIPDeviceControllerFactory.cpp:439 (chip-tool:x86_64+0x100b9438f)
2022-07-18T16:08:17.2851710Z 2022-07-18 16:08:17.280 INFO #13 chip::Controller::DeviceControllerSystemState::Release() CHIPDeviceControllerSystemState.h:160 (chip-tool:x86_64+0x100b4f160)
2022-07-18T16:08:17.2853550Z 2022-07-18 16:08:17.281 INFO #14 chip::Controller::DeviceController::Shutdown() CHIPDeviceController.cpp:315 (chip-tool:x86_64+0x100b4f072)
2022-07-18T16:08:17.2854270Z 2022-07-18 16:08:17.281 INFO
2022-07-18T16:08:17.2855120Z 2022-07-18 16:08:17.281 INFO #15 chip::Controller::DeviceCommissioner::Shutdown() CHIPDeviceController.cpp:497 (chip-tool:x86_64+0x100b51432)
2022-07-18T16:08:17.2856420Z 2022-07-18 16:08:17.281 INFO #16 CHIPCommand::ShutdownCommissioner(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >) CHIPCommand.cpp:318 (chip-tool:x86_64+0x1008facbe)
2022-07-18T16:08:17.2857390Z 2022-07-18 16:08:17.281 INFO #17 CHIPCommand::MaybeTearDownStack() CHIPCommand.cpp:170 (chip-tool:x86_64+0x1008fac37)
2022-07-18T16:08:17.2858450Z 2022-07-18 16:08:17.281 INFO #18 CHIPCommand::Run() CHIPCommand.cpp:195 (chip-tool:x86_64+0x1008fae53)
2022-07-18T16:08:17.2859330Z 2022-07-18 16:08:17.281 INFO #19 Commands::RunCommand(int, char**, bool) Commands.cpp:147 (chip-tool:x86_64+0x1009251bf)
2022-07-18T16:08:17.2860240Z 2022-07-18 16:08:17.281 INFO #20 Commands::Run(int, char**) Commands.cpp:51 (chip-tool:x86_64+0x100924335)
2022-07-18T16:08:17.2860970Z 2022-07-18 16:08:17.281 INFO #21 main main.cpp:48 (chip-tool:x86_64+0x1000e5a8d)
2022-07-18T16:08:17.2861650Z 2022-07-18 16:08:17.281 INFO #22 start <null>:2 (libdyld.dylib:x86_64+0x15f3c)
Proposed Solution
Shutdown of the exchange manager should probably shut down outstanding exchanges. Or we should evict all sessions much earlier in shutdown, before we start tearing down object relationships.
Or both: since shutting down exchanges notifies applications, we should be doing that as early during shutdown as we can, so that there is minimal risk that applications touch something that is already torn down.
Easy way to reproduce: set kSessionEstablishmentTimeout
in CHIPDeviceController.cpp
to 1, and then try to use chip-tool pairing code
.
Shutting down exchanges has nuances - if the exchange is awaiting a response, then the exchange management layer presumably owns the ref, and should call Abort
on it. If it has the will send flag on it, then we can only do a soft close by calling DoClose
and notify the app. (logic here should be very similar to the logic in OnSessionReleased...)
Hi, I had the same problem while doing a thread-ble pairing (succeeded, OnPairingComplete was called) .
After debugging as shown in the stack below, a call to the "SystemLayer" function is made after a call to "GetSessionManager()" which returns a null pointer.
I am on 1a25bbd828c63fb2431c536eda5405ad374882ee / SVE - Ubuntu 20.04 64 bits
Callstack:
chip::SessionManager::SystemLayer (this=0x0) <- Oups!
chip::Messaging::ExchangeContext::CancelResponseTimer (this=0x7fffd001ca00)
chip::Messaging::ExchangeContext::OnSessionReleased (this=0x7fffd001ca00)
chip::Messaging::ExchangeContext::OnSessionReleased (this=0x7fffd001ca00)
chip::Transport::Session::NotifySessionReleased (this=0x7fffd0001a30, this@entry=0x7fffd0001a30)
chip::Transport::SecureSession::MarkForEviction (this=0x7fffd0001a30)
operator()chip::Transport::SecureSession* (session=
Or we should evict all sessions much earlier in shutdown
On the client, if you use DeviceController we now do that during DeviceController shutdown.... But on the server, or if not using DeviceController, you're on your own.