connectedhomeip icon indicating copy to clipboard operation
connectedhomeip copied to clipboard

Shutdown while there are live exchanges will crash

Open bzbarsky-apple opened this issue 2 years ago • 4 comments

Problem

During shutdown, we shut down the exchange manager, then the session manager.

Session manager shutdown evicts all sessions.

If there are exchanges waiting on a response timer, this will try to cancel the response timer, which will crash because we already shut down the exchange manager and that code is doing:

    System::Layer * lSystemLayer = mExchangeMgr->GetSessionManager()->SystemLayer();

so we null-deref mExchangeMgr->GetSessionManager().

Stack:
2022-07-18T16:08:12.0943970Z 2022-07-18 16:08:12.092 INFO    ThreadSanitizer:DEADLYSIGNAL
2022-07-18T16:08:12.1054250Z 2022-07-18 16:08:12.093 INFO    [1m[31m==49711==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000018 (pc 0x00010e546d22 bp 0x7ffee20750e0 sp 0x7ffee20750c0 T352348)
2022-07-18T16:08:12.1141960Z 2022-07-18 16:08:12.093 INFO    [1m[0m==49711==The signal is caused by a READ memory access.
2022-07-18T16:08:12.1226540Z 2022-07-18 16:08:12.093 INFO    ==49711==Hint: address points to the zero page.
2022-07-18T16:08:17.2819920Z 2022-07-18 16:08:17.279 INFO        #0 chip::SessionManager::SystemLayer() SessionManager.h:185 (chip-tool:x86_64+0x1009bed22)
2022-07-18T16:08:17.2821210Z 2022-07-18 16:08:17.279 INFO        #1 chip::Messaging::ExchangeContext::CancelResponseTimer() ExchangeContext.cpp:428 (chip-tool:x86_64+0x100aaee32)
2022-07-18T16:08:17.2821910Z 2022-07-18 16:08:17.280 INFO        #2 chip::Messaging::ExchangeContext::OnSessionReleased() ExchangeContext.cpp:392 (chip-tool:x86_64+0x100aafef5)
2022-07-18T16:08:17.2822650Z 2022-07-18 16:08:17.280 INFO        #3 chip::SessionHolderWithDelegate::SessionReleased() SessionHolder.h:98 (chip-tool:x86_64+0x100b6a151)
2022-07-18T16:08:17.2823320Z 2022-07-18 16:08:17.280 INFO        #4 chip::Transport::Session::NotifySessionReleased() Session.h:120 (chip-tool:x86_64+0x1000fd6c9)
2022-07-18T16:08:17.2838820Z 2022-07-18 16:08:17.280 INFO        #5 chip::Transport::SecureSession::MarkForEviction() SecureSession.cpp:149 (chip-tool:x86_64+0x100abce9a)
2022-07-18T16:08:17.2840730Z 2022-07-18 16:08:17.280 INFO        #6 auto chip::SessionManager::Shutdown()::$_0::operator()<chip::Transport::SecureSession*>(chip::Transport::SecureSession*) const SessionManager.cpp:119 (chip-tool:x86_64+0x100accfb1)
2022-07-18T16:08:17.2842710Z 2022-07-18 16:08:17.280 INFO        #7 chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::$_0>::Call(void*, void*) Pool.h:126 (chip-tool:x86_64+0x100accf15)
2022-07-18T16:08:17.2843950Z 2022-07-18 16:08:17.280 INFO        #8 chip::internal::HeapObjectList::ForEachNode(void*, chip::Loop (*)(void*, void*)) Pool.cpp:127 (chip-tool:x86_64+0x1009e1cca)
2022-07-18T16:08:17.2846490Z 2022-07-18 16:08:17.280 INFO        #9 chip::Loop chip::HeapObjectPool<chip::Transport::SecureSession>::ForEachActiveObject<chip::SessionManager::Shutdown()::$_0>(chip::SessionManager::Shutdown()::$_0&&) Pool.h:396 (chip-tool:x86_64+0x100acce5d)
2022-07-18T16:08:17.2847970Z 2022-07-18 16:08:17.280 INFO        #10 chip::Loop chip::Transport::SecureSessionTable::ForEachSession<chip::SessionManager::Shutdown()::$_0>(chip::SessionManager::Shutdown()::$_0&&) SecureSessionTable.h:84 (chip-tool:x86_64+0x100ac34b9)
2022-07-18T16:08:17.2849040Z 2022-07-18 16:08:17.280 INFO        #11 chip::SessionManager::Shutdown() SessionManager.cpp:118 (chip-tool:x86_64+0x100ac2e1e)
2022-07-18T16:08:17.2850200Z 2022-07-18 16:08:17.280 INFO        #12 chip::Controller::DeviceControllerSystemState::Shutdown() CHIPDeviceControllerFactory.cpp:439 (chip-tool:x86_64+0x100b9438f)
2022-07-18T16:08:17.2851710Z 2022-07-18 16:08:17.280 INFO        #13 chip::Controller::DeviceControllerSystemState::Release() CHIPDeviceControllerSystemState.h:160 (chip-tool:x86_64+0x100b4f160)
2022-07-18T16:08:17.2853550Z 2022-07-18 16:08:17.281 INFO        #14 chip::Controller::DeviceController::Shutdown() CHIPDeviceController.cpp:315 (chip-tool:x86_64+0x100b4f072)
2022-07-18T16:08:17.2854270Z 2022-07-18 16:08:17.281 INFO    
2022-07-18T16:08:17.2855120Z 2022-07-18 16:08:17.281 INFO        #15 chip::Controller::DeviceCommissioner::Shutdown() CHIPDeviceController.cpp:497 (chip-tool:x86_64+0x100b51432)
2022-07-18T16:08:17.2856420Z 2022-07-18 16:08:17.281 INFO        #16 CHIPCommand::ShutdownCommissioner(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >) CHIPCommand.cpp:318 (chip-tool:x86_64+0x1008facbe)
2022-07-18T16:08:17.2857390Z 2022-07-18 16:08:17.281 INFO        #17 CHIPCommand::MaybeTearDownStack() CHIPCommand.cpp:170 (chip-tool:x86_64+0x1008fac37)
2022-07-18T16:08:17.2858450Z 2022-07-18 16:08:17.281 INFO        #18 CHIPCommand::Run() CHIPCommand.cpp:195 (chip-tool:x86_64+0x1008fae53)
2022-07-18T16:08:17.2859330Z 2022-07-18 16:08:17.281 INFO        #19 Commands::RunCommand(int, char**, bool) Commands.cpp:147 (chip-tool:x86_64+0x1009251bf)
2022-07-18T16:08:17.2860240Z 2022-07-18 16:08:17.281 INFO        #20 Commands::Run(int, char**) Commands.cpp:51 (chip-tool:x86_64+0x100924335)
2022-07-18T16:08:17.2860970Z 2022-07-18 16:08:17.281 INFO        #21 main main.cpp:48 (chip-tool:x86_64+0x1000e5a8d)
2022-07-18T16:08:17.2861650Z 2022-07-18 16:08:17.281 INFO        #22 start <null>:2 (libdyld.dylib:x86_64+0x15f3c)

Proposed Solution

Shutdown of the exchange manager should probably shut down outstanding exchanges. Or we should evict all sessions much earlier in shutdown, before we start tearing down object relationships.

Or both: since shutting down exchanges notifies applications, we should be doing that as early during shutdown as we can, so that there is minimal risk that applications touch something that is already torn down.

bzbarsky-apple avatar Jul 18 '22 18:07 bzbarsky-apple

Easy way to reproduce: set kSessionEstablishmentTimeout in CHIPDeviceController.cpp to 1, and then try to use chip-tool pairing code.

bzbarsky-apple avatar Jul 18 '22 18:07 bzbarsky-apple

Shutting down exchanges has nuances - if the exchange is awaiting a response, then the exchange management layer presumably owns the ref, and should call Abort on it. If it has the will send flag on it, then we can only do a soft close by calling DoClose and notify the app. (logic here should be very similar to the logic in OnSessionReleased...)

mrjerryjohns avatar Aug 11 '22 17:08 mrjerryjohns

Hi, I had the same problem while doing a thread-ble pairing (succeeded, OnPairingComplete was called) .

After debugging as shown in the stack below, a call to the "SystemLayer" function is made after a call to "GetSessionManager()" which returns a null pointer.

I am on 1a25bbd828c63fb2431c536eda5405ad374882ee / SVE - Ubuntu 20.04 64 bits

Callstack: chip::SessionManager::SystemLayer (this=0x0) <- Oups! chip::Messaging::ExchangeContext::CancelResponseTimer (this=0x7fffd001ca00) chip::Messaging::ExchangeContext::OnSessionReleased (this=0x7fffd001ca00) chip::Messaging::ExchangeContext::OnSessionReleased (this=0x7fffd001ca00) chip::Transport::Session::NotifySessionReleased (this=0x7fffd0001a30, this@entry=0x7fffd0001a30) chip::Transport::SecureSession::MarkForEviction (this=0x7fffd0001a30) operator()chip::Transport::SecureSession* (session=, __closure=) chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::<lambda(auto:5)> >::Call(void *, void *) (context=, target=) chip::internal::HeapObjectList::ForEachNode (this=0x555555632778, this@entry=0x555555632778, context=0x7fffffffdc77, context@entry=0x7fffffffdc77, lambda=0x7ffff7dfbfa7 <chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::<lambda(auto:5)> >::Call(void *, void *)>, lambda@entry=0x7ffff7dfbfa7 <chip::internal::LambdaProxy<chip::Transport::SecureSession, chip::SessionManager::Shutdown()::<lambda(auto:5)> >::Call(void *, void *)>) chip::HeapObjectPoolchip::Transport::SecureSession::ForEachActiveObject<chip::SessionManager::Shutdown()::<lambda(auto:5)> > (this=0x555555632768) chip::Transport::SecureSessionTable::ForEachSession<chip::SessionManager::Shutdown()::<lambda(auto:5)> > (this=0x555555632760) chip::SessionManager::Shutdown (this=0x555555632700) chip::Controller::DeviceControllerSystemState::Shutdown (this=0x5555556378d0) chip::Controller::DeviceControllerSystemState::Shutdown (this=0x5555556378d0) chip::Controller::DeviceControllerSystemState::Release (this=) chip::Controller::DeviceController::Shutdown (this=0x55555565a530) chip::Controller::DeviceController::Shutdown (this=0x55555565a530) CHIPCommand::MaybeTearDownStack (this=0x7fffffffddf0) CHIPCommand::MaybeTearDownStack (this=0x7fffffffddf0) CHIPCommand::Run (this=0x7fffffffddf0) CHIPCommand::Run (this=0x7fffffffddf0, this@entry=0x7fffffffddf0)

achsoftathome avatar Aug 17 '22 13:08 achsoftathome

Or we should evict all sessions much earlier in shutdown

On the client, if you use DeviceController we now do that during DeviceController shutdown.... But on the server, or if not using DeviceController, you're on your own.

bzbarsky-apple avatar Sep 10 '22 04:09 bzbarsky-apple