`debug_assert` triggers when closing partially loaded recording
Describe the bug Opening the air traffic example on main and quickly closing it (while it's still loading) triggers a debug_assert:
To Reproduce
- open air traffic example
- quickly close it
- should crash
Expected behavior No crash
Backtrace
thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:525
stack backtrace:
8: core::panicking::panic_fmt
9: re_viewer::app::App::receive_messages
10: <re_viewer::app::App as eframe::epi::App>::update
11: egui::context::Context::run
12: eframe::native::epi_integration::EpiIntegration::update
[...]
Possibly easier repro:
- open two recordings from a remote server (I couldn't trigger the crash with a single recording open)
- go back to the server screen (I couldn't trigger the crash when remaining on the recording view)
- close one of the recording from the recording panel
- :boom:
thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:485
stack backtrace:
8: core::panicking::panic_fmt
at core/src/panicking.rs:75:14
9: re_viewer_context::store_hub::StoreHub::active_caches::{{closure}}
at re_viewer_context/src/store_hub.rs:485:13
core::option::Option<T>::and_then
at core/src/option.rs:1452:24
re_viewer_context::store_hub::StoreHub::active_caches
at re_viewer_context/src/store_hub.rs:482:9
re_viewer_context::store_hub::StoreHub::read_context
at re_viewer_context/src/store_hub.rs:256:26
10: <re_viewer::app::App as eframe::epi::App>::update
at re_viewer/src/app.rs:2339:52
11: egui::context::Context::run
12: eframe::native::epi_integration::EpiIntegration::update
13: <eframe::native::wgpu_integration::WgpuWinitApp as eframe::native::winit_integration::WinitApp>::run_ui_and_paint
14: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event
15: winit::event_loop::EventLoop<T>::run_app::{{closure}}
16: winit::platform_impl::macos::event_loop::map_user_event::{{closure}}
17: winit::platform_impl::macos::event_handler::EventHandler::handle_event
18: winit::platform_impl::macos::app_state::ApplicationDelegate::handle_event
19: winit::platform_impl::macos::app_state::ApplicationDelegate::cleared
20: winit::platform_impl::macos::observer::control_flow_end_handler
21: <unknown>
22: <unknown>
23: <unknown>
24: <unknown>
25: <unknown>
26: <unknown>
27: <unknown>
28: <unknown>
29: <unknown>
30: <unknown>
31: winit::platform_impl::macos::app_state::ApplicationDelegate::set_event_handler
32: eframe::run_native
I stumbled upon this issue as well yesterday. I had a quick look into it.
In short
The app panics because it still receives messages, even though the app (air traffic example) has closed already. This does not happen when all the data has loaded (in the air traffic example).
This is reproducible in the ARKit example as well (slowed down a bit by a debugging session, allowing one to click close while messages are still being processed).
More details (in chronological order)
App closing
The user closes the app while it is still loading:
App.run_system_command() is called:
and this statement is run:
fn run_system_command() {
...
SystemCommand::CloseApp(app_id) => {
store_hub.close_app(&app_id);
}
...
}
This calls StoreHub.close_app()
self.caches_per_recording
.retain(|store_id, _| !store_ids_removed.contains(store_id));
Messages being received
In the next cycle, messages are still arriving. They are processed in App.receive_messages().
impl App {
fn receive_messages(&self, store_hub: &mut StoreHub, egui_ctx: &egui::Context) {
....
match store_hub.entity_db_mut(store_id).add(&msg) {
Ok(store_events) => {
if let Some(caches) = store_hub.active_caches() {
caches.on_store_events(&store_events);
}
self.validate_loaded_events(&store_events);
}
....
}
But, by then, the caches have already been purged, hence the panic on
impl StoreHub {
pub fn active_caches(&self) -> Option<&Caches> {
....
debug_assert!(
caches.is_some(),
"active recordings should always have associated caches",
);
....
}
But since the cache has been purged earlier for this app, caches is None and the assert fails.
Discussion
I'm still getting up to speed on the app's architecture and messaging protocol, so a solution is yet unclear.
Discarding all future messages for that app would be an option, but it looks like the UUID is the same across run instances, so it's unclear how to keep differentiating between multiple runs of the same app.
thanks for looking at the flow of this! Sounds like ideally we'd discard those store events somehow until a new recording with that application id comes up.
I suppose we could just keep a list of closed app ids around. Sure that in theory leaks a string over time, i.e. one could have thousands of opened and closed app ids that accumulate but realistically even that isn't much of an issue.
Question is more of how to do that in a non-brittle way. All the App and StoreHub code isn't in a great shape imho :/
Related linear issue: https://linear.app/rerun/issue/OSS-1238/debug-assert-triggers-when-closing-partially-loaded-recording
I hit this panic just now when opening the "ARKit scenes" and quickly closing the recording. And am able to reproduce. Stack trace is a bit different though:
thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:524
stack backtrace:
9: core::panicking::panic_fmt
at core/src/panicking.rs:75:14
10: re_viewer_context::store_hub::StoreHub::active_caches
at re_viewer_context/src/store_hub.rs:524:9
re_viewer::app::App::receive_messages
at re_viewer/src/app.rs:1945:43
11: <re_viewer::app::App as eframe::epi::App>::update
at re_viewer/src/app.rs:2682:9
12: egui::context::Context::run
13: eframe::native::epi_integration::EpiIntegration::update
I hit this, and have a reliable way to reproduce it:
pixi run rerun server -d tests/assets/rrd/datasetpixi run rerun connect localhost:51234- Click "dataset" / "Open" on first dataset
- Click "dataset" / "Open" on second dataset
- Click "dataset" and then the close icon in the list on the second dataset.
The viewer panics with the following trace:
thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:529
stack backtrace:
9: core::panicking::panic_fmt
at core/src/panicking.rs:75:14
10: re_viewer_context::store_hub::StoreHub::active_caches
at re_viewer_context/src/store_hub.rs:529:9
re_viewer_context::store_hub::StoreHub::read_context
at re_viewer_context/src/store_hub.rs:267:26
11: <re_viewer::app::App as eframe::epi::App>::update
at re_viewer/src/app.rs:3060:52
12: egui::context::Context::run
13: eframe::native::epi_integration::EpiIntegration::update
14: <eframe::native::wgpu_integration::WgpuWinitApp as eframe::native::winit_integration::WinitApp>::run_ui_and_paint
15: eframe::native::event_loop_context::with_event_loop_context
16: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event
17: winit::event_loop::EventLoop<T>::run_app::{{closure}}
18: winit::platform_impl::macos::event_loop::map_user_event::{{closure}}
19: winit::platform_impl::macos::event_handler::EventHandler::handle_event
20: winit::platform_impl::macos::app_state::ApplicationDelegate::handle_event
21: winit::platform_impl::macos::app_state::ApplicationDelegate::cleared
22: winit::platform_impl::macos::observer::control_flow_end_handler
23: <unknown>
24: <unknown>
25: <unknown>
26: <unknown>
27: <unknown>
28: <unknown>
29: <unknown>
30: <unknown>
31: <unknown>
32: <unknown>
33: <unknown>
34: <unknown>
35: winit::platform_impl::macos::event_handler::EventHandler::set
36: winit::platform_impl::macos::event_loop::EventLoop<T>::run_on_demand
37: eframe::native::run::run_and_return
38: std::thread::local::LocalKey<T>::with
39: eframe::run_native