rerun icon indicating copy to clipboard operation
rerun copied to clipboard

`debug_assert` triggers when closing partially loaded recording

Open lucasmerlin opened this issue 7 months ago • 6 comments

Describe the bug Opening the air traffic example on main and quickly closing it (while it's still loading) triggers a debug_assert:

To Reproduce

  1. open air traffic example
  2. quickly close it
  3. should crash

Expected behavior No crash

Backtrace

thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:525
stack backtrace:
   8: core::panicking::panic_fmt
   9: re_viewer::app::App::receive_messages
  10: <re_viewer::app::App as eframe::epi::App>::update
  11: egui::context::Context::run
  12: eframe::native::epi_integration::EpiIntegration::update
[...]

lucasmerlin avatar Apr 30 '25 12:04 lucasmerlin

Possibly easier repro:

  1. open two recordings from a remote server (I couldn't trigger the crash with a single recording open)
  2. go back to the server screen (I couldn't trigger the crash when remaining on the recording view)
  3. close one of the recording from the recording panel
  4. :boom:
thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:485
stack backtrace:
   8: core::panicking::panic_fmt
             at core/src/panicking.rs:75:14
   9: re_viewer_context::store_hub::StoreHub::active_caches::{{closure}}
             at re_viewer_context/src/store_hub.rs:485:13
      core::option::Option<T>::and_then
             at core/src/option.rs:1452:24
      re_viewer_context::store_hub::StoreHub::active_caches
             at re_viewer_context/src/store_hub.rs:482:9
      re_viewer_context::store_hub::StoreHub::read_context
             at re_viewer_context/src/store_hub.rs:256:26
  10: <re_viewer::app::App as eframe::epi::App>::update
             at re_viewer/src/app.rs:2339:52
  11: egui::context::Context::run
  12: eframe::native::epi_integration::EpiIntegration::update
  13: <eframe::native::wgpu_integration::WgpuWinitApp as eframe::native::winit_integration::WinitApp>::run_ui_and_paint
  14: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event
  15: winit::event_loop::EventLoop<T>::run_app::{{closure}}
  16: winit::platform_impl::macos::event_loop::map_user_event::{{closure}}
  17: winit::platform_impl::macos::event_handler::EventHandler::handle_event
  18: winit::platform_impl::macos::app_state::ApplicationDelegate::handle_event
  19: winit::platform_impl::macos::app_state::ApplicationDelegate::cleared
  20: winit::platform_impl::macos::observer::control_flow_end_handler
  21: <unknown>
  22: <unknown>
  23: <unknown>
  24: <unknown>
  25: <unknown>
  26: <unknown>
  27: <unknown>
  28: <unknown>
  29: <unknown>
  30: <unknown>
  31: winit::platform_impl::macos::app_state::ApplicationDelegate::set_event_handler
  32: eframe::run_native

abey79 avatar May 23 '25 12:05 abey79

I stumbled upon this issue as well yesterday. I had a quick look into it.

In short

The app panics because it still receives messages, even though the app (air traffic example) has closed already. This does not happen when all the data has loaded (in the air traffic example).

This is reproducible in the ARKit example as well (slowed down a bit by a debugging session, allowing one to click close while messages are still being processed).

More details (in chronological order)

App closing

The user closes the app while it is still loading:

App.run_system_command() is called:

and this statement is run:

fn run_system_command() {
...
            SystemCommand::CloseApp(app_id) => {
                store_hub.close_app(&app_id);
            }
...

}

This calls StoreHub.close_app()

        self.caches_per_recording
            .retain(|store_id, _| !store_ids_removed.contains(store_id));

Messages being received

In the next cycle, messages are still arriving. They are processed in App.receive_messages().

impl App {

    fn receive_messages(&self, store_hub: &mut StoreHub, egui_ctx: &egui::Context) {

....
            match store_hub.entity_db_mut(store_id).add(&msg) {
                Ok(store_events) => {
                    if let Some(caches) = store_hub.active_caches() {
                        caches.on_store_events(&store_events);
                    }
                    self.validate_loaded_events(&store_events);
                }
....
}

But, by then, the caches have already been purged, hence the panic on

impl StoreHub {

     pub fn active_caches(&self) -> Option<&Caches> {
....
            debug_assert!(
                caches.is_some(),
                "active recordings should always have associated caches",
            );
....
}

But since the cache has been purged earlier for this app, caches is None and the assert fails.

Discussion

I'm still getting up to speed on the app's architecture and messaging protocol, so a solution is yet unclear.

Discarding all future messages for that app would be an option, but it looks like the UUID is the same across run instances, so it's unclear how to keep differentiating between multiple runs of the same app.

jccampagne avatar Jun 19 '25 08:06 jccampagne

thanks for looking at the flow of this! Sounds like ideally we'd discard those store events somehow until a new recording with that application id comes up. I suppose we could just keep a list of closed app ids around. Sure that in theory leaks a string over time, i.e. one could have thousands of opened and closed app ids that accumulate but realistically even that isn't much of an issue. Question is more of how to do that in a non-brittle way. All the App and StoreHub code isn't in a great shape imho :/

Wumpf avatar Jun 19 '25 08:06 Wumpf

Related linear issue: https://linear.app/rerun/issue/OSS-1238/debug-assert-triggers-when-closing-partially-loaded-recording

nikolausWest avatar Jun 19 '25 11:06 nikolausWest

I hit this panic just now when opening the "ARKit scenes" and quickly closing the recording. And am able to reproduce. Stack trace is a bit different though:

thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:524
stack backtrace:
   9: core::panicking::panic_fmt
             at core/src/panicking.rs:75:14
  10: re_viewer_context::store_hub::StoreHub::active_caches
             at re_viewer_context/src/store_hub.rs:524:9
      re_viewer::app::App::receive_messages
             at re_viewer/src/app.rs:1945:43
  11: <re_viewer::app::App as eframe::epi::App>::update
             at re_viewer/src/app.rs:2682:9
  12: egui::context::Context::run
  13: eframe::native::epi_integration::EpiIntegration::update

IsseW avatar Sep 05 '25 11:09 IsseW

I hit this, and have a reliable way to reproduce it:

  • pixi run rerun server -d tests/assets/rrd/dataset
  • pixi run rerun connect localhost:51234
  • Click "dataset" / "Open" on first dataset
  • Click "dataset" / "Open" on second dataset
  • Click "dataset" and then the close icon in the list on the second dataset.

The viewer panics with the following trace:

thread 'main' panicked at 'active recordings should always have associated caches'
re_viewer_context/src/store_hub.rs:529
stack backtrace:
   9: core::panicking::panic_fmt
             at core/src/panicking.rs:75:14
  10: re_viewer_context::store_hub::StoreHub::active_caches
             at re_viewer_context/src/store_hub.rs:529:9
      re_viewer_context::store_hub::StoreHub::read_context
             at re_viewer_context/src/store_hub.rs:267:26
  11: <re_viewer::app::App as eframe::epi::App>::update
             at re_viewer/src/app.rs:3060:52
  12: egui::context::Context::run
  13: eframe::native::epi_integration::EpiIntegration::update
  14: <eframe::native::wgpu_integration::WgpuWinitApp as eframe::native::winit_integration::WinitApp>::run_ui_and_paint
  15: eframe::native::event_loop_context::with_event_loop_context
  16: <eframe::native::run::WinitAppWrapper<T> as winit::application::ApplicationHandler<eframe::native::winit_integration::UserEvent>>::window_event
  17: winit::event_loop::EventLoop<T>::run_app::{{closure}}
  18: winit::platform_impl::macos::event_loop::map_user_event::{{closure}}
  19: winit::platform_impl::macos::event_handler::EventHandler::handle_event
  20: winit::platform_impl::macos::app_state::ApplicationDelegate::handle_event
  21: winit::platform_impl::macos::app_state::ApplicationDelegate::cleared
  22: winit::platform_impl::macos::observer::control_flow_end_handler
  23: <unknown>
  24: <unknown>
  25: <unknown>
  26: <unknown>
  27: <unknown>
  28: <unknown>
  29: <unknown>
  30: <unknown>
  31: <unknown>
  32: <unknown>
  33: <unknown>
  34: <unknown>
  35: winit::platform_impl::macos::event_handler::EventHandler::set
  36: winit::platform_impl::macos::event_loop::EventLoop<T>::run_on_demand
  37: eframe::native::run::run_and_return
  38: std::thread::local::LocalKey<T>::with
  39: eframe::run_native

swallez avatar Oct 29 '25 12:10 swallez