rerun Project: Performant visualization of scenes with large number of entities

Context

We want the viewer to be able to scale to scenes with large numbers of entities. This of course means visualizing these scenes, but also ingesting them in the first place.

This is blocked on a number of specific implementation issues, but put broadly: the work the viewer has to do to lay out a scene more often than not grows linearly with the number of entities present in the entire dataset.

There are only two ways to combat this:

1. Break the linear growth where possible (e.g. only compute the data for the visible part of a list as opposed to the entire list).
2. (Incremental) caching where possible (e.g. incrementally cache and update the transform hierarchy).
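
To make the first approach concrete, here is a minimal sketch of computing only the visible slice of a long list. It assumes fixed-height rows; the function name and parameters are hypothetical and are not the viewer's actual API.

```rust
use std::ops::Range;

/// Returns the range of row indices that intersect the viewport.
///
/// Hypothetical helper: assumes every row has the same height, and that
/// `scroll_offset`, `viewport_height` and `row_height` share the same unit.
/// The cost is O(1) regardless of `num_rows`.
pub fn visible_rows(
    num_rows: usize,
    row_height: f32,
    scroll_offset: f32,
    viewport_height: f32,
) -> Range<usize> {
    if num_rows == 0 || row_height <= 0.0 || viewport_height <= 0.0 {
        return 0..0;
    }
    let top = scroll_offset.max(0.0);
    // First row that is at least partially visible.
    let first = (top / row_height).floor() as usize;
    // One past the last row that is at least partially visible.
    let last = ((top + viewport_height) / row_height).ceil() as usize;
    first.min(num_rows)..last.min(num_rows)
}
```

Only the rows in the returned range need their data computed and laid out, so the per-frame work depends on the viewport size rather than on the total number of entities.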

Of course in many cases, option 1 isn't even an option: if the user wants to visualize all entities in the scene, then somehow we have to make that fast.

Incremental caching of aggregated data (which is what the visualizers work with) is very hard, but will be a must in order to reach our performance goals.
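
As a rough illustration of incremental caching applied to a transform hierarchy, the sketch below caches world-space transforms and recomputes only the subtree below a changed node. Everything here is hypothetical and heavily simplified (translations only, no rotation or scale); it is not how the viewer's transform code is structured.

```rust
/// Hypothetical, simplified transform tree: each node stores a translation
/// relative to its parent, plus a cached world-space translation.
#[derive(Default)]
pub struct TransformTree {
    parent: Vec<Option<usize>>,
    children: Vec<Vec<usize>>,
    local: Vec<[f32; 3]>,
    /// Cached result, kept up to date incrementally.
    world: Vec<[f32; 3]>,
    /// Number of cache entries recomputed by `set_local` (for illustration).
    pub recomputed: usize,
}

fn add3(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
}

impl TransformTree {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds a node and returns its index. The parent must already exist.
    pub fn add(&mut self, parent: Option<usize>, local: [f32; 3]) -> usize {
        let id = self.local.len();
        let base = parent.map_or([0.0; 3], |p| self.world[p]);
        self.parent.push(parent);
        self.children.push(Vec::new());
        self.local.push(local);
        self.world.push(add3(base, local));
        if let Some(p) = parent {
            self.children[p].push(id);
        }
        id
    }

    /// Updates one local transform and refreshes only the affected subtree.
    pub fn set_local(&mut self, id: usize, local: [f32; 3]) {
        self.local[id] = local;
        // Depth-first walk: a node is always refreshed before its children.
        let mut stack = vec![id];
        while let Some(n) = stack.pop() {
            let base = self.parent[n].map_or([0.0; 3], |p| self.world[p]);
            self.world[n] = add3(base, self.local[n]);
            self.recomputed += 1;
            stack.extend(self.children[n].iter().copied());
        }
    }

    /// Reads the cached world-space translation.
    pub fn world(&self, id: usize) -> [f32; 3] {
        self.world[id]
    }
}
```

Siblings and ancestors of the changed node keep their cached values, so a frame in which nothing moved costs nothing, and a frame in which one entity moved costs time proportional to that entity's subtree.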

This issue is not about:

Making the UX work for large number of entities (search, etc).

Measurable end goals

Air Traffic example (`2h` dataset)

Ingestion directly from the SDK should be bottlenecked by the SDK, always.
Ingestion from file should be an order of magnitude faster, in the single-digit seconds range.
Should be able to visualize, without visible time range, at or close to 60FPS on any decent machine, on the web.
Should be able to visualize, with infinite visible time range, at 60FPS on high-end machines, on native.

TODO(cmc): What should we do about plotting? Is plotting 10k lines on a single plot really an important use case? If so, do we need to bring egui issues into this?

Revy

Revy was infamously bottlenecked by the performance of many entities (game scenes have a lot of them). This is a good opportunity to revive that project, if we can make it happen.

Should be able to ingest and visualize Bevy's "Alien Cake Addict" example in real-time, on a decent machine, on the web.

Relevant material

Writings:

https://github.com/rerun-io/rerun/issues/5974
https://github.com/rerun-io/rerun/issues/8221
TODO(cmc): link to updated range semantics proposal
TODO(cmc): link to range-zipped caching proposal

PRs:

https://github.com/rerun-io/rerun/pull/5449
https://github.com/rerun-io/rerun/pull/8224

Areas that need significant improvements

[ ] https://github.com/rerun-io/rerun/issues/8340 determining which entities participate in each view (query_results)
[ ] determining whether new space views should be added (spawn_heuristic_space_views)
[ ] DataQueryPropertyResolver (blueprint resolve)
[ ] Time Panel tree_ui
[ ] Visualizer execution
- [ ] transforms, see https://github.com/rerun-io/rerun/issues/7025
- [ ] chunk processing, see https://github.com/rerun-io/rerun/issues/8221
  - this is in large part a per-visualizer optimization / change in how we set up visualizers
  - [ ] [optional?] retained GPU data (at least for point clouds)
- [ ] annotation context loading
[ ] Egui tesselation for time series / amount of primitives emitted to egui
[ ] Ingestion

Wherever we don't do something obviously silly, we should strive to go with a retained/cached approach in order to become more scalable and more robust against per-frame regressions in trivial-looking (== static frame) scenarios. If this is structurally hard, revisit structure!

Nov 27 '24 16:11 teh-cmc

What does performance look like when we only do the querying + chunk processing part? I.e. let's pretend we've managed to optimize out everything that visualizers do after receiving the chunks (range-zipping, color/radii/etc splatting, annotation context...).

We need to know this because our upcoming aggregated caching plan still involves running the queries every frame.

I've applied the following patch, which (very broadly) simulates that:

diff --git a/crates/viewer/re_space_view/src/results_ext.rs b/crates/viewer/re_space_view/src/results_ext.rs
index 55e52be3e00..01d5cae4e17 100644
--- a/crates/viewer/re_space_view/src/results_ext.rs
+++ b/crates/viewer/re_space_view/src/results_ext.rs
@@ -427,7 +427,7 @@ impl<'a> HybridResultsChunkIter<'a> {
     pub fn component<C: re_types_core::Component>(
         &'a self,
     ) -> impl Iterator<Item = ((TimeInt, RowId), ChunkComponentIterItem<C>)> + 'a {
-        self.chunks.iter().flat_map(move |chunk| {
+        self.chunks.iter().filter(|_| false).flat_map(move |chunk| {
             itertools::izip!(
                 chunk.iter_component_indices(&self.timeline, &self.component_name),
                 chunk.iter_component::<C>(),
@@ -441,7 +441,7 @@ impl<'a> HybridResultsChunkIter<'a> {
     pub fn primitive<T: arrow2::types::NativeType>(
         &'a self,
     ) -> impl Iterator<Item = ((TimeInt, RowId), &'a [T])> + 'a {
-        self.chunks.iter().flat_map(move |chunk| {
+        self.chunks.iter().filter(|_| false).flat_map(move |chunk| {
             itertools::izip!(
                 chunk.iter_component_indices(&self.timeline, &self.component_name),
                 chunk.iter_primitive::<T>(&self.component_name)
@@ -458,7 +458,7 @@ impl<'a> HybridResultsChunkIter<'a> {
     where
         [T; N]: bytemuck::Pod,
     {
-        self.chunks.iter().flat_map(move |chunk| {
+        self.chunks.iter().filter(|_| false).flat_map(move |chunk| {
             itertools::izip!(
                 chunk.iter_component_indices(&self.timeline, &self.component_name),
                 chunk.iter_primitive_array::<N, T>(&self.component_name)
@@ -475,7 +475,7 @@ impl<'a> HybridResultsChunkIter<'a> {
     where
         [T; N]: bytemuck::Pod,
     {
-        self.chunks.iter().flat_map(move |chunk| {
+        self.chunks.iter().filter(|_| false).flat_map(move |chunk| {
             itertools::izip!(
                 chunk.iter_component_indices(&self.timeline, &self.component_name),
                 chunk.iter_primitive_array_list::<N, T>(&self.component_name)
@@ -489,7 +489,7 @@ impl<'a> HybridResultsChunkIter<'a> {
     pub fn string(
         &'a self,
     ) -> impl Iterator<Item = ((TimeInt, RowId), Vec<re_types_core::ArrowString>)> + 'a {
-        self.chunks.iter().flat_map(|chunk| {
+        self.chunks.iter().filter(|_| false).flat_map(|chunk| {
             itertools::izip!(
                 chunk.iter_component_indices(&self.timeline, &self.component_name),
                 chunk.iter_string(&self.component_name)
@@ -503,7 +503,7 @@ impl<'a> HybridResultsChunkIter<'a> {
     pub fn buffer<T: arrow::datatypes::ArrowNativeType + arrow2::types::NativeType>(
         &'a self,
     ) -> impl Iterator<Item = ((TimeInt, RowId), Vec<re_types_core::ArrowBuffer<T>>)> + 'a {
-        self.chunks.iter().flat_map(|chunk| {
+        self.chunks.iter().filter(|_| false).flat_map(|chunk| {
             itertools::izip!(
                 chunk.iter_component_indices(&self.timeline, &self.component_name),
                 chunk.iter_buffer(&self.component_name)
diff --git a/crates/viewer/re_space_view_spatial/src/visualizers/utilities/entity_iterator.rs b/crates/viewer/re_space_view_spatial/src/visualizers/utilities/entity_iterator.rs
index 0e535138677..3ba2cd3a26f 100644
--- a/crates/viewer/re_space_view_spatial/src/visualizers/utilities/entity_iterator.rs
+++ b/crates/viewer/re_space_view_spatial/src/visualizers/utilities/entity_iterator.rs
@@ -141,7 +141,7 @@ pub fn iter_component<'a, C: re_types::Component>(
     timeline: Timeline,
     component_name: ComponentName,
 ) -> impl Iterator<Item = ((TimeInt, RowId), ChunkComponentIterItem<C>)> + 'a {
-    chunks.iter().flat_map(move |chunk| {
+    chunks.iter().filter(|_| false).flat_map(move |chunk| {
         itertools::izip!(
             chunk.iter_component_indices(&timeline, &component_name),
             chunk.iter_component::<C>()
@@ -158,7 +158,7 @@ pub fn iter_primitive<'a, T: arrow2::types::NativeType>(
     timeline: Timeline,
     component_name: ComponentName,
 ) -> impl Iterator<Item = ((TimeInt, RowId), &'a [T])> + 'a {
-    chunks.iter().flat_map(move |chunk| {
+    chunks.iter().filter(|_| false).flat_map(move |chunk| {
         itertools::izip!(
             chunk.iter_component_indices(&timeline, &component_name),
             chunk.iter_primitive::<T>(&component_name)
@@ -178,7 +178,7 @@ pub fn iter_primitive_array<'a, const N: usize, T: arrow2::types::NativeType>(
 where
     [T; N]: bytemuck::Pod,
 {
-    chunks.iter().flat_map(move |chunk| {
+    chunks.iter().filter(|_| false).flat_map(move |chunk| {
         itertools::izip!(
             chunk.iter_component_indices(&timeline, &component_name),
             chunk.iter_primitive_array::<N, T>(&component_name)
@@ -198,7 +198,7 @@ pub fn iter_primitive_array_list<'a, const N: usize, T: arrow2::types::NativeTyp
 where
     [T; N]: bytemuck::Pod,
 {
-    chunks.iter().flat_map(move |chunk| {
+    chunks.iter().filter(|_| false).flat_map(move |chunk| {
         itertools::izip!(
             chunk.iter_component_indices(&timeline, &component_name),
             chunk.iter_primitive_array_list::<N, T>(&component_name)
@@ -215,7 +215,7 @@ pub fn iter_string<'a>(
     timeline: Timeline,
     component_name: ComponentName,
 ) -> impl Iterator<Item = ((TimeInt, RowId), Vec<re_types::ArrowString>)> + 'a {
-    chunks.iter().flat_map(move |chunk| {
+    chunks.iter().filter(|_| false).flat_map(move |chunk| {
         itertools::izip!(
             chunk.iter_component_indices(&timeline, &component_name),
             chunk.iter_string(&component_name)
@@ -232,7 +232,7 @@ pub fn iter_buffer<'a, T: arrow::datatypes::ArrowNativeType + arrow2::types::Nat
     timeline: Timeline,
     component_name: ComponentName,
 ) -> impl Iterator<Item = ((TimeInt, RowId), Vec<re_types::ArrowBuffer<T>>)> + 'a {
-    chunks.iter().flat_map(move |chunk| {
+    chunks.iter().filter(|_| false).flat_map(move |chunk| {
         itertools::izip!(
             chunk.iter_component_indices(&timeline, &component_name),
             chunk.iter_buffer(&component_name)
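
For readers unfamiliar with the trick used in the patch: putting `.filter(|_| false)` in front of `.flat_map(...)` keeps the iterator's type unchanged while guaranteeing that the per-chunk closure never runs. The standalone sketch below demonstrates this on made-up data; `count_processed` and its `skip_processing` flag are hypothetical (the patch hardcodes `false` in the filter).

```rust
use std::cell::Cell;

/// Returns `(closure_calls, items_yielded)` for a chunk-like list.
///
/// `chunks` stands in for the real chunk list; the contents are made up.
/// With `skip_processing == true` the filter rejects every chunk, which
/// mirrors the `.filter(|_| false)` inserted by the patch.
pub fn count_processed(chunks: &[Vec<u32>], skip_processing: bool) -> (usize, usize) {
    let calls = Cell::new(0usize);
    let items = chunks
        .iter()
        .filter(|_| !skip_processing)
        .flat_map(|chunk| {
            // Stands in for the per-chunk iteration and deserialization work.
            calls.set(calls.get() + 1);
            chunk.iter().copied()
        })
        .count();
    (calls.get(), items)
}
```

Because iterator adapters are lazy, the filtered version does no per-chunk work at all, so whatever frame time remains is attributable to everything that happens before the iteration.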

First, let's look at unmodified main (on my machine with discrete GPU, i.e. hard mode):

main, latest-at, without plot:

main, latest-at, with plot:

main, infinite range for 3D view / latest-at for the rest, without plot:

main, infinite range for 3D view / latest-at for the rest, with plot:

Now here's where it gets interesting: consider what happens when only the querying and chunk processing remain (i.e. with the patch above applied):

main, latest-at, without plot -- Chunk processing only: (NOTE: The specific values in the flamegraph are inflated due to probing overhead).

main, infinite range for 3D view / latest-at for the rest, without plot -- Chunk processing only:

It looks like we can definitely afford to run the queries every frame, as long as we manage to make aggregated caching work.
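
One way to picture "query every frame, aggregate only when needed" is a cache keyed on the identity of the chunks the query returned. This is a sketch under assumptions, not the actual caching plan: `Chunk` and `AggregateCache` are hypothetical stand-ins, and the sketch assumes that a chunk with a given ID never changes its contents.

```rust
/// Hypothetical stand-in for a chunk returned by a query.
pub struct Chunk {
    pub id: u64,
    pub values: Vec<f32>,
}

/// Caches an aggregate (here simply a flattened copy of all values) keyed on
/// the ordered list of chunk IDs that produced it.
#[derive(Default)]
pub struct AggregateCache {
    key: Option<Vec<u64>>,
    aggregated: Vec<f32>,
    /// Number of times the aggregate was rebuilt (for illustration).
    pub rebuilds: usize,
}

impl AggregateCache {
    /// Called every frame with the fresh query results.
    /// Rebuilds the aggregate only if the set of chunks changed.
    pub fn get(&mut self, chunks: &[&Chunk]) -> &[f32] {
        let ids: Vec<u64> = chunks.iter().map(|c| c.id).collect();
        if self.key.as_deref() != Some(ids.as_slice()) {
            // Assumption: same chunk ID implies same contents.
            self.aggregated = chunks
                .iter()
                .flat_map(|c| c.values.iter().copied())
                .collect();
            self.key = Some(ids);
            self.rebuilds += 1;
        }
        &self.aggregated
    }
}
```

On a static frame the query still runs, but the only extra cost is comparing a short list of IDs; the expensive aggregation is skipped entirely.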

Dec 02 '24 10:12 teh-cmc

This is a very nice, user-provided real-world benchmark we can use to drive these optimizations forward (auth required): https://drive.google.com/file/d/1ESejr2yAhEjyKYFD2u-wQanckK8OwzrN/view?usp=sharing

Jan 15 '25 14:01 teh-cmc

I'm really struggling because of this issue:

Feb 22 '25 12:02 rasmusgo

oh interesting, didn't realize that this one still bubbles up, thought we weeded it out! Which Rerun version and can you tell more about your scene, I figure lots of entities with images?

Feb 22 '25 13:02 Wumpf

Rerun 0.22.1. The scene is about solving a puzzle with 500 pieces from photos. I'm dumping a few images but mostly lots of 2D points and lines from processing the calibration pattern and contours of puzzle pieces.

Feb 26 '25 10:02 rasmusgo

I invited you and the rerun org to https://github.com/rasmusgo/puzzle_bot if you want to take a look. I also uploaded a recording to google drive.

Feb 26 '25 10:02 rasmusgo

awesome, thank you! This week is a bit packed, I'll try to have a browse next week, surely this must be something silly in the viewer that can't be hard to fix 😄 🤞

Feb 26 '25 22:02 Wumpf

Any progress on this?

Mar 17 '25 09:03 rasmusgo

@rasmusgo sorry for the long delay and thanks for the ping (had this on my radar but kept postponing :/)! There was indeed a lot of silliness, addressed here

Mar 17 '25 17:03 Wumpf

Tagging PR #11743 for posterity.

Nov 11 '25 11:11 joelreymont
