Parallel ingestion
Redesign ingestion with dedicated ingestion worker
Addresses #4298 by moving CPU-intensive Arrow message processing off the UI thread.
Key changes:
- **Ingestion Worker** (`crates/viewer/re_viewer/src/ingestion_worker.rs`):
  - Dedicated background thread for processing Arrow messages into chunks
  - Bounded channel (capacity: 2000) provides natural backpressure
- **Viewer Integration** (`crates/viewer/re_viewer/src/app.rs`):
  - Arrow messages are submitted to the worker queue instead of blocking the UI thread
  - Worker output is polled during `update()` and chunks are added to the `EntityDb`
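The pattern above can be sketched with the standard library's bounded channels. This is a minimal illustration only: the type names (`IngestionWorker`, `ArrowMsg`, `Chunk`) and method names are hypothetical stand-ins, and the real code decodes Arrow data rather than passing integers through.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TryRecvError};
use std::thread;

// Hypothetical stand-ins for the real Arrow message and chunk types.
struct ArrowMsg(u64);
struct Chunk(u64);

/// A bounded input queue gives natural backpressure: `send` blocks
/// once the queue already holds `capacity` unprocessed messages.
struct IngestionWorker {
    tx: SyncSender<ArrowMsg>,
    rx: Receiver<Chunk>,
}

impl IngestionWorker {
    fn spawn(capacity: usize) -> Self {
        let (msg_tx, msg_rx) = sync_channel::<ArrowMsg>(capacity);
        let (chunk_tx, chunk_rx) = sync_channel::<Chunk>(capacity);
        thread::spawn(move || {
            // CPU-intensive decoding happens here, off the UI thread.
            for msg in msg_rx {
                if chunk_tx.send(Chunk(msg.0)).is_err() {
                    break; // receiver side was dropped
                }
            }
        });
        Self { tx: msg_tx, rx: chunk_rx }
    }

    /// Called from the UI thread; only blocks when the queue is full.
    fn submit(&self, msg: ArrowMsg) {
        let _ = self.tx.send(msg);
    }

    /// Non-blocking poll, called once per frame from `update()`.
    fn poll_processed_chunks(&self) -> Vec<Chunk> {
        let mut chunks = Vec::new();
        loop {
            match self.rx.try_recv() {
                Ok(chunk) => chunks.push(chunk),
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
            }
        }
        chunks
    }
}
```

The key point is that `poll_processed_chunks()` never blocks, so a frame that finds no finished chunks costs almost nothing.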
Can you please rebase your PR? Seems it's based on a rather old commit from main. Notably arrow has been updated since.
Done!
I'm still seeing your fork's main being quite a few commits behind ours.
Mea culpa! Should be good now.
Also, I'm available on Discord if you have any questions.
I'm observing a fairly large reduction of the "time to fully loaded" with a 2h air traffic data RRD: 30s -> 9s.
main
https://github.com/user-attachments/assets/48c36fc3-9152-40ae-8c43-52114dcde426
this PR
https://github.com/user-attachments/assets/9204d6c6-aeb5-4a80-9f6d-b0c08ef94176
Based on the above benchmark, could you provide a more thorough analysis of the performance profile of this PR with Puffin?
You should be able to produce the rrd using `pixi run -e examples air_traffic_data --dataset 2h --save output.rrd` (I can also send mine if needed).
I'm working on it!
This is what the old ingestion performance looks like, using the floating metrics window from PR #11820 against the main branch.
https://github.com/user-attachments/assets/212cc794-db54-43d1-bb36-dc056d98dd5a
Time elapsed is ~65s to load the air traffic 2h data set. Throughput is 2017 msg/s.
And this is what new ingestion performance looks like, using the same floating metrics window, adapted to parallel processing and included in this PR.
https://github.com/user-attachments/assets/559afd09-16bf-4f16-b0dc-de6e3f0e6ef7
You can see that ingestion time went down to ~12s and throughput went up >5x.
@abey79 This confirms your findings from yesterday. It also confirms that performance did not decrease with the redesign.
Before:

```
App ──> knows about workers
 └──> manually iterates EntityDbs
      └──> calls poll_worker_output()
           └──> processes results
```

After:

```
App ──> store_hub.on_frame_start()
 └──> store_bundle.on_frame_start()
      └──> entity_db.on_frame_start() (for each)
           └──> worker.poll_processed_chunks() ✅ Encapsulated!
```
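The "after" chain can be sketched as plain delegation: the app only talks to the store hub, and each layer forwards the per-frame tick downward. This is an illustrative skeleton with dummy types, not the actual `re_viewer` code; the real worker yields decoded chunks, here replaced by a fixed vector.

```rust
// Dummy worker: pretends to have three processed chunks ready.
struct Worker;
impl Worker {
    fn poll_processed_chunks(&self) -> Vec<u64> {
        vec![1, 2, 3]
    }
}

struct EntityDb {
    worker: Worker,
    num_chunks: usize,
}
impl EntityDb {
    // The worker is fully encapsulated: only the EntityDb touches it.
    fn on_frame_start(&mut self) {
        self.num_chunks += self.worker.poll_processed_chunks().len();
    }
}

struct StoreBundle {
    entity_dbs: Vec<EntityDb>,
}
impl StoreBundle {
    fn on_frame_start(&mut self) {
        for db in &mut self.entity_dbs {
            db.on_frame_start();
        }
    }
}

struct StoreHub {
    store_bundle: StoreBundle,
}
impl StoreHub {
    // The app calls only this; it knows nothing about workers.
    fn on_frame_start(&mut self) {
        self.store_bundle.on_frame_start();
    }
}
```

The design win is that the app no longer needs to know workers exist; adding or removing an `EntityDb` automatically wires its worker into the frame loop.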
The metrics are key to the whole thing, in my opinion, as is performance tracking. I don't know how to tell if the PR provides true performance enhancements without metrics!
I kept the metrics commit at `HEAD` so I could drop it from this PR and create another one on top of the same branch.
I'm not saying metrics are not useful, and it's great to have them to document the performance increase. But I cannot review this PR, let alone land it, unless the diff only covers the actual ingestion part. Using git, you should be able to rebase the metrics UI stuff on top of this branch, such that you can use it locally without it appearing here.
Cut the head off, please check!
Should be ready to go!