rerun icon indicating copy to clipboard operation
rerun copied to clipboard

Parallel ingestion

Open joelreymont opened this issue 1 month ago • 13 comments

Redesign ingestion with dedicated ingestion worker

Addresses #4298 by moving CPU-intensive Arrow message processing off the UI thread.

Key changes:

  1. Ingestion Worker (crates/viewer/re_viewer/src/ingestion_worker.rs):

    • Dedicated background thread for processing Arrow messages into chunks
    • Bounded channel (capacity: 2000) provides natural backpressure
  2. Viewer Integration (crates/viewer/re_viewer/src/app.rs):

    • Arrow messages submitted to worker queue instead of blocking UI thread
    • Poll worker output during update() and add chunks to EntityDb

joelreymont avatar Nov 05 '25 18:11 joelreymont

Can you please rebase your PR? Seems it's based on a rather old commit from main. Notably arrow has been updated since.

abey79 avatar Nov 06 '25 08:11 abey79

Can you please rebase your PR? Seems it's based on a rather old commit from main. Notably arrow has been updated since.

Done!

joelreymont avatar Nov 06 '25 08:11 joelreymont

Can you please rebase your PR? Seems it's based on a rather old commit from main. Notably arrow has been updated since.

Done!

I'm still seeing your fork's main being quite a few commits behind ours.

abey79 avatar Nov 06 '25 09:11 abey79

Mea culpa! Should be good now.

Also, I'm available on Discord if you have any questions.

joelreymont avatar Nov 06 '25 09:11 joelreymont

I'm observing a fairly large reduction of the "time to fully loaded" with a 2h air traffic data RRD: 30s -> 9s.

main

https://github.com/user-attachments/assets/48c36fc3-9152-40ae-8c43-52114dcde426

this PR

https://github.com/user-attachments/assets/9204d6c6-aeb5-4a80-9f6d-b0c08ef94176

abey79 avatar Nov 06 '25 09:11 abey79

Based on the above benchmark, could you provide a more thorough analysis of the performance profile of this PR with Puffin?

You should be able to produce the rrd using pixi run -e examples air_traffic_data --dataset 2h --save output.rrd (I can also send mine if needed).

abey79 avatar Nov 06 '25 10:11 abey79

CleanShot 2025-11-06 at 13 01 07@2x CleanShot 2025-11-06 at 13 01 30@2x

The compressed Puffin file is here.

joelreymont avatar Nov 06 '25 11:11 joelreymont

I'm working on it!

joelreymont avatar Nov 06 '25 16:11 joelreymont

This is what the old ingestion performance looks like, using the floating metrics window from PR #11820 against the main branch.

https://github.com/user-attachments/assets/212cc794-db54-43d1-bb36-dc056d98dd5a

Time elapsed is ~65s to load the air traffic 2h data set. Throughput is 2017msg/s.

sequential

And this is what new ingestion performance looks like, using the same floating metrics window, adapted to parallel processing and included in this PR.

https://github.com/user-attachments/assets/559afd09-16bf-4f16-b0dc-de6e3f0e6ef7

You can see that ingestion time went down to ~12s and throughput went up >5x.

parallel

joelreymont avatar Nov 07 '25 16:11 joelreymont

@abey79 This confirms your findings from yesterday. It also confirms that performance did not decrease with the redesign.

joelreymont avatar Nov 07 '25 16:11 joelreymont

Before:

App ──> knows about workers
 └──> manually iterates EntityDbs
      └──> calls poll_worker_output()
           └──> processes results

After:

App ──> store_hub.on_frame_start()
          └──> store_bundle.on_frame_start()
                └──> entity_db.on_frame_start() (for each)
                      └──> worker.poll_processed_chunks() ✅ Encapsulated!

The metrics are key to the whole thing, in my opinion, as is performance tracking. I don't know how to tell if the PR provides true performance enhancements without metrics!

I kept the metrics as the HEAD so I could drop them from this PR and create another one on top of this same branch.

joelreymont avatar Nov 10 '25 19:11 joelreymont

The metrics are key to the whole thing, in my opinion, as is performance tracking. I don't know how to tell if the PR provides true performance enhancements without metrics!

I kept the metrics as the HEAD so I could drop them from this PR and create another one on top of this same branch.

I'm not saying metrics are not useful, and it's great to have them to document the performance increase. But I cannot review this PR, let alone land it, unless the diff only covers the actual ingestion part. Using git, you should be able to rebase the metrics UI stuff on top of this branch, such that you can use it locally without it appearing here.

abey79 avatar Nov 11 '25 11:11 abey79

Cut the head off, please check!

joelreymont avatar Nov 11 '25 11:11 joelreymont

Should be ready to go!

joelreymont avatar Nov 25 '25 07:11 joelreymont