celeritas: Optimally share GPU memory resources

Currently the number of track slots has to be set manually. In addition, increasing the number of Geant4 CPU threads (i.e., streams) offloading to Celeritas decreases GPU performance, because each stream has its own independent set of track slots.

- [ ] Share a single GPU state among many CPU streams/threads (would replace https://github.com/celeritas-project/celeritas/issues/1233, since each event would correspond to a separate CPU thread)
  - [ ] Spawn daughter RNGs based only on parent and seed data
  - [ ] Make the stepping loop fully CPU-asynchronous (no or limited use of Thrust; see #1877)
  - [ ] Use a pinned-memory flag to communicate whether a new GPU step should be launched
- [ ] Automatically determine GPU memory availability, memory requirements per track slot, and initializer requirements per track slot, then allocate accordingly (or keep expanding until memory limits are hit)
  - [ ] Auto-expand initializers: https://github.com/celeritas-project/celeritas/issues/1058
- [ ] Use async memory pools for temporary data that only needs to be alive for part of a stepping loop (e.g., surface normals, interactions, hits)
- [ ] Make this configurable via CPU and GPU memory usage limits?
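A minimal sketch of the "spawn daughter RNGs based only on parent and seed data" item, assuming a daughter's seed can be derived by hashing the run seed, the parent's seed, and the daughter's index among its siblings. SplitMix64 is used here only as a stand-in mixing function, and `spawn_daughter_seed` is a hypothetical name; Celeritas' actual RNG and seeding scheme may differ. The point is that no shared counter or per-stream state is consulted, so the result does not depend on which CPU stream or track slot processed the parent.

```cpp
#include <cstdint>

// Stand-in mixer (assumption): SplitMix64 step. It is a bijection on
// 64-bit integers, so distinct inputs always give distinct outputs.
constexpr std::uint64_t splitmix64(std::uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical: derive a daughter's RNG seed purely from the run seed,
// the parent's seed, and the daughter's index among its siblings.
// Because it reads no global or per-stream counter, any thread can
// compute it independently and reproducibly.
constexpr std::uint64_t spawn_daughter_seed(std::uint64_t run_seed,
                                            std::uint64_t parent_seed,
                                            std::uint32_t daughter_index)
{
    return splitmix64(splitmix64(run_seed ^ parent_seed) + daughter_index);
}
```

Since both steps are bijective, siblings of the same parent (and daughters of different parents with the same index) are guaranteed distinct seeds; whether this hash is statistically adequate for seeding the production RNG would still need checking.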
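A rough sketch of the "automatically determine GPU memory availability ... and allocate accordingly" item, assuming the per-slot state size and per-slot initializer share are known up front. `SlotFootprint` and `max_track_slots` are hypothetical names, and the reserved margin and rounding multiple are illustrative parameters rather than existing Celeritas options. In practice `free_bytes` might come from a runtime query such as `cudaMemGetInfo`.

```cpp
#include <cstddef>

// Hypothetical per-track-slot memory requirements, in bytes.
struct SlotFootprint
{
    std::size_t state_bytes;        // persistent per-slot track state
    std::size_t initializer_bytes;  // per-slot share of track initializers
};

// Hypothetical: number of track slots that fit in the free GPU memory
// after holding back reserved_bytes (e.g., for async-pool scratch data
// and fragmentation), rounded down to a multiple of slot_multiple
// (e.g., a kernel block size). Returns 0 if nothing fits.
constexpr std::size_t max_track_slots(std::size_t free_bytes,
                                      std::size_t reserved_bytes,
                                      SlotFootprint fp,
                                      std::size_t slot_multiple)
{
    std::size_t const per_slot = fp.state_bytes + fp.initializer_bytes;
    if (per_slot == 0 || slot_multiple == 0 || free_bytes <= reserved_bytes)
    {
        return 0;
    }
    std::size_t const slots = (free_bytes - reserved_bytes) / per_slot;
    return slots - slots % slot_multiple;
}
```

For example, `max_track_slots(1000, 100, SlotFootprint{8, 2}, 32)` gives 64: 900 usable bytes hold 90 slots of 10 bytes, rounded down to a multiple of 32. The "try expanding until memory limits are hit" alternative would instead grow the allocation incrementally and stop on the first allocation failure.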

Sep 12 '25 03:09 sethrj

See #1877

Oct 08 '25 13:10 sethrj

@davidsgr This is the base issue for the RNG work we're doing; I'll split that part off into a sub-issue.

Nov 10 '25 13:11 sethrj