celeritas
Optimally share GPU memory resources
Currently, the number of track slots has to be set manually. Increasing the number of Geant4 CPU threads (i.e., streams) that offload to Celeritas decreases GPU performance, because each stream has its own independent set of track slots.
- [ ] Share a single GPU state among many CPU streams/threads (would replace https://github.com/celeritas-project/celeritas/issues/1233, since each event would correspond to a separate CPU thread)
- [ ] Spawn daughter RNGs based only on parent and seed data
- [ ] Make stepping loop fully CPU-asynchronous (no/limited thrust, see #1877)
- [ ] Use pinned-memory flag to communicate whether a new GPU step should be launched
- [ ] Automatically determine GPU memory availability, memory requirements per track slot, and initializer requirements per track slot, then allocate based on those values; alternatively, try expanding until memory limits are hit
- [ ] Auto-expand initializers: https://github.com/celeritas-project/celeritas/issues/1058
- [ ] Use async memory pools for temporary data that only need to be alive for part of a stepping loop (e.g., surface normals, interactions, hits)
- [ ] Make configurable by CPU memory usage limits + GPU memory usage limits?
See #1877
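
As a rough illustration of the automatic sizing item above, here is a minimal host-side sketch of how a track slot count could be derived from a memory budget. Every name here (`MemoryBudget`, `max_track_slots`, the field names) is hypothetical and not part of Celeritas; the free-memory value is assumed to come from something like `cudaMemGetInfo`, and the per-slot byte counts are assumed to be known or measured elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the memory budget; not a Celeritas type.
struct MemoryBudget
{
    std::size_t free_bytes;              // e.g. the "free" value from cudaMemGetInfo
    double usable_fraction;              // share of free memory we allow ourselves
    std::size_t bytes_per_track;         // state storage needed per track slot
    std::size_t bytes_per_initializer;   // storage needed per track initializer
    std::size_t initializers_per_track;  // initializer capacity reserved per slot
};

// Largest number of track slots whose state plus initializers fit the budget.
inline std::size_t max_track_slots(MemoryBudget const& b)
{
    assert(b.usable_fraction >= 0.0 && b.usable_fraction <= 1.0);

    // Total footprint of one slot, including its share of initializers
    std::size_t const per_slot
        = b.bytes_per_track
          + b.initializers_per_track * b.bytes_per_initializer;
    if (per_slot == 0)
    {
        return 0;
    }

    // Truncate toward zero so we never exceed the allowed share
    auto const usable = static_cast<std::size_t>(
        b.usable_fraction * static_cast<double>(b.free_bytes));
    return usable / per_slot;
}
```

With a single shared GPU state, this would be evaluated once per device rather than once per stream. The "expand until limits hit" alternative would instead grow the allocation incrementally and stop on the first allocation failure.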
@davidsgr This is the base issue for the RNG work we're doing; I'll split that part off into a sub-issue.
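
For the "spawn daughter RNGs based only on parent and seed data" item, one possible shape is a counter-style key derivation: the daughter's RNG key is a pure function of the run seed, the parent's key, and the daughter's index among its siblings. The sketch below uses the SplitMix64 mixing function as a stand-in; it is not the scheme Celeritas uses or plans to use, and `RngKey`, `primary_key`, and `daughter_key` are hypothetical names.

```cpp
#include <cstdint>

// SplitMix64 step: a bijective 64-bit mixing function (stand-in hash).
inline std::uint64_t mix64(std::uint64_t z)
{
    z += 0x9e3779b97f4a7c15ull;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
    return z ^ (z >> 31);
}

// Hypothetical key from which a track's RNG state would be initialized.
struct RngKey
{
    std::uint64_t value;
};

// Key for a primary: depends only on run seed, event ID, and primary ID.
inline RngKey primary_key(std::uint64_t run_seed,
                          std::uint64_t event_id,
                          std::uint64_t primary_id)
{
    return RngKey{mix64(mix64(run_seed ^ mix64(event_id)) + primary_id)};
}

// Key for a daughter: depends only on run seed, parent key, and the
// daughter's index among the parent's secondaries.
inline RngKey daughter_key(std::uint64_t run_seed,
                           RngKey parent,
                           std::uint64_t child_index)
{
    return RngKey{mix64(mix64(parent.value ^ run_seed) + child_index)};
}
```

Because nothing here depends on the track slot or the CPU stream that happens to process the track, the same event should produce the same keys regardless of how tracks from different threads are interleaved in a shared GPU state. Statistical quality of the resulting streams would still need to be checked for whichever generator consumes the key.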