
Asynchronously run a step without waiting for thrust algorithms

Open sethrj opened this issue 2 months ago • 0 comments

Currently our core step algorithms contain several device synchronizations caused by thrust algorithms:

  • [ ] Determining the number of active tracks (celeritas/track/detail/TrackInitAlgorithms.cu)
  • [ ] Determining the location of vacancies to fill (celeritas/track/detail/TrackInitAlgorithms.cu)

And in the optical loop similarly:

  • [ ] Counting the number of pending tracks (celeritas/optical/gen/detail/OffloadAlgorithms.cu)
  • [ ] Removing empty pending distributions (celeritas/optical/gen/detail/OffloadAlgorithms.cu)
  • [ ] Compacting the TrackSlotIds of the inactive tracks (celeritas/optical/action/detail/TrackInitAlgorithms.cu)
  • [ ] Partitioning those vacancies among the generators (celeritas/optical/gen/detail/GeneratorAlgorithms.cu)

This is in addition to the step collection/hit algorithms.

All of these will have to be replaced with algorithms that write their results to a device memory location, and anything that depends on that data will have to stay on the device (rather than being managed on the CPU). Quantities such as the number of active/alive tracks can be asynchronously copied to host memory and picked up at the start of the next step (or deferred until later if we're OK with running an empty step).
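As a rough illustration of the pattern, here is a minimal sketch, assuming CUB's `cub::DeviceSelect::Flagged` is an acceptable stand-in for the thrust compaction. It writes the vacancy indices and their count to device memory on a stream, then copies the count to pinned host memory asynchronously and marks it with an event. The function name `demo_count_vacancies`, the `CUDA_CHECK` macro, and the flag layout are all hypothetical and are not taken from the Celeritas code.

```cuda
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cub/cub.cuh>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical helper macro (not part of Celeritas): abort on any CUDA error.
#define CUDA_CHECK(expr)                                            \
    do {                                                            \
        cudaError_t err_ = (expr);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s\n", cudaGetErrorString(err_)); \
            std::abort();                                           \
        }                                                           \
    } while (0)

// Hypothetical demo: compact the IDs of vacant slots and count them without
// the host blocking between the selection and the rest of the step.
int demo_count_vacancies()
{
    constexpr int num_slots = 8;
    // Assumed layout: 1 marks a vacant (inactive) track slot.
    unsigned char const h_flags[num_slots] = {0, 1, 1, 0, 0, 1, 0, 1};

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    unsigned char* d_flags = nullptr;
    int* d_vacancies = nullptr;   // compacted slot IDs, stays on device
    int* d_num_vacant = nullptr;  // count, written by CUB on device
    int* h_num_vacant = nullptr;  // pinned memory so the copy can be async
    CUDA_CHECK(cudaMalloc(&d_flags, sizeof(h_flags)));
    CUDA_CHECK(cudaMalloc(&d_vacancies, num_slots * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_num_vacant, sizeof(int)));
    CUDA_CHECK(cudaMallocHost(&h_num_vacant, sizeof(int)));
    CUDA_CHECK(cudaMemcpyAsync(d_flags, h_flags, sizeof(h_flags),
                               cudaMemcpyHostToDevice, stream));

    // First call only queries the scratch size; a real step loop would
    // allocate this scratch space once, up front.
    auto slot_ids = thrust::make_counting_iterator(0);
    void* d_temp = nullptr;
    std::size_t temp_bytes = 0;
    CUDA_CHECK(cub::DeviceSelect::Flagged(d_temp, temp_bytes, slot_ids,
                                          d_flags, d_vacancies, d_num_vacant,
                                          num_slots, stream));
    CUDA_CHECK(cudaMalloc(&d_temp, temp_bytes));
    CUDA_CHECK(cub::DeviceSelect::Flagged(d_temp, temp_bytes, slot_ids,
                                          d_flags, d_vacancies, d_num_vacant,
                                          num_slots, stream));

    // Enqueue the count copy and mark its completion with an event.
    cudaEvent_t count_ready;
    CUDA_CHECK(cudaEventCreateWithFlags(&count_ready, cudaEventDisableTiming));
    CUDA_CHECK(cudaMemcpyAsync(h_num_vacant, d_num_vacant, sizeof(int),
                               cudaMemcpyDeviceToHost, stream));
    CUDA_CHECK(cudaEventRecord(count_ready, stream));

    // Kernels that read d_vacancies and d_num_vacant could be enqueued on
    // `stream` here. The host waits only when it needs the number, for
    // example at the start of the next step.
    CUDA_CHECK(cudaEventSynchronize(count_ready));
    int const result = *h_num_vacant;

    CUDA_CHECK(cudaEventDestroy(count_ready));
    CUDA_CHECK(cudaFree(d_temp));
    CUDA_CHECK(cudaFree(d_num_vacant));
    CUDA_CHECK(cudaFree(d_vacancies));
    CUDA_CHECK(cudaFree(d_flags));
    CUDA_CHECK(cudaFreeHost(h_num_vacant));
    CUDA_CHECK(cudaStreamDestroy(stream));
    return result;
}
```

Calling `demo_count_vacancies()` from a `main` compiled with `nvcc` exercises the whole sequence. In a real step loop the `cudaEventSynchronize` could be replaced by a `cudaEventQuery` poll, and downstream kernels would read the count directly from `d_num_vacant` rather than take it as a host-side launch argument.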

@LSchwiebert

sethrj, Oct 07 '25 18:10