celeritas icon indicating copy to clipboard operation
celeritas copied to clipboard

Add Ranluxpp Random Number Generator

Open davidsgr opened this issue 2 months ago • 10 comments

Adds the new Ranluxpp random number engine to the Celeritas codebase conformant with the interface required for the generators.

davidsgr avatar Oct 02 '25 20:10 davidsgr

Test summary

 5 722 files   9 188 suites   17m 57s :stopwatch:  2 085 tests  2 059 :white_check_mark:  26 :zzz: 0 :x: 31 923 runs  31 775 :white_check_mark: 148 :zzz: 0 :x:

Results for commit b540f702.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 02 '25 20:10 github-actions[bot]

@davidsgr I see an IWYU error causing a build failure: https://github.com/celeritas-project/celeritas/actions/runs/18719387226/job/53387353724?pr=2001

missing <algorithm> I think. Could you ensure all the CI tests pass while I review this morning? Thanks!

sethrj avatar Oct 27 '25 12:10 sethrj

@davidsgr Since the last review it looks like you've moved a lot of stuff in to .cc files: this unfortunately will not work in Celeritas. All code that's used inside a kernel has to be inlined. Setup code should be in .cc files where it's not needed for performance, but anything runtime has to be available to the downstream .cu kernel. To fix this, you'll want to make sure your class definitions declare all their methods as inline, then move all the code from the .cc file to the bottom of the .hh file.

Something to keep an eye out for: if you see a CELER_FUNCTION annotation in a .cc file, that's an error since it's declaring the function __host__ __device__ which is only meaningful for CUDA/HIP.

sethrj avatar Oct 28 '25 16:10 sethrj

Still working through this. Sorry if some of the comments seem like flip flopping: do you want to get together and go over some of this in person?

Yeah, we can go over this. I'm free all day today.

davidsgr avatar Oct 29 '25 14:10 davidsgr

Codecov Report

:x: Patch coverage is 94.41341% with 20 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 84.94%. Comparing base (8d0d904) to head (b540f70). :warning: Report is 8 commits behind head on develop.

Files with missing lines Patch % Lines
src/corecel/random/engine/RanluxppRngEngine.hh 81.94% 5 Missing and 8 partials :warning:
...corecel/random/data/detail/RanluxppRngStateInit.hh 72.72% 3 Missing :warning:
src/corecel/random/data/RanluxppRngData.hh 85.71% 0 Missing and 2 partials :warning:
src/corecel/random/engine/detail/RanluxppImpl.hh 99.56% 0 Missing and 1 partial :warning:
src/corecel/random/params/RanluxppRngParams.hh 50.00% 1 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2001      +/-   ##
===========================================
+ Coverage    84.86%   84.94%   +0.07%     
===========================================
  Files         1255     1263       +8     
  Lines        44186    44487     +301     
  Branches     16504    16545      +41     
===========================================
+ Hits         37499    37789     +290     
+ Misses        4726     4722       -4     
- Partials      1961     1976      +15     
Files with missing lines Coverage Δ
src/corecel/data/Collection.hh 93.75% <ø> (ø)
src/corecel/random/data/RanluxppRngData.cc 100.00% <100.00%> (ø)
src/corecel/random/data/RanluxppTypes.hh 100.00% <100.00%> (ø)
src/corecel/random/data/XorwowRngData.cc 96.55% <ø> (ø)
...corecel/random/data/detail/RanluxppRngStateInit.cc 100.00% <100.00%> (ø)
src/corecel/random/engine/XorwowRngEngine.hh 98.70% <ø> (ø)
src/corecel/random/params/RanluxppRngParams.cc 100.00% <100.00%> (ø)
src/corecel/random/engine/detail/RanluxppImpl.hh 99.56% <99.56%> (ø)
src/corecel/random/params/RanluxppRngParams.hh 50.00% <50.00%> (ø)
src/corecel/random/data/RanluxppRngData.hh 85.71% <85.71%> (ø)
... and 2 more

... and 41 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Oct 29 '25 15:10 codecov[bot]

@sethrj I think this is ready for another review. The remaining CI failures do not appear to have anything to do with my PR.

davidsgr avatar Nov 11 '25 16:11 davidsgr

@davidsgr Did you see the comments above? Not sure if you're waiting on another review and there are several comments unresolved. I'll merge once they're addressed. Regarding the cmake, have you tried building and running all tests with the core RNG set to that? (I expect some of the tests will fail despite our being careful to guard against our assumption about the core RNG.)

sethrj avatar Nov 18 '25 21:11 sethrj

@davidsgr Did you see the comments above? Not sure if you're waiting on another review and there are several comments unresolved. I'll merge once they're addressed. Regarding the cmake, have you tried building and running all tests with the core RNG set to that? (I expect some of the tests will fail despite our being careful to guard against our assumption about the core RNG.)

I'll start working through these comments. I'm still getting used to the GitHub interface. Not sure why I didn't see these before.

davidsgr avatar Nov 19 '25 12:11 davidsgr

@davidsgr I took the liberty of adding some documentation (and moving the Impl file to engine/detail where it more closely belongs). Please git pull --rebase on your local branch!

sethrj avatar Nov 21 '25 01:11 sethrj

Future performance improvement work for the RNG (if it ends up being expensive) should focus on register occupancy, since the current implementation has a lot of large temporary variables:

  • In-place conversion between ranlux and lcg spaces
  • Temporary async allocation of scratch space for the large multiplies
  • Add a LCG impl view that references the global memory state, and stores temporary values like the carry int as register

sethrj avatar Nov 21 '25 14:11 sethrj