celeritas icon indicating copy to clipboard operation
celeritas copied to clipboard

Fix all unit tests for both surface and solids models

Open mrguilima opened this issue 8 months ago • 2 comments

Creating this (draft) PR as suggested yesterday at the standup meeting.

I tried today to merge in the develop branch, and then it does not compile. However, this was passing all tests (including orange, e.g. VecGeom_DIR undefined).

For surface model, tested against VecGeom commit c3ee06912. For solids model, tested agains VecGeom commit 7b8b07ba4 (then the head of v1-patches).

All tests also build ok for solids model, against the same commit above used for the surface model (c3ee06912), however the solids model fail several tests. Overall, the solids model currently looks better (e.g. has better agreement agains surface model and Orange) using v1-patches than using VecGeom's master branch.

mrguilima avatar Apr 04 '25 21:04 mrguilima

Updating test results here, after recent updates in VecGeom (v2.x = commit 14cd3a877 in branch 'stable_grazing') and in Celeritas (commit 6ab096711 in branch vecgeom-2.0):

Using Orange: 100% tests passed, 0 tests failed out of 300 (143.78 sec)

VecGeom v1.2.10: 99% tests passed, 3 tests failed out of 327 (123.00 sec) <-- solid model only

The following tests FAILED: 204 - celeritas/em/UrbanMsc (Failed) unit 205 - celeritas/em/WentzelVIMsc (Failed) unit 215 - celeritas/ext/RootJsonDumper (Failed) unit

VecGeom v1.x (head of v1-patches): 100% tests passed, 0 tests failed out of 327 (145.72 sec) <-- solid model only

Using vecgeom-2.x-surf: 99% tests passed, 2 tets failed out of 324 (155.67 sec) <-- surface model

The following tests FAILED: 251 - celeritas/global/StepperGeant:TestEm3MscNofluct.* (Failed) gpu unit 252 - celeritas/global/StepperGeant:TestEm3MscNoIntegral.* (Failed) gpu unit

VecGeom v2.x-solid:

The following tests FAILED: 89 - geocel/vg/Vecgeom:SimpleCmsVgdmlTest.* (Failed) gpu unit 97 - geocel/vg/Vecgeom:ReplicaTest.* (Failed) gpu unit 204 - celeritas/em/UrbanMsc (Failed) unit 205 - celeritas/em/WentzelVIMsc (Failed) unit 215 - celeritas/ext/RootJsonDumper (Failed) unit 224 - celeritas/field/FieldPropagator:-Layers*:SimpleCms*:Cmse* (Failed) unit 226 - celeritas/field/FieldPropagator:SimpleCms* (Failed) unit 227 - celeritas/field/FieldPropagator:Cmse* (Failed) unit 230 - celeritas/geo/Geometry:SimpleCms* (Failed) gpu unit 231 - celeritas/geo/Geometry:TestEm3* (Failed) gpu unit 232 - celeritas/geo/Geometry:ThreeSpheres* (Failed) gpu unit 251 - celeritas/global/StepperGeant:TestEm3Compton.* (Failed) gpu unit 252 - celeritas/global/StepperGeant:TestEm3NoMsc.* (Failed) gpu unit 253 - celeritas/global/StepperGeant:TestEm3Msc.* (Failed) gpu unit 254 - celeritas/global/StepperGeant:TestEm3MscNofluct.* (Failed) gpu unit 304 - celeritas/track/TrackSort (Failed) gpu unit 305 - celeritas/track/TrackInit (Failed) gpu unit 309 - celeritas/user/SlotDiagnostic (Failed) gpu unit 312 - celeritas/user/StepCollector:TestMultiEm3* (Failed) gpu unit 322 - app/celer-g4:gpu (Timeout) app gpu nomemcheck

mrguilima avatar Apr 14 '25 20:04 mrguilima

Test summary

 5 672 files   9 038 suites   17m 53s :stopwatch:  2 072 tests  2 046 :white_check_mark:  26 :zzz: 0 :x: 31 731 runs  31 584 :white_check_mark: 147 :zzz: 0 :x:

Results for commit 2bc50c9a.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Apr 14 '25 21:04 github-actions[bot]

@mrguilima Where will this be by the end of today? If necessary we can still leave the 2.0/surface tests disabled but we can't keep wasting time resolving conflicts every week. Please tell me your plans and schedule for the rest of this.

sethrj avatar Oct 27 '25 12:10 sethrj

@mrguilima Thanks for continuing on this. You're right that there are a lot of combinations. What about we just try to simply fix the celeritas/field/FieldPropagator:Cmse* test for vecgeom 2 solid, which seems to not be exiting the boundary correctly:

warning: Moved internally from boundary but safety didn't increase: volume 18 from {10.31609436324, -6.5649478629707, 796.92284229006} to {10.316160010118, -6.5649741252439, 796.92291300626} (distance: 9.999999997693e-05)
warning: Moved internally from boundary but safety didn't increase: volume 19 from {99.994867290585, -1.0131710311187, 6585.456544248} to {99.74728429231, 4.2955842461032, 6590.7760977579} (distance: 7.519430129917)
warning: Moved internally from boundary but safety didn't increase: volume 19 from {99.74728429231, 4.2955842461032, 6590.7760977579} to {98.706755024331, 9.5072518029854, 6596.0956512728} (distance: 7.5194301299217)
warning: Moved internally from boundary but safety didn't increase: volume 19 from {98.706755024331, 9.5072518029854, 6596.0956512728} to {96.896790709203, 14.504071692219, 6601.4152047927} (distance: 7.5194301299247)

This seems to be the same behavior we're seeing on the develop branch for the slightly older 2.0. So maybe we should just leave these disabled and start a new development branch.

sethrj avatar Oct 29 '25 12:10 sethrj

Codecov Report

:x: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 84.86%. Comparing base (9cddfc3) to head (2bc50c9). :warning: Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
src/geocel/vg/VecgeomParams.cc 0.00% 3 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #1720    +/-   ##
=========================================
  Coverage    84.86%   84.86%            
=========================================
  Files         1243     1243            
  Lines        43508    43511     +3     
  Branches     16225    16379   +154     
=========================================
+ Hits         36923    36927     +4     
- Misses        4658     4833   +175     
+ Partials      1927     1751   -176     
Files with missing lines Coverage Δ
src/geocel/vg/VecgeomTrackView.hh 81.50% <ø> (ø)
src/geocel/vg/detail/BVHNavigator.hh 84.87% <ø> (ø)
src/geocel/vg/VecgeomParams.cc 65.44% <0.00%> (-0.66%) :arrow_down:

... and 128 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 06 '25 18:11 codecov[bot]

@pcanal do we need to add a "surface" vecgeom to codecov as well? 😅 now that it's clear we can build against both surface and solid in a single installation, I think we can include both in one build (but it'll take a bit of work).

sethrj avatar Nov 11 '25 00:11 sethrj

@mrguilima I'm running before-and-after comparisons on GPU for select problems in the benchmark suite; if there are no substantial changes for the vg1 (and if the vg2 cases don't crash) I will merge. Thanks!

sethrj avatar Nov 11 '25 14:11 sethrj

@mrguilima I'm running before-and-after comparisons on GPU for select problems in the benchmark suite; if there are no substantial changes for the vg1 (and if the vg2 cases don't crash) I will merge. Thanks!

I hope we're at a good point for merging.
Thanks for the help to get to this point.

mrguilima avatar Nov 11 '25 15:11 mrguilima

Results coming in now. Good news:

  1. VecGeom 1.x performance is unaffected by the changes.
  2. Some previously hanging vecgeom2 problems now finish in a reasonable number of step iterations.

Bad news:

  1. Vecgeom solid 2.x is three to five times slower than 1.x.
  2. Vecgeom surface 2.x still has a stuck track for CMS 2018. It's also 50% slower than 1.x

I think a major refactor of our use of vecgeom is in order. We need to avoid relocating when doing boundary checks, and we need to store the previous volume when crossing in order to correctly grab surfaces. I'll do that in a future PR.

sethrj avatar Nov 11 '25 17:11 sethrj