Fix all unit tests for both surface and solids models
Creating this (draft) PR as suggested yesterday at the standup meeting.
I tried today to merge in the develop branch, and then it does not compile. However, this was passing all tests (including orange, e.g. VecGeom_DIR undefined).
For surface model, tested against VecGeom commit c3ee06912. For solids model, tested agains VecGeom commit 7b8b07ba4 (then the head of v1-patches).
All tests also build ok for solids model, against the same commit above used for the surface model (c3ee06912), however the solids model fail several tests. Overall, the solids model currently looks better (e.g. has better agreement agains surface model and Orange) using v1-patches than using VecGeom's master branch.
Updating test results here, after recent updates in VecGeom (v2.x = commit 14cd3a877 in branch 'stable_grazing') and in Celeritas (commit 6ab096711 in branch vecgeom-2.0):
Using Orange: 100% tests passed, 0 tests failed out of 300 (143.78 sec)
VecGeom v1.2.10: 99% tests passed, 3 tests failed out of 327 (123.00 sec) <-- solid model only
The following tests FAILED: 204 - celeritas/em/UrbanMsc (Failed) unit 205 - celeritas/em/WentzelVIMsc (Failed) unit 215 - celeritas/ext/RootJsonDumper (Failed) unit
VecGeom v1.x (head of v1-patches): 100% tests passed, 0 tests failed out of 327 (145.72 sec) <-- solid model only
Using vecgeom-2.x-surf: 99% tests passed, 2 tets failed out of 324 (155.67 sec) <-- surface model
The following tests FAILED: 251 - celeritas/global/StepperGeant:TestEm3MscNofluct.* (Failed) gpu unit 252 - celeritas/global/StepperGeant:TestEm3MscNoIntegral.* (Failed) gpu unit
VecGeom v2.x-solid:
The following tests FAILED: 89 - geocel/vg/Vecgeom:SimpleCmsVgdmlTest.* (Failed) gpu unit 97 - geocel/vg/Vecgeom:ReplicaTest.* (Failed) gpu unit 204 - celeritas/em/UrbanMsc (Failed) unit 205 - celeritas/em/WentzelVIMsc (Failed) unit 215 - celeritas/ext/RootJsonDumper (Failed) unit 224 - celeritas/field/FieldPropagator:-Layers*:SimpleCms*:Cmse* (Failed) unit 226 - celeritas/field/FieldPropagator:SimpleCms* (Failed) unit 227 - celeritas/field/FieldPropagator:Cmse* (Failed) unit 230 - celeritas/geo/Geometry:SimpleCms* (Failed) gpu unit 231 - celeritas/geo/Geometry:TestEm3* (Failed) gpu unit 232 - celeritas/geo/Geometry:ThreeSpheres* (Failed) gpu unit 251 - celeritas/global/StepperGeant:TestEm3Compton.* (Failed) gpu unit 252 - celeritas/global/StepperGeant:TestEm3NoMsc.* (Failed) gpu unit 253 - celeritas/global/StepperGeant:TestEm3Msc.* (Failed) gpu unit 254 - celeritas/global/StepperGeant:TestEm3MscNofluct.* (Failed) gpu unit 304 - celeritas/track/TrackSort (Failed) gpu unit 305 - celeritas/track/TrackInit (Failed) gpu unit 309 - celeritas/user/SlotDiagnostic (Failed) gpu unit 312 - celeritas/user/StepCollector:TestMultiEm3* (Failed) gpu unit 322 - app/celer-g4:gpu (Timeout) app gpu nomemcheck
Test summary
5 672 files 9 038 suites 17m 53s :stopwatch: 2 072 tests 2 046 :white_check_mark: 26 :zzz: 0 :x: 31 731 runs 31 584 :white_check_mark: 147 :zzz: 0 :x:
Results for commit 2bc50c9a.
:recycle: This comment has been updated with latest results.
@mrguilima Where will this be by the end of today? If necessary we can still leave the 2.0/surface tests disabled but we can't keep wasting time resolving conflicts every week. Please tell me your plans and schedule for the rest of this.
@mrguilima Thanks for continuing on this. You're right that there are a lot of combinations. What about we just try to simply fix the celeritas/field/FieldPropagator:Cmse* test for vecgeom 2 solid, which seems to not be exiting the boundary correctly:
warning: Moved internally from boundary but safety didn't increase: volume 18 from {10.31609436324, -6.5649478629707, 796.92284229006} to {10.316160010118, -6.5649741252439, 796.92291300626} (distance: 9.999999997693e-05)
warning: Moved internally from boundary but safety didn't increase: volume 19 from {99.994867290585, -1.0131710311187, 6585.456544248} to {99.74728429231, 4.2955842461032, 6590.7760977579} (distance: 7.519430129917)
warning: Moved internally from boundary but safety didn't increase: volume 19 from {99.74728429231, 4.2955842461032, 6590.7760977579} to {98.706755024331, 9.5072518029854, 6596.0956512728} (distance: 7.5194301299217)
warning: Moved internally from boundary but safety didn't increase: volume 19 from {98.706755024331, 9.5072518029854, 6596.0956512728} to {96.896790709203, 14.504071692219, 6601.4152047927} (distance: 7.5194301299247)
This seems to be the same behavior we're seeing on the develop branch for the slightly older 2.0. So maybe we should just leave these disabled and start a new development branch.
Codecov Report
:x: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 84.86%. Comparing base (9cddfc3) to head (2bc50c9).
:warning: Report is 3 commits behind head on develop.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/geocel/vg/VecgeomParams.cc | 0.00% | 3 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #1720 +/- ##
=========================================
Coverage 84.86% 84.86%
=========================================
Files 1243 1243
Lines 43508 43511 +3
Branches 16225 16379 +154
=========================================
+ Hits 36923 36927 +4
- Misses 4658 4833 +175
+ Partials 1927 1751 -176
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/geocel/vg/VecgeomTrackView.hh | 81.50% <ø> (ø) |
|
| src/geocel/vg/detail/BVHNavigator.hh | 84.87% <ø> (ø) |
|
| src/geocel/vg/VecgeomParams.cc | 65.44% <0.00%> (-0.66%) |
:arrow_down: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@pcanal do we need to add a "surface" vecgeom to codecov as well? 😅 now that it's clear we can build against both surface and solid in a single installation, I think we can include both in one build (but it'll take a bit of work).
@mrguilima I'm running before-and-after comparisons on GPU for select problems in the benchmark suite; if there are no substantial changes for the vg1 (and if the vg2 cases don't crash) I will merge. Thanks!
@mrguilima I'm running before-and-after comparisons on GPU for select problems in the benchmark suite; if there are no substantial changes for the vg1 (and if the vg2 cases don't crash) I will merge. Thanks!
I hope we're at a good point for merging.
Thanks for the help to get to this point.
Results coming in now. Good news:
- VecGeom 1.x performance is unaffected by the changes.
- Some previously hanging vecgeom2 problems now finish in a reasonable number of step iterations.
Bad news:
- Vecgeom solid 2.x is three to five times slower than 1.x.
- Vecgeom surface 2.x still has a stuck track for CMS 2018. It's also 50% slower than 1.x
I think a major refactor of our use of vecgeom is in order. We need to avoid relocating when doing boundary checks, and we need to store the previous volume when crossing in order to correctly grab surfaces. I'll do that in a future PR.