acts icon indicating copy to clipboard operation
acts copied to clipboard

feat: Two-way CKF example

Open timadye opened this issue 10 months ago • 9 comments

Resurrect #2717 from @andiwand, updating to the latest main, and removing earlier test (MyTrackFindingAlgorithm).

blocked by

  • https://github.com/acts-project/acts/pull/3074
  • https://github.com/acts-project/acts/pull/3075
  • https://github.com/acts-project/acts/pull/3076
  • https://github.com/acts-project/acts/pull/3077
  • https://github.com/acts-project/acts/pull/3078
  • https://github.com/acts-project/acts/pull/3086

timadye avatar Mar 27 '24 18:03 timadye

Codecov Report

Attention: Patch coverage is 25.00000% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 49.02%. Comparing base (3203649) to head (3d3c37f). Report is 5 commits behind head on main.

:exclamation: Current head 3d3c37f differs from pull request most recent head d9f2302. Consider uploading reports for the commit d9f2302 to get more accurate results

Files Patch % Lines
...de/Acts/TrackFinding/CombinatorialKalmanFilter.hpp 20.00% 5 Missing and 3 partials :warning:
...e/include/Acts/TrackFitting/GainMatrixSmoother.hpp 50.00% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3066      +/-   ##
==========================================
- Coverage   49.09%   49.02%   -0.07%     
==========================================
  Files         497      494       -3     
  Lines       29155    29061      -94     
  Branches    13851    13798      -53     
==========================================
- Hits        14314    14248      -66     
- Misses       4908     4930      +22     
+ Partials     9933     9883      -50     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 27 '24 19:03 codecov[bot]

Currently this has the default twoWay = true (in TrackFindingAlgorithm.hpp and reconstruction.py). Should we make it false to keep the old behaviour, or make this the new default?

timadye avatar Mar 27 '24 19:03 timadye

Started breaking things off

  • https://github.com/acts-project/acts/pull/3073
  • https://github.com/acts-project/acts/pull/3074
  • https://github.com/acts-project/acts/pull/3075

andiwand avatar Apr 04 '24 09:04 andiwand

I think this now works as it should. The only thing which is a bit annoying is the second extrapolation which would not be necessary if the CKF would spit out parameters at the target surface.

But it might be good enough for now and we improve the Core CKF in the next iteration.

What do you think @timadye ?

andiwand avatar Apr 05 '24 16:04 andiwand

The changes look OK to me.

Unfortunately the results aren't so good running on ITk with ttbar PU200.

  1. It crashes after 81 events with error:
/home/ppd/adye/acts/build/src/Core/include/Acts/Utilities/VectorHelpers.hpp:142: std::array<double, 4> Acts::VectorHelpers::evaluateTrigonomics(const Acts::Vector3&): Assertion `sinTheta != 0 && "VectorHelpers: Vector is parallel to the z-axis " "which leads to division by zero"' failed.
  1. It has a lot more ERRORs. Here's a count of the most common:
 284888 TrackFinding   ERROR     Smoothing track 1 failed with error KalmanFitterError:3
 284888 TrackFinding   ERROR     First smoothing for seed 22 and track 1 failed with error KalmanFitterError:3
 140630 TrackFinding   ERROR     no intersection found
 140630 TrackFinding   ERROR     failed to find track state for extrapolation
  74918 TrackFinding   ERROR     Extrapolation for seed 14 and track 0 failed with error TrackExtrapolationError:2
  65712 TrackFinding   ERROR     Second extrapolation for seed 14 and track 6 failed with error TrackExtrapolationError:2
   8238 TrackFinding   ERROR     Propagation reached the step count limit of 1000 (did 1000 steps)
  1. If I set twoWay=false, it seems to hang on event 97. All other events finished, but an hour later one thread is still running with 100% CPU, and all the other threads are idle. The 8-thread process is using 64.6GB RAM.
  2. With twoWay=false, I still get quite a lot of ERRORs, but this is only about twice as many as we had before #2722, albeit different messages:
  94977 TrackFinding   ERROR     no intersection found
  94977 TrackFinding   ERROR     failed to find track state for extrapolation
  94977 TrackFinding   ERROR     Extrapolation for seed 2 and track 0 failed with error TrackExtrapolationError:2

Since it misbehaves even with twoWay=false, I think there is still a problem with other core changes, notably #2722. So maybe not specific to this PR, but needs fixing before we can see the performance implications, at least for the ITk.

timadye avatar Apr 05 '24 20:04 timadye

Thank you for the detailed investigation @timadye !

I will look into this next week.

andiwand avatar Apr 06 '24 07:04 andiwand

Just checked the latest version with the ITk, including restore first track patch. The number of ERROR messages are now similar between runs with twoWay=true and false. I still get a crash and hang as reported above.

I think that means we still have a problem running with the ITk, but it doesn't seem to be specifically associated with this PR.

timadye avatar Apr 08 '24 17:04 timadye

This seems to produce new FPEs - I will take a look

andiwand avatar Apr 09 '24 13:04 andiwand

FPE should be hopefully fixed now

andiwand avatar Apr 09 '24 16:04 andiwand

Made a quick performance comparison between https://github.com/acts-project/acts/pull/3066/commits/de9a0c915b3a7e994ac1d948593af05b23324e89 and current (main and changed respectively in the plot)

ttbar PU 200 fatras default seeding

image

Looks like the additional track copies cost about 3%

@timadye

andiwand avatar Apr 10 '24 22:04 andiwand

:red_circle: Athena integration test results [d1f4939d6a940614184ebd590986cdf6aa73be20]

:red_circle: Some tests have failed!

Please investigate the pipeline!

status job report
:green_circle: run_unit_tests
:green_circle: test_ActsEFTrackFit
:green_circle: test_ActsPersistifySeeds
:green_circle: test_ActsBenchmarkWithSpot
:green_circle: test_ActsAnalogueClustering
:red_circle: test_ActsConversionWorkflow
:green_circle: test_ActsWorkflowFastTracking
:green_circle: test_ActsWorkflow
:green_circle: test_ActsValidateAmbiguityResolution
:green_circle: test_ActsValidateResolvedTracks
:green_circle: test_ActsValidateTracks
:green_circle: test_ActsValidateActsCoreSpacePoints
:green_circle: test_ActsValidateActsSpacePoints
:green_circle: test_ActsValidateSeeds
:green_circle: test_ActsValidateOrthogonalSeeds
:green_circle: test_ActsValidateClusters
:green_circle: test_ActsPersistifyEDM
:green_circle: test_ActsGSFRefitting
:green_circle: test_ActsKfRefitting
:green_circle: test_ActsExtrapolationAlgTest
:green_circle: test_ActsITkTest
:green_circle: run_workflow_tests_run4_mc
:green_circle: run_workflow_tests_run2_mc
:green_circle: run_workflow_tests_run2_data
:red_circle: run_workflow_tests_run3_mc
:green_circle: run_workflow_tests_run3_data
:green_circle: run_art_test: test_data18_13TeV_1000evt
:green_circle: run_art_test: test_ttbarPU40_reco

acts-project-service avatar Apr 11 '24 13:04 acts-project-service

I ran the ITk ttbar+PU200 tests again (this is standalone ACTS). This still crashed (twoWay=true) or went into a loop (twoWay=false) within 100 events. As suggested before, this is probably a result of #2722 and only gives different symptoms with this PR.

timadye avatar Apr 11 '24 15:04 timadye

I will take a look @timadye

andiwand avatar Apr 11 '24 19:04 andiwand