Valencia jets 🍊
It seemed cleanest to just open a new PR. The original description is below, I've also included the initial feedback from @Moelf.
This PR aims to implement the Valencia (VLC) jet clustering algorithm as another option to study in the context of lepton colliders. Many thanks to my summer student @EthanLynn916 for the initial julia implementation that this PR is based on!
The implementation follows the description in arXiv:1404.4294, and so depends on several parameters:
- `R`: the jet radius parameter.
- `β`: corresponds to the existing power `p` of the algorithm.
- `γ`: the angular exponent used in the Valencia beam distance.
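For orientation, here is a minimal sketch of the two Valencia distance measures these parameters enter. This is my own illustrative Julia, not the PR's actual functions, and the overall normalisation follows my reading of arXiv:1404.4294, so treat it as an assumption:

```julia
# Illustrative sketch of the Valencia distances (not the PR's code; the
# normalisation is an assumption based on arXiv:1404.4294):
#   d_ij = min(E_i^2β, E_j^2β) (1 - cos θ_ij) / R²
#   d_iB = E_i^2β sin^2γ(θ_iB)
valencia_dij(Ei, Ej, cosθij; β = 1.0, R = 1.0) =
    min(Ei^(2β), Ej^(2β)) * (1 - cosθij) / R^2

valencia_diB(Ei, sinθiB; β = 1.0, γ = 1.0) = Ei^(2β) * sinθiB^(2γ)
```

Increasing `γ` makes the beam distance fall off more steeply away from the beam axis, which is what gives the extra handle against forward backgrounds.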
The extra flexibility given by the VLC algorithm provides some handles that allow for additional suppression of backgrounds due to ISR / BIB / etc. that present themselves in high-energy lepton collisions. It was originally developed for use at CLIC, and is of interest to study in the context of a future muon collider.
This PR touches EEAlgorithm to add the necessary distance functions and parameters. I think this new code is factorized and does not change the performance of the existing algorithms, but am continuing to validate the work.
In particular, some of the new tests are not yet passing due to larger differences in jet rapidity than I'd expected. I'm hopeful that https://github.com/JuliaHEP/JetReconstruction.jl/pull/198 might help bring things slightly closer to the C++ output, so am waiting to test with that merged ... in the meantime, I'm happy to get feedback on this draft PR so that it can be improved.
🍻 MLB
Codecov Report
:x: Patch coverage is 97.72727% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 81.12%. Comparing base (ae0ac95) to head (c4ec5cc).
:warning: Report is 8 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/EEAlgorithm.jl | 97.65% | 3 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff            @@
##             main     #202     +/-  ##
========================================
+ Coverage   80.22%   81.12%   +0.89%
========================================
  Files          21       21
  Lines        1315     1388      +73
========================================
+ Hits         1055     1126      +71
- Misses        260      262       +2
```
Cool! I think the beam distance was not quite correct before. After expressing it differently, I now get much better closure in the test vs. FastJet.
The tests do not pass at the same level of precision as the other tests, but the results are still quite close. Please let me know if there's anything obvious that I can do to improve on this:
```julia
rap_test = isapprox(jet.rap, rap_ref; atol = 1e-4)
phi_test = isapprox(norm_phi, normalised_phi_ref; atol = 1e-4)
pt_test = isapprox(jet.pt, pt_ref; rtol = 1e-5)
```
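For reference on the tolerance choice: `isapprox`'s two keywords behave differently near zero, which is why an absolute tolerance is the natural choice for quantities that can legitimately sit close to zero (such as rapidity) and a relative one for `pt`. A quick standalone illustration:

```julia
# atol bounds the absolute difference; rtol scales with the magnitude of the
# arguments, so it becomes uselessly tight when comparing values near zero.
@assert isapprox(1.0e-6, 0.0; atol = 1e-4)       # near zero: atol is meaningful
@assert !isapprox(1.0e-6, 0.0; rtol = 1e-5)      # rtol * max(|x|, |y|) is tiny
@assert isapprox(1000.0, 1000.005; rtol = 1e-5)  # large values: rtol scales
@assert !isapprox(1000.0, 1000.005; atol = 1e-4) # a fixed atol does not
```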
I am not super happy with the introduction of `ComparisonTestValencia` to `_common.jl`, but it seemed preferable to changing the footprint of `ComparisonTest` to take the extra arguments ...
In any case I'd say this is now ready for people to take a look!
Hello @mattleblanc
Great stuff - I am very happy to see this more background resistant e+e- algorithm here.
A quick test does show a significant regression in the speed of the other e+e- algorithms. From main:
```
➜ examples git:(main) ✗ julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 56.561605937499984 ± 15.908230984544156 μs
Lowest time per event 38.716570000000004 μs

➜ examples git:(main) ✗ julia --project instrumented-jetreco.jl --algorithm=EEKt -p 1 -R 1 ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 53.38911937499999 ± 17.78962680436358 μs
Lowest time per event 41.09395000000001 μs
```
to this PR:
```
➜ examples git:(c0b228b) ✗ julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 76.76526390624997 ± 13.610044399883003 μs
Lowest time per event 61.225550000000005 μs

➜ examples git:(c0b228b) ✗ julia --project instrumented-jetreco.jl --algorithm=EEKt -p 1 -R 1 ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 76.19724343749999 ± 13.723145317291415 μs
Lowest time per event 65.72833 μs
```
That is about ×1.4 slower on average, and about ×1.6 on the lowest times (this is on an old i7 MacBook Pro).
It's also a bit concerning that the results only agree at the $10^{-4}$ level, which is ×100 worse than for the other algorithms. We should try and see why that is - do we have some sensitive, non-optimal cancellations in the maths? There would be a good case here for trying Anton's PrecisionCarriers.jl package (https://indico.cern.ch/event/1488852/timetable/#88-precisioncarriersjl-easy-de).
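As a concrete standalone illustration (not taken from this PR) of the kind of cancellation that could be in play: for small opening angles, the textbook form `1 - cos(θ)` loses essentially all significant digits in Float64, while the mathematically identical half-angle form `2 * sin(θ/2)^2` stays accurate.

```julia
# For θ = 1e-8, the true value of 1 - cos(θ) is ≈ θ²/2 = 5.0e-17, well below
# the spacing of Float64 values around 1.0 (≈ 1.1e-16), so the subtraction
# destroys the result.
θ = 1.0e-8
naive  = 1 - cos(θ)        # loses the result to cancellation
stable = 2 * sin(θ / 2)^2  # half-angle form, accurate to full precision
```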
@graeme-a-stewart Thanks for pointing out that regression -- I've been running the benchmarks over the weekend, and on my machine (M3 MacBook Air) the Durham and EEKt performance now roughly matches the state before this PR. I see the following:
Main:
```
m3leblanc:examples mleblanc$ julia --project=/Users/mleblanc/dev/baseline/JetReconstruction.jl/examples/ --startup-file=no /Users/mleblanc/dev/baseline/JetReconstruction.jl/examples/instrumented-jetreco.jl --algorithm=Durham /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 640
Precompiling JetReconstruction...
  1 dependency successfully precompiled in 2 seconds. 66 already precompiled.
Processed 100 events 640 times
Average time per event 22.380134656249982 ± 1.3076253982945982 μs
Lowest time per event 20.94416 μs
```
This PR:
```
m3leblanc:JetReconstruction.jl mleblanc$ julia --project=/Users/mleblanc/dev/durham/JetReconstruction.jl/examples/ --startup-file=no /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl --algorithm=Durham /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 640 -p=1.0
Precompiling JetReconstruction...
  1 dependency successfully precompiled in 2 seconds. 66 already precompiled.
Processed 100 events 640 times
Average time per event 23.147814531249995 ± 2.522429492157901 μs
Lowest time per event 20.16833 μs
```
For completeness, the current VLC benchmark is consistently a bit slower, but not too bad:
```
m3leblanc:JetReconstruction.jl mleblanc$ julia --project=/Users/mleblanc/dev/durham/JetReconstruction.jl/examples/ --startup-file=no /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl --algorithm=Valencia /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 640 -p=1.0
Processed 100 events 640 times
Average time per event 27.71035881249998 ± 2.689635681328953 μs
Lowest time per event 24.585 μs
```
I don't love the duplication of code that these changes introduce when selecting functions based on `::Val{...}` dispatch over the `JetAlgorithm`, so I'm happy to try another approach if you'd prefer that and have a suggestion.
Hi all -- thanks for the feedback. I had tried out the `Val`-based approach after one of the initial points from @Moelf, but I agree that it was not the right path to follow. I have now implemented everything related to VLC in EEAlgorithm.jl, in a way that more closely follows the original implementation.
I have also been running the benchmarks as I have gone: I do not believe that this implementation should be significantly slower for Durham & EEKt than the existing version. The VLC implementation is a bit slower than the benchmark I posted above, but at this point I would prefer to move forward with this PR and optimise things further in the future.
```
# Valencia
Processed 100 events 640 times
Average time per event 46.44745081249997 ± 2.30957344599381 μs
Lowest time per event 44.26292 μs

# Durham
Processed 100 events 640 times
Average time per event 23.304417093749965 ± 1.6160987724851137 μs
Lowest time per event 21.285 μs
```
Performance is definitely a lot better now, but we're still losing about 10% in the Durham algorithm.
There are a lot of `if algorithm == ...` decision points now, which are all quite deep in loops. Even removing unneeded checks, I can't recover the last 5-7% of performance. The function signatures also got a little more complex.
It may be that we need a refactorisation that separates Valencia at a higher level, so that in all of the tighter loops the flow path has less branching.
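One standard Julia pattern for that kind of separation (a sketch under my own assumptions; the type and function names here are hypothetical, not the PR's or JetReconstruction.jl's API) is to encode the algorithm choice in a small distance type and dispatch on it, so the branch is resolved at compile time and the inner loop sees a single statically known method:

```julia
# Hypothetical sketch: hoist the algorithm branch out of the hot loop by
# dispatching on a distance-measure type instead of checking `algorithm ==`
# inside the loop. Names and normalisations are illustrative only.
struct DurhamDist end
struct ValenciaDist
    β::Float64
    R::Float64
end

# One method per algorithm; Julia specialises callers for each concrete type.
dij(::DurhamDist, Ei, Ej, cosθ) = 2 * min(Ei^2, Ej^2) * (1 - cosθ)
dij(d::ValenciaDist, Ei, Ej, cosθ) =
    min(Ei^(2 * d.β), Ej^(2 * d.β)) * (1 - cosθ) / d.R^2

function min_dij(dist, E, cosθ)
    # Tight loop with no runtime algorithm check: the `dij` call is
    # statically resolved because `dist` has a concrete type.
    best = Inf
    for i in eachindex(E), j in (i + 1):lastindex(E)
        best = min(best, dij(dist, E[i], E[j], cosθ[i, j]))
    end
    best
end
```

This keeps one shared loop implementation while avoiding per-iteration branching; whether it actually recovers the last few percent here would of course need benchmarking.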
Hi @graeme-a-stewart, thanks for the quick feedback. I've made the simple changes, but I am puzzled by your comments about performance. As above, I see performance consistent with an unmodified version of the code. Profiling an unmodified version at commit eb63e6c17dbe905c275a82dabd779174a7267cd9, I see this:
```
m3leblanc:JetReconstruction.jl mleblanc$ julia --project=examples/ examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 22.186680800000023 ± 1.5031461324091082 μs
Lowest time per event 20.755 μs
```
Happy to make more changes, but I just want to check whether this is consistent with the way you're doing things before changing anything major.
Are we benchmarking on the same architecture? I fear this is one of those x86_64 vs. macOS M-series problems.
FWIW, on an AMD Zen4:
```
# this PR
(vlc)> julia --project=examples/ ./examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 29.74854201999998 ± 2.9077949972547454 μs
Lowest time per event 27.48308 μs

# main
(main)> julia --project=examples/ ./examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 29.813233310000008 ± 4.091690424744847 μs
Lowest time per event 26.534920000000003 μs
```
Thanks @mattleblanc
What do you get comparing the head of your branch with the head of main? I am seeing ~8-10% regression, but it's an old x86 MacBook Pro I'm using.
Tomorrow I'll test on other platforms - maybe this is just an oddity...
I'm running on an M3 MacBook Air. I see the following after rebasing my reference setup onto the latest commits on main (up to d3c97f392354e860f6f3e9c6d61b949f087401d6): it's not obvious that the performance has changed.
```
m3leblanc:JetReconstruction.jl mleblanc$ julia --project=examples/ examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Precompiling JetReconstruction...
  1 dependency successfully precompiled in 2 seconds. 66 already precompiled.
Processed 100 events 1000 times
Average time per event 23.20044566999999 ± 1.9812033289848472 μs
Lowest time per event 20.8875 μs
```
So here is what I get, comparing your branch, Matt (7155768), with current main (d3c97f39). This is running

```
julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 1000
```

in each case and looking at the lowest time achieved, in μs (though the average shows the same pattern).
| Machine | vlc | main | Ratio |
|---|---|---|---|
| MacBook Pro x86_64 2.2GHz i7 | 40.4 | 38.3 | 0.95 |
| Alma 9 Ryzen 7 5700G 3.8GHz | 35.6 | 34.1 | 0.96 |
So I do maintain there is a 4-5% regression in the speed of the Durham algorithm with the current Valencia implementation.
Likewise for EEKt with `R = 1.0`, `p = 1`:
| Machine | vlc | main | Ratio |
|---|---|---|---|
| MacBook Pro x86_64 2.2GHz i7 | 44.3 | 40.6 | 0.92 |
| Alma 9 Ryzen 7 5700G 3.8GHz | 38.8 | 36.8 | 0.95 |
The slowdown on the old MacBook is a bit worse; however, the fairly modern Linux machine is a pretty typical production machine and it's suffering too.
OK, thanks for confirming these numbers. What's your preferred solution? I can try to factorise the VLC code away without duplicating too much code under the hood.
I'll benchmark future changes on an alma9 machine so that I'm more likely to catch any differences in performance.
I have some ideas I'd like to try tomorrow to try and better understand the problem and mitigations. Hopefully we can avoid a total separation of the algorithm logic!
Down at these edges of squeezing out all the performance we can we do see different behaviour between Apple Silicon and Intel chips as @Moelf noted. See also, e.g., #151. So having a spread of architectures is certainly useful.
Just getting back to this, I took some benchmarks again with Julia 1.11.7 on an AMD Ryzen 7 5700G and on a MacBook Pro M4 (lowest times, in μs):
| Machine | Main branch | Valencia branch | Ratio |
|---|---|---|---|
| Ryzen 5700G | 34.1 | 37.1 | 0.92 |
| M4 | 17.1 | 17.1 | 1.00 |
So the M4 doesn't show a regression (and is also blazingly fast 🚀 ).
I'd still like to understand/recover the regression on Linux x86.
Closed in favour of #217