ATLAS integration

Open sethrj opened this issue 1 year ago • 13 comments

This is the primary tracking issue for integrating Celeritas with the simulation workflow for the ATLAS experiment. The primary Celeritas contact points are the "assignees" to the right.

Athena integration

  • [x] Demonstrate celeritas as part of FullSimLight (https://github.com/celeritas-project/GeoModel)
  • [x] Prototype celeritas external
  • [x] Prototype CPU integration
  • [x] Prototype GPU integration, blocked by:
    • [x] https://github.com/celeritas-project/celeritas/issues/1462
    • [x] https://github.com/celeritas-project/celeritas/issues/1494
  • [ ] Merge celeritas external https://gitlab.cern.ch/atlas/atlasexternals/-/merge_requests/1001
  • [ ] Merge CPU integration
  • [ ] Merge GPU integration

The ATLAS JIRA issue is accessible to ATLAS collaboration members.

Core capabilities

  • [ ] R,Z,θ magnetic field and reader: see MagField::AtlasFieldSvc::readMap inside atlas extensions
  • [x] Use nav history to reconstruct G4 steps for callback due to bajillion EMEC boundaries: https://github.com/celeritas-project/celeritas/issues/1248
  • [ ] Hit sequence for muon detectors: https://github.com/celeritas-project/celeritas/issues/1503
  • [ ] Reconstruction properties for SDs #1505
  • [ ] Region support #983

Extra capabilities

  • [ ] Woodcock tracking improved performance 17.5% overall in ATLAS due to highly segmented detectors
  • [ ] Scoring observables #1594
  • [ ] Region- and particle-dependent energy loss enable/disable (UR-105)

sethrj avatar Mar 08 '24 16:03 sethrj

To briefly follow up here, the internal ATLAS ticket on this isn't visible, but as of today there is a basic CPU-only build of atlasexternals here:

https://gitlab.cern.ch/bmorgan/atlasexternals/-/tree/integrate-celeritas?ref_type=heads

and the corresponding Athena integration (compile and CPU only) here:

https://gitlab.cern.ch/bmorgan/athena/-/commits/integrate-celeritas/?ref_type=heads

So there is still a way to go, but this is a starting point.

drbenmorgan avatar Jul 17 '24 12:07 drbenmorgan

Blocked by #1462 : vecgeom issues

sethrj avatar Nov 05 '24 22:11 sethrj

@sethrj Since ATLAS uses GCC 13.1/C++20, should we add CI jobs for this configuration?

esseivaju avatar Nov 12 '24 01:11 esseivaju

Good idea. Static build? Try linking against VecGeom with cuda? 😅

sethrj avatar Nov 12 '24 02:11 sethrj

A static build would be good to have. ATLAS is in the process of switching to GCC 14, so we could jump directly to that compiler. It would require updating the CI images to Ubuntu 24.04.

esseivaju avatar Nov 13 '24 02:11 esseivaju

I think the main blocker to static builds was disk space... 🤔

sethrj avatar Nov 13 '24 12:11 sethrj

@esseivaju @drbenmorgan Are there any stupid-simple ATLAS validation problems that use geantinos (or charged geantinos) to check the behavior of stepping and doing feedback? I don't think it would be a huge lift to add conversion and offloading of celeritons/celerinos 😉 and for testing purposes adding a callback that would directly interact with the stepping user action if that can be used to produce validation plots.

sethrj avatar Dec 01 '24 23:12 sethrj

Instructions for building and running AthSimulation+Celeritas GPU

Following a message on Slack, here are notes on my setup to build and run AthSimulation + Celeritas GPU. I have only tested this myself on Perlmutter, using a container to provide CUDA. I might have missed something, so let me know if something doesn't work.

Requirements

  • CVMFS
  • Celeritas develop (e.g., 6338b6c74562cb306454a3d81f846184fbe36541)
  • VecGeom 1.2.10
  • an alma9 system with CUDA (12.4+) and GPUs (e.g., lxplus). I'm using a container on Perlmutter.

If using the container above, source this after starting it:

#!/bin/bash
export PATH=/usr/local/cuda/bin:/opt/local/bin:$PATH
export LD_LIBRARY_PATH=/opt/local/lib:/opt/local/lib64:$LD_LIBRARY_PATH
export LD_RUN_PATH=/opt/local/lib:/opt/local/lib64:$LD_RUN_PATH
export LIBRARY_PATH=/opt/local/lib:/opt/local/lib64:$LIBRARY_PATH
export CPATH=/opt/local/include:$CPATH
export CMAKE_PREFIX_PATH=/opt/local:$CMAKE_PREFIX_PATH
export PKG_CONFIG_PATH=/opt/local/lib/pkgconfig:$PKG_CONFIG_PATH
eval "$(starship init bash)"
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
function setupATLAS {
  source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
}
setupATLAS
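
After sourcing the script, a quick sanity check can confirm that the pieces it sets up are visible. This is only a sketch: `check_tool` is a hypothetical helper name, not part of any ATLAS or Celeritas tooling, and the only paths it assumes are the ones set in the script above.

```shell
#!/bin/bash
# Hypothetical helper for this sketch: report whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# nvcc should come from /usr/local/cuda/bin, which the script above adds to PATH.
# "|| true" keeps the check from aborting a shell that runs with "set -e".
check_tool nvcc || true

# setupATLAS sources a file from CVMFS, so a missing directory usually
# means CVMFS is not mounted inside the container.
if [ -d "${ATLAS_LOCAL_ROOT_BASE:-/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase}" ]; then
  echo "CVMFS: ATLASLocalRootBase found"
else
  echo "CVMFS: ATLASLocalRootBase not found"
fi
```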

Building AtlasExternals

Since we haven't integrated all the changes into the repo yet, start from the branch atlassim-6635-cuda and update the Celeritas and VecGeom versions as specified above. This is my build script on Perlmutter (using the docker container linked above):

# to configure asetup, see https://gitlab.cern.ch/bmorgan/atlassim-6635#option-1-development-of-athenaathsimulation-only
asetup none,gcc13,cmakesetup
lsetup git

export PATH=/cvmfs/sft.cern.ch/lcg/contrib/ninja/1.11.1/Linux-x86_64/bin:$PATH
export CUDACXX=/usr/local/cuda/bin/nvcc
BASE_DIR=`pwd`
BUILD_DIR=$BASE_DIR/externals_build
# set to your cuda architecture
cmake -GNinja -DCMAKE_CUDA_ARCHITECTURES=80 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCTEST_USE_LAUNCHERS=TRUE -S $BASE_DIR/atlasexternals/Projects/AthSimulationExternals -B $BUILD_DIR
cmake --build $BUILD_DIR
DESTDIR=$BASE_DIR/install cmake --install $BUILD_DIR

Building AthSimulation

⚠️ If building outside lxplus follow these instructions to configure G4PATH before building AthSimulation

Use the branch atlassim-6635-cuda. On its own, this branch only supports Athena in single-thread mode. To run with AthenaMT, include this PR in your build; with it, both multi-thread and single-thread modes work. This is the script I use to build AthSimulation:

asetup none,gcc13,cmakesetup
lsetup git

BASE_DIR=`pwd`
source $BASE_DIR/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/setup.sh
export PATH=/cvmfs/sft.cern.ch/lcg/contrib/ninja/1.11.1/Linux-x86_64/bin:$PATH
export CUDACXX=/usr/local/cuda/bin/nvcc
export AthSimulationExternals_DIR=$BASE_DIR/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt
export CMAKE_PREFIX_PATH=$BASE_DIR/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt:$CMAKE_PREFIX_PATH
# set to your G4DATA path if not building on lxplus
export G4PATH=/pscratch/sd/e/esseivaj/celer-athena/g4data/releases

BUILD_DIR=$BASE_DIR/athsim_build
cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -S $BASE_DIR/athena/Projects/AthSimulation/ -B $BUILD_DIR
cmake --build $BUILD_DIR
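
The install-area path appears three times in the script above, so one option is to build it once from its parts. The sketch below assumes the same version (22.0.0) and platform tag (x86_64-el9-gcc13-opt) as above. `EXTERNALS_AREA`, `EXTERNALS_VERSION`, and `PLATFORM` are hypothetical variable names introduced here for illustration.

```shell
#!/bin/bash
# Version and platform tag exactly as they appear in the build script above.
PLATFORM=x86_64-el9-gcc13-opt
EXTERNALS_VERSION=22.0.0
BASE_DIR=$(pwd)

# Hypothetical variable name: the install area of the externals build.
EXTERNALS_AREA=$BASE_DIR/install/AthSimulationExternals/$EXTERNALS_VERSION/InstallArea/$PLATFORM

# Only touch the environment if the externals were installed where expected;
# otherwise print a message that says which path was tried.
if [ -f "$EXTERNALS_AREA/setup.sh" ]; then
  source "$EXTERNALS_AREA/setup.sh"
  export AthSimulationExternals_DIR=$EXTERNALS_AREA
  export CMAKE_PREFIX_PATH=$EXTERNALS_AREA:$CMAKE_PREFIX_PATH
else
  echo "externals setup.sh not found under $EXTERNALS_AREA" >&2
fi
```

If the externals version or platform changes, only the two variables at the top need updating.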

Running a transform

Once you have built the full stack, you can run AthSimulation+Celeritas. To set up the environment in a new shell:

# Again, set according to your environment
export G4PATH=/pscratch/sd/e/esseivaj/celer-athena/g4data/releases
asetup none,gcc13,cmakesetup
BASE_DIR=`pwd`
source $BASE_DIR/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/setup.sh
source $BASE_DIR/athsim_build/x86_64-el9-gcc13-opt/setup.sh

# This should only be needed outside lxplus, the default points to an $AFS location
export DATAPATH=/cvmfs/atlas.cern.ch/repo/sw/software/25.0/atlas/offline/ReleaseData/v20:$DATAPATH
export ATLASCALDATA=/cvmfs/atlas.cern.ch/repo/sw/software/25.0/atlas/offline/ReleaseData/v20

export PATH=/cvmfs/sft.cern.ch/lcg/contrib/ninja/1.11.1/Linux-x86_64/bin:$PATH
export CUDACXX=/usr/local/cuda/bin/nvcc
export CMAKE_PREFIX_PATH=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/LCG_106_ATLAS_13/XercesC/3.2.4/x86_64-el9-gcc13-opt:$CMAKE_PREFIX_PATH

This is an example transform running AthenaMT+Celeritas on GPU. For single-threaded mode, remove export ATHENA_PROC_NUMBER, export ATHENA_CORE_NUMBER, and --multithreaded True. If you didn't build AthSim with the change in the linked PR, remove the flags.Sim.GPU settings from the --preExec argument.

export ATHENA_PROC_NUMBER=16
export ATHENA_CORE_NUMBER=$ATHENA_PROC_NUMBER
INPUTFILE="/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CampaignInputs/mc23/EVNT/mc23_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.evgen.EVNT.e8514/EVNT.32288062._002040.pool.root.1"
AtlasG4_tf.py \
  --CA True \
  --multithreaded True \
  --perfmon none \
  --detectors 'Calo' \
  --conditionsTag 'OFLCOND-MC23-SDR-RUN3-01' \
  --postInclude 'PyJobTransforms.TransformUtils.UseFrontier' \
  --preInclude 'AtlasG4Tf:Campaigns.MC23SimulationSingleIoV,SimulationConfig.disablePhotonRussianRoulette,SimulationConfig.disableNeutronRussianRoulette,SimulationConfig.disableFrozenShowersFCalOnly' \
  --geometryVersion 'ATLAS-R3S-2021-03-02-00' \
  --inputEVNTFile "$INPUTFILE" \
  --outputHITSFile "mc23_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep_Celer_gpu_Calo.HITS.pool.root" \
  --maxEvents '1000' \
  --skipEvents '0' \
  --randomSeed '10' \
  --preExec "flags.Sim.OptionalUserActionList +=[\"G4UserActions.G4UserActionsConfig.GPUOffloadToolCfg\"];flags.Exec.FPE=-2;flags.GeoModel.EMECStandard=True;from SimulationConfig.SimEnums import CalibrationRun;flags.Sim.CalibrationRun=CalibrationRun.Off;flags.Sim.GPU.StackSize=8192;flags.Sim.GPU.HeapSize=512*1024*1024" \
  --postExec 'cfg.getService("StandardField").UseSoleCurrent=0.;cfg.getService("StandardField").UseToroCurrent=0.' \
  --imf False
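
Switching between the multi-threaded and single-threaded variants described above can be wrapped in a small helper. This is a sketch only: `configure_threads` and `MT_FLAGS` are hypothetical names, and treating a thread count of 0 as "single-threaded" is a convention chosen for this example.

```shell
#!/bin/bash
# Hypothetical helper: prepare the environment for N threads.
# N = 0 selects single-threaded mode (a convention of this sketch).
configure_threads() {
  if [ "$1" -gt 0 ]; then
    export ATHENA_PROC_NUMBER=$1
    export ATHENA_CORE_NUMBER=$ATHENA_PROC_NUMBER
    MT_FLAGS="--multithreaded True"
  else
    # Single-threaded: remove both variables and pass no extra flag.
    unset ATHENA_PROC_NUMBER ATHENA_CORE_NUMBER
    MT_FLAGS=""
  fi
}

configure_threads 16
# MT_FLAGS is left unquoted on purpose so it splits into two arguments (or none).
# echo prints the command instead of running it; to run for real, drop echo
# and append the remaining options from the transform above.
echo AtlasG4_tf.py --CA True $MT_FLAGS
```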

esseivaju avatar Dec 16 '24 23:12 esseivaju

From @drbenmorgan today:

Very quickly, there is a new CVMFS install of the current Athena main branch, plus Celeritas, AdePT and VecGeom at the current tips of their develop/main branches. This can be set up via

$  asetup AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2024-01-22T1700

This has both CPU/GPU Celeritas in, so it should be possible to run in CPU-only mode by exporting CELER_DISABLE_DEVICE=1 before running any Athena transform. It doesn't seem to want to run on lxplus-gpu in GPU mode though. I will try to diagnose that further, including on my local GPU machine, but it's there for people to try out if you want. As far as I know, all of the scripts etc. from the hackathon and those that people have been running should work.
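
As a sketch of the CPU-only mode mentioned above: instead of exporting CELER_DISABLE_DEVICE for the whole shell, the variable can be set for a single command so that later GPU runs in the same shell are unaffected. `run_cpu_only` is a hypothetical wrapper name introduced for this example.

```shell
#!/bin/bash
# Hypothetical wrapper: run one command with the device disabled,
# without changing the environment of the calling shell.
run_cpu_only() {
  CELER_DISABLE_DEVICE=1 "$@"
}

# Demonstration with a stand-in command that prints what it sees.
run_cpu_only sh -c 'echo "CELER_DISABLE_DEVICE=$CELER_DISABLE_DEVICE"'
# prints: CELER_DISABLE_DEVICE=1
```

For a real run, the stand-in command would be replaced by the transform, for example `run_cpu_only AtlasG4_tf.py` followed by the usual options.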

sethrj avatar Jan 22 '25 18:01 sethrj

Update from Seth+Julien hackathon

  • Julien successfully reproduced matching results between Athena with/without Celeritas with Tilecal enabled and a 50 GeV pion test beam. Yay!
  • Results with the LAr calorimeters (EMEC, barrel) do not match at all
  • To eliminate physics from the comparison I thought we could do a test beam of electrons and reduce the material density to a tiny fraction so that we're tracking almost through void and immediately sending the tracks to Celeritas; @tsulaiav helpfully gave us the code to set up the particle gun and reduce the material density

LAr detectors

Here's the full raytrace with pseudorapidity lines from 0.5 to 3: raytrace

And the sensitive regions via debug output: detectors

And the output @esseivaju produced clearly shows we're moving electrons to Celeritas and sending lots of hits back. So I think there's something not meshing in the LAr SD code.

Yep, the last one checks step->GetStepLength which we don't set!

sethrj avatar Jan 25 '25 17:01 sethrj

Note about EndOfRunAction not being called on worker threads from John Chapman via @drbenmorgan:

It is a feature of tbb that during an Athena job threads get created/destroyed seemingly at random during the event loop. This is why G4ThreadInitTool is necessary: https://gitlab.cern.ch/bmorgan/athena/-/blob/atlassim-6635-cuda/Simulation/G4Atlas/G4AtlasTools/src/G4ThreadInitTool.cxx

On the Athena side "end of the event loop" implies the finalize() method: https://gitlab.cern.ch/bmorgan/athena/-/blob/atlassim-6635-cuda/Simulation/G4Atlas/G4AtlasAlg/src/G4AtlasAlg.cxx#L288-315

Currently G4AtlasAlg is cloned once per thread, which means that G4AtlasAlg::finalize will also be called once per thread. It wasn't a specific design choice not to call EndOfRunAction for all threads, but you can see that we are very careful to call runMgr->RunTermination(); only once. I can't remember the reason for this, but I suspect that in the past calling it multiple times caused a crash. We can add something to call the EndOfRunActions per thread, but it might need a bit of unpicking to avoid whatever issue made us avoid calling RunTermination() multiple times. If not before, this will be something that we can address in Michael Duehrssen-Debling's rewrite of the Athena-Geant4 interface.

sethrj avatar Jan 29 '25 12:01 sethrj