Migrate LST outputs to Portable SoAs
This PR refactors the LST outputs so that Portable SoAs are used as much as possible. The output of the LST Producer is now a device collection, and the framework takes care of copying it to the host.
This continues the work from #47793, now on the outputs side, and completes the tasks related to CMSSW-LST interfacing in #46746.
c.c. @slava77
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45307
- There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Common/src/classes_def.xml modified in PR(s): #47629
- File RecoTracker/LSTCore/src/alpaka/LST.cc modified in PR(s): #48377
- File RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc modified in PR(s): #48377
A new Pull Request was created by @ariostas for master.
It involves the following packages:
- DataFormats/Common (core)
- RecoTracker/LST (reconstruction)
- RecoTracker/LSTCore (reconstruction)
@Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please review it and eventually sign? Thanks. @GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere, @wddgit this is something you requested to watch as well. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here
test parameters:
- enable_tests = gpu
- workflows_gpu = 29634.704,29834.704
- workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
- relvals_opt = -w upgrade,standard
- relvals_opt_gpu = -w upgrade,standard
@cmsbuild please test
-1
Failed Tests: Build HeaderConsistency
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46917/summary.html
COMMIT: 21a525ea2735f4c3601b512bd23de40d6885e1d8
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46917/install.sh to create a dev area with all the needed externals and cmssw changes.
Build
I found compilation error when building:
Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a': No such file or directory
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a] Error 1
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a] Error 1
@@@@ Checking for missing symbols was SKIPPED due to NO_LIB_CHECKING flag in BuildFile: libUtilitiesStaticAnalyzers.so
Unknow target lib/el8_amd64_gcc12/RecoTrackerLSTCore_xr.rootmap
Unknow target lib/el8_amd64_gcc12/RecoTrackerLST_xr.rootmap
Sorry about that, that header was deleted by mistake.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45311
- There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Common/src/classes_def.xml modified in PR(s): #47629
- File RecoTracker/LSTCore/src/alpaka/LST.cc modified in PR(s): #48377
- File RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc modified in PR(s): #48377
Pull request #48409 was updated. @Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please check and sign again.
@cmsbuild please test
-1
Failed Tests: Build
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46920/summary.html
COMMIT: b8eabc3be53edf6136f07c2d71eeb02b40630b9c
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46920/install.sh to create a dev area with all the needed externals and cmssw changes.
Build
I found compilation error when building:
>> Compiling src/RecoTracker/LST/src/ES_ModulesDev.cc /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DBOOST_MPL_IGNORE_PARENTHESES_WARNING -DCMSSW_GIT_HASH='CMSSW_15_1_X_2025-06-25-1100' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_1_X_2025-06-25-1100' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_X_2025-06-25-1100/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/alpaka/1.2.0-23a2bf2e896b7aace8e772f289604b47/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/pcre/8.43-2d141998cfe5424b8f7aff48035cc2da/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/boost/1.80.0-189b192d618e9605b04b60048d1376aa/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/clhep/2.4.7.1-d3a3e353d370e701238f7949a0d7909f/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/gsl/2.6-f7574c606b0ce57ff601d3ca9534cd01/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include/eigen3 -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/fmt/10.2.1-e35fd1db5eb3abc8ac0452e8ee427196/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/OpenBLAS/0.3.27-70a9dd2c9f309171934f13e3003b0540/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tinyxml2/6.2.0-a0ad3950415fa3138d99b7da42eb4c9f/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing 
-Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -DEIGEN_DONT_PARALLELIZE -DEIGEN_MAX_ALIGN_BYTES=64 -Wno-error=unused-variable -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/ES_ModulesDev.cc.d src/RecoTracker/LST/src/ES_ModulesDev.cc -o tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/ES_ModulesDev.cc.o >> Building LCG reflex dict from header file src/RecoTracker/LST/src/classes.h /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/bin/rootcling -reflex -f tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.cc -inlineInputHeader -failOnWarnings -rmf tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.rootmap -rml libRecoTrackerLST.so -m RecoTrackerLSTCore_xr_rdict.pcm -m DataFormatsCommon_xr_rdict.pcm -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DCMS_DICT_IMPL -D_REENTRANT -DGNUSOURCE -D__STRICT_ANSI__ -DCMS_MICRO_ARCH="x86-64-v3" -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DBOOST_MPL_IGNORE_PARENTHESES_WARNING -DCMSSW_GIT_HASH="CMSSW_15_1_X_2025-06-25-1100" -DPROJECT_NAME="CMSSW" -DPROJECT_VERSION="CMSSW_15_1_X_2025-06-25-1100" -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_X_2025-06-25-1100/src 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/alpaka/1.2.0-23a2bf2e896b7aace8e772f289604b47/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/pcre/8.43-2d141998cfe5424b8f7aff48035cc2da/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/boost/1.80.0-189b192d618e9605b04b60048d1376aa/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/clhep/2.4.7.1-d3a3e353d370e701238f7949a0d7909f/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/gsl/2.6-f7574c606b0ce57ff601d3ca9534cd01/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include/eigen3 -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/fmt/10.2.1-e35fd1db5eb3abc8ac0452e8ee427196/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/OpenBLAS/0.3.27-70a9dd2c9f309171934f13e3003b0540/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tinyxml2/6.2.0-a0ad3950415fa3138d99b7da42eb4c9f/include -DCMSSW_REFLEX_DICT src/RecoTracker/LST/src/classes.h src/RecoTracker/LST/src/classes_def.xml
In file included from input_line_7:73:
poison/RecoTracker/LST/interface/LSTOutput.h:1:2: error: THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.
#error THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.
 ^
Error: /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/bin/rootcling: compilation failure (tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr1fb1511b33_dictUmbrella.h)
gmake: *** [tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.cc] Error 1
>> Compiling LCG dictionary: tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.cc
I'm curious why the file removal is not visible during local compilation. Is there some extra flag in scram to not use the removed file from the release? We should've noticed this in the LST CI.
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45312
- There are other open Pull requests which might conflict with changes you have proposed:
- File DataFormats/Common/src/classes_def.xml modified in PR(s): #47629
- File RecoTracker/LSTCore/src/alpaka/LST.cc modified in PR(s): #48377
- File RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc modified in PR(s): #48377
Pull request #48409 was updated. @Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please check and sign again.
Yeah, we definitely should have noticed this in our CI. I'll look to see if there are extra flags to make it closer to the "real" CI.
@cmsbuild please test
@smuzaffar @iarspider please clarify if there is something special added in the bot tests to poison the removed files
@slava77 @ariostas, the bot just runs git cms-checkdeps -a -A after checking out the changes; this poisons the deleted files. It is good practice to run git cms-checkdeps -a -A to check out all the packages that might need rebuilding due to local changes; this will also create the poison files.
Thank you, @smuzaffar! I didn't know that it also poisons deleted files.
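For reference, a rough sketch of what the poisoning amounts to, based only on the build log above: the failing command lists -Isrc -Ipoison ahead of the release src directory, and poison/RecoTracker/LST/interface/LSTOutput.h consists of a single #error line, so the stub shadows the release copy of the deleted header. The helper name write_poison_stubs and the demo layout are assumptions made for illustration; this is not the actual implementation of git cms-checkdeps.

```python
import tempfile
from pathlib import Path

# Text taken from the compiler error in the CI log.
POISON_TEXT = "#error THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.\n"


def write_poison_stubs(poison_root, deleted_headers):
    """Write one #error stub per deleted header under poison_root.

    Hypothetical helper: it only mimics the effect visible in the log.
    """
    created = []
    for relative_path in deleted_headers:
        stub = Path(poison_root) / relative_path
        stub.parent.mkdir(parents=True, exist_ok=True)
        stub.write_text(POISON_TEXT)
        created.append(stub)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stubs = write_poison_stubs(
            Path(tmp) / "poison",
            ["RecoTracker/LST/interface/LSTOutput.h"],
        )
        # Any translation unit that still includes the header now fails to compile.
        print(stubs[0].read_text(), end="")
```

Without such a stub, a local build would presumably keep resolving the include from the release area, which would explain why the removal went unnoticed locally.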
-1
Failed Tests: RelVals RelVals-ROCM
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46921/summary.html
COMMIT: 31df503ac378ccb4c147f6450850e5efcca990ab
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/install.sh to create a dev area with all the needed externals and cmssw changes.
RelVals
----- Begin Fatal Exception 25-Jun-2025 21:12:41 CEST-----------------------
An exception of category 'PluginLibraryLoadError' occurred while
[0] Constructing the EventProcessor
[1] While attempting to load plugin LSTOutputConverter
Exception Message:
unable to load /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so because /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so: cannot open shared object file: No such file or directory
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 25-Jun-2025 21:12:41 CEST-----------------------
An exception of category 'PluginLibraryLoadError' occurred while
[0] Constructing the EventProcessor
[1] While attempting to load plugin LSTOutputConverter
Exception Message:
unable to load /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so because /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so: cannot open shared object file: No such file or directory
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 25-Jun-2025 21:12:41 CEST-----------------------
An exception of category 'PluginLibraryLoadError' occurred while
[0] Constructing the EventProcessor
[1] While attempting to load plugin LSTOutputConverter
Exception Message:
unable to load /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so because /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so: cannot open shared object file: No such file or directory
----- End Fatal Exception -------------------------------------------------
RelVals-ROCM
- 12834.406
12834.406_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka/step3_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka.log
CUDA Comparison Summary
Summary:
- You potentially added 1 lines to the logs
- Reco comparison results: 184 differences found in the comparisons
- DQMHistoTests: Total files compared: 9
- DQMHistoTests: Total histograms compared: 117626
- DQMHistoTests: Total failures: 1194
- DQMHistoTests: Total nulls: 0
- DQMHistoTests: Total successes: 116432
- DQMHistoTests: Total skipped: 0
- DQMHistoTests: Total Missing objects: 0
- DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
- Checked 32 log files, 36 edm output root files, 9 DQM output files
- TriggerResults: no differences found
The ROCm failure looks unrelated.
For the other failures, could they be false positives? The LSTOutputConverter plugin was moved from plugins to plugins/alpaka, but kept the same name. So could it be finding the poisoned one first? Otherwise, I'm not sure what would need to be updated.
assign heterogeneous
New categories assigned: heterogeneous
@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
(just to set expectations: I am about to leave until mid-July, and I already have a backlog of PRs to review, so I can only get to it probably by the end of July)
Thank you for clarifying your plans.
@makortel is also in the heterogeneous category; is it possible to have Matti review?
@slava77 Matti is on vacation and will not be back till the 2nd week of July.
OK, summer time.
In the meantime (while we wait for the main reviewer(s)) it would be nice to get advice/clarification on the plugin issue https://github.com/cms-sw/cmssw/pull/48409#issuecomment-3006066567
For the other failures, could they be false positives? The LSTOutputConverter plugin was moved from plugins to plugins/alpaka, but kept the same name. So could it be finding the poisoned one first? Otherwise, I'm not sure what would need to be updated.
The plugin system uses the LD_LIBRARY_PATH environment variable to decide the order in which to look for plugins (this follows how the OS looks for shared libraries). The local work environment's lib directory should come before its poison directory in LD_LIBRARY_PATH, so the new plugin should be found first.
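To make the ordering argument concrete, here is a minimal sketch of a first-match lookup over a path list. It is a simplified illustration, not the framework's plugin manager; the function name find_first and the file names in the demo are made up.

```python
import os
import tempfile
from pathlib import Path


def find_first(library_name, search_path):
    """Return the first file named library_name along search_path, or None.

    search_path is a string of directories joined by os.pathsep, in the
    same style as LD_LIBRARY_PATH. Earlier entries win.
    """
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(directory) / library_name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp) / "lib"
        poisoned = Path(tmp) / "poisoned"
        for directory in (lib, poisoned):
            directory.mkdir()
            (directory / "pluginExample.so").write_text("")
        path = os.pathsep.join([str(lib), str(poisoned)])
        # The copy under lib/ is found first because lib/ is listed first.
        print(find_first("pluginExample.so", path))
```

Note that this only helps when the same file name exists in both places; a library that was renamed would never shadow the poisoned entry.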
Are you certain the new plugin is actually being built?
@smuzaffar any thoughts?
So I see that the following libraries are built
- libRecoTrackerLSTPluginsPortableCudaAsync.so
- libRecoTrackerLSTPluginsPortableROCmAsync.so
- libRecoTrackerLSTPluginsPortableSerialSync.so
None of these is an exact match for plugin-poisoned-RecoTrackerLSTPlugins.so. Previously, I believe, the LSTOutputConverter was just in the plugins directory, which would match the poisoned name.
In the conversion to an alpaka module, the module's type name would have been changed as well. It would no longer just be LSTOutputConverter. So in the configuration for these jobs, how is the type of the module specified?
So you need to change https://github.com/cms-sw/cmssw/blob/3c9125e2eab1a559037a9bc56b190ad25a194c4d/HLTrigger/Configuration/python/HLT_75e33/modules/hltInitialStepTrackCandidates_cfi.py#L27
to
_hltInitialStepTrackCandidatesLST = cms.EDProducer('LSTOutputConverter@alpaka',
and do that for all configurations.
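Since the same change is needed in several configurations, a small text rewrite may help. The sketch below assumes the existing lines look like cms.EDProducer('LSTOutputConverter', (single or double quotes); the function name use_alpaka_type is hypothetical, and the result should be checked by hand before committing.

```python
import re

# Matches cms.EDProducer('LSTOutputConverter' or the double-quoted form.
# The closing quote must follow the name directly, so a type that already
# carries the @alpaka suffix is left untouched.
_PATTERN = re.compile(r"""(cms\.EDProducer\(\s*)(['"])LSTOutputConverter\2""")


def use_alpaka_type(config_text):
    """Return config_text with the module type switched to the @alpaka form."""
    return _PATTERN.sub(r"\1\2LSTOutputConverter@alpaka\2", config_text)


if __name__ == "__main__":
    before = "_hltInitialStepTrackCandidatesLST = cms.EDProducer('LSTOutputConverter',"
    print(use_alpaka_type(before))
```

Applying the function twice gives the same text as applying it once, so it is safe to rerun over files that were already updated.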