Migrate LST outputs to Portable SoAs

Open ariostas opened this issue 6 months ago • 37 comments

This PR refactors the LST outputs so that Portable SoAs are used as much as possible. The output of the LST Producer is now a device collection, and the framework takes care of copying it to the host.

This continues the work from #47793, now on the outputs side, and completes the tasks related to CMSSW-LST interfacing in #46746.

c.c. @slava77

ariostas avatar Jun 25 '25 15:06 ariostas

cms-bot internal usage

cmsbuild avatar Jun 25 '25 15:06 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45307

  • There are other open Pull requests which might conflict with changes you have proposed:
    • File DataFormats/Common/src/classes_def.xml modified in PR(s): #47629
    • File RecoTracker/LSTCore/src/alpaka/LST.cc modified in PR(s): #48377
    • File RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc modified in PR(s): #48377

cmsbuild avatar Jun 25 '25 15:06 cmsbuild

A new Pull Request was created by @ariostas for master.

It involves the following packages:

  • DataFormats/Common (core)
  • RecoTracker/LST (reconstruction)
  • RecoTracker/LSTCore (reconstruction)

@Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please review it and eventually sign? Thanks. @GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere, @wddgit this is something you requested to watch as well. @antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Jun 25 '25 15:06 cmsbuild

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

slava77 avatar Jun 25 '25 16:06 slava77

@cmsbuild please test

slava77 avatar Jun 25 '25 16:06 slava77

-1

Failed Tests: Build HeaderConsistency
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46917/summary.html
COMMIT: 21a525ea2735f4c3601b512bd23de40d6885e1d8
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46917/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a': No such file or directory
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a] Error 1
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a] Error 1
@@@@ Checking for missing symbols was SKIPPED due to NO_LIB_CHECKING flag in BuildFile: libUtilitiesStaticAnalyzers.so
Unknow target lib/el8_amd64_gcc12/RecoTrackerLSTCore_xr.rootmap
Unknow target lib/el8_amd64_gcc12/RecoTrackerLST_xr.rootmap

cmsbuild avatar Jun 25 '25 16:06 cmsbuild

Sorry about that, that header was deleted by mistake.

ariostas avatar Jun 25 '25 17:06 ariostas

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45311

  • There are other open Pull requests which might conflict with changes you have proposed:
    • File DataFormats/Common/src/classes_def.xml modified in PR(s): #47629
    • File RecoTracker/LSTCore/src/alpaka/LST.cc modified in PR(s): #48377
    • File RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc modified in PR(s): #48377

cmsbuild avatar Jun 25 '25 17:06 cmsbuild

Pull request #48409 was updated. @Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please check and sign again.

cmsbuild avatar Jun 25 '25 17:06 cmsbuild

@cmsbuild please test

slava77 avatar Jun 25 '25 17:06 slava77

-1

Failed Tests: Build
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46920/summary.html
COMMIT: b8eabc3be53edf6136f07c2d71eeb02b40630b9c
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46920/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

>> Compiling  src/RecoTracker/LST/src/ES_ModulesDev.cc
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DBOOST_MPL_IGNORE_PARENTHESES_WARNING -DCMSSW_GIT_HASH='CMSSW_15_1_X_2025-06-25-1100' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_1_X_2025-06-25-1100' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_X_2025-06-25-1100/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/alpaka/1.2.0-23a2bf2e896b7aace8e772f289604b47/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/pcre/8.43-2d141998cfe5424b8f7aff48035cc2da/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/boost/1.80.0-189b192d618e9605b04b60048d1376aa/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/clhep/2.4.7.1-d3a3e353d370e701238f7949a0d7909f/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/gsl/2.6-f7574c606b0ce57ff601d3ca9534cd01/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include/eigen3 -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/fmt/10.2.1-e35fd1db5eb3abc8ac0452e8ee427196/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/OpenBLAS/0.3.27-70a9dd2c9f309171934f13e3003b0540/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tinyxml2/6.2.0-a0ad3950415fa3138d99b7da42eb4c9f/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor 
-Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -DEIGEN_DONT_PARALLELIZE -DEIGEN_MAX_ALIGN_BYTES=64 -Wno-error=unused-variable -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/ES_ModulesDev.cc.d src/RecoTracker/LST/src/ES_ModulesDev.cc -o tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/ES_ModulesDev.cc.o
>> Building LCG reflex dict from header file src/RecoTracker/LST/src/classes.h
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/bin/rootcling -reflex -f tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.cc -inlineInputHeader -failOnWarnings -rmf tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.rootmap -rml libRecoTrackerLST.so -m RecoTrackerLSTCore_xr_rdict.pcm -m DataFormatsCommon_xr_rdict.pcm -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DCMS_DICT_IMPL -D_REENTRANT -DGNUSOURCE -D__STRICT_ANSI__ -DCMS_MICRO_ARCH="x86-64-v3" -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=120301 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DBOOST_MPL_IGNORE_PARENTHESES_WARNING -DCMSSW_GIT_HASH="CMSSW_15_1_X_2025-06-25-1100" -DPROJECT_NAME="CMSSW" -DPROJECT_VERSION="CMSSW_15_1_X_2025-06-25-1100" -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_1_X_2025-06-25-1100/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/alpaka/1.2.0-23a2bf2e896b7aace8e772f289604b47/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/pcre/8.43-2d141998cfe5424b8f7aff48035cc2da/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/boost/1.80.0-189b192d618e9605b04b60048d1376aa/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/bz2lib/1.0.6-d065ccd79984efc6d4660f410e4c81de/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/clhep/2.4.7.1-d3a3e353d370e701238f7949a0d7909f/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/gsl/2.6-f7574c606b0ce57ff601d3ca9534cd01/include 
-I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/libuuid/2.34-27ce4c3579b5b1de2808ea9c4cd8ed29/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/include -isystem/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/xz/5.2.5-6f3f49b07db84e10c9be594a1176c114/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/eigen/3bb6a48d8c171cf20b5f8e48bfb4e424fbd4f79e-5d91c922e771c0dc4f6bc00f61f3e2c5/include/eigen3 -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/fmt/10.2.1-e35fd1db5eb3abc8ac0452e8ee427196/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/md5/1.0.0-5b594b264e04ae51e893b1d69a797ec6/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/OpenBLAS/0.3.27-70a9dd2c9f309171934f13e3003b0540/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/external/tinyxml2/6.2.0-a0ad3950415fa3138d99b7da42eb4c9f/include -DCMSSW_REFLEX_DICT src/RecoTracker/LST/src/classes.h src/RecoTracker/LST/src/classes_def.xml
In file included from input_line_7:73:
poison/RecoTracker/LST/interface/LSTOutput.h:1:2: error: THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.
#error THIS FILE HAS BEEN REMOVED FROM THE PACKAGE.
 ^
Error: /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02895/el8_amd64_gcc12/lcg/root/6.32.13-e61674dd33920ceb725b332c4d0bf91b/bin/rootcling: compilation failure (tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr1fb1511b33_dictUmbrella.h)
gmake: *** [tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.cc] Error 1
>> Compiling  LCG dictionary: tmp/el8_amd64_gcc12/src/RecoTracker/LST/src/RecoTrackerLST/lcgdict/RecoTrackerLST_xr.cc

cmsbuild avatar Jun 25 '25 17:06 cmsbuild

I'm curious why the file removal is not visible during local compilation. Is there some extra flag in scram to not use the removed file from the release? We should've noticed this in the LST CI.

slava77 avatar Jun 25 '25 17:06 slava77

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45312

  • There are other open Pull requests which might conflict with changes you have proposed:
    • File DataFormats/Common/src/classes_def.xml modified in PR(s): #47629
    • File RecoTracker/LSTCore/src/alpaka/LST.cc modified in PR(s): #48377
    • File RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc modified in PR(s): #48377

cmsbuild avatar Jun 25 '25 17:06 cmsbuild

Pull request #48409 was updated. @Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please check and sign again.

cmsbuild avatar Jun 25 '25 17:06 cmsbuild

I'm curious why the file removal is not visible during local compilation. Is there some extra flag in scram to not use the removed file from the release? We should've noticed this in the LST CI.

Yeah, we definitely should have noticed this in our CI. I'll look to see if there are extra flags to make it closer to the "real" CI.

ariostas avatar Jun 25 '25 17:06 ariostas

@cmsbuild please test

slava77 avatar Jun 25 '25 18:06 slava77

I'm curious why the file removal is not visible during local compilation. Is there some extra flag in scram to not use the removed file from the release? We should've noticed this in the LST CI.

Yeah, we definitely should have noticed this in our CI. I'll look to see if there are extra flags to make it closer to the "real" CI.

@smuzaffar @iarspider please clarify if there is something special added in the bot tests to poison the removed files

slava77 avatar Jun 25 '25 18:06 slava77

@slava77 @ariostas, the bot just runs git cms-checkdeps -a -A after checking out the changes; this poisons the deleted files. It is good practice to run git cms-checkdeps -a -A to check out all the packages that might need rebuilding due to local changes; this also creates the poison files.
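
As a complement to the poison files, a quick local check for sources that still include a removed header could look like the sketch below. It is only an illustration: the helper name `find_stale_includes` and the file contents are hypothetical, and it does not reproduce what git cms-checkdeps does internally.

```python
import re

# Matches lines such as: #include "Package/SubPackage/interface/Header.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^">]+)[">]', re.MULTILINE)


def find_stale_includes(sources, removed_headers):
    """Return (file, header) pairs where a source still includes a removed header.

    `sources` maps a file path to its text; `removed_headers` lists the
    package-relative paths of headers deleted by the change.
    """
    removed = set(removed_headers)
    hits = []
    for path, text in sorted(sources.items()):
        for header in INCLUDE_RE.findall(text):
            if header in removed:
                hits.append((path, header))
    return hits


# Hypothetical file contents, made up for this illustration only.
sources = {
    "RecoTracker/LST/src/classes.h": (
        '#include "RecoTracker/LST/interface/LSTOutput.h"\n'
        '#include "DataFormats/Common/interface/Wrapper.h"\n'
    ),
    "RecoTracker/LST/src/ES_ModulesDev.cc": (
        '#include "RecoTracker/LSTCore/interface/LSTESData.h"\n'
    ),
}
removed = ["RecoTracker/LST/interface/LSTOutput.h"]

stale = find_stale_includes(sources, removed)
print(stale)
```

A check like this only catches direct includes by path; the poisoned build remains the authoritative test because it also catches transitive includes.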

smuzaffar avatar Jun 25 '25 19:06 smuzaffar

Thank you, @smuzaffar! I didn't know that it also poisons deleted files.

ariostas avatar Jun 25 '25 20:06 ariostas

-1

Failed Tests: RelVals RelVals-ROCM
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46921/summary.html
COMMIT: 31df503ac378ccb4c147f6450850e5efcca990ab
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 25-Jun-2025 21:12:41 CEST-----------------------
An exception of category 'PluginLibraryLoadError' occurred while
   [0] Constructing the EventProcessor
   [1] While attempting to load plugin LSTOutputConverter
Exception Message:
unable to load /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so because /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so: cannot open shared object file: No such file or directory
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 25-Jun-2025 21:12:41 CEST-----------------------
An exception of category 'PluginLibraryLoadError' occurred while
   [0] Constructing the EventProcessor
   [1] While attempting to load plugin LSTOutputConverter
Exception Message:
unable to load /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so because /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so: cannot open shared object file: No such file or directory
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 25-Jun-2025 21:12:41 CEST-----------------------
An exception of category 'PluginLibraryLoadError' occurred while
   [0] Constructing the EventProcessor
   [1] While attempting to load plugin LSTOutputConverter
Exception Message:
unable to load /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so because /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46921/CMSSW_15_1_X_2025-06-25-1100/lib/el8_amd64_gcc12/poisoned/plugin-poisoned-RecoTrackerLSTPlugins.so: cannot open shared object file: No such file or directory
----- End Fatal Exception -------------------------------------------------

RelVals-ROCM

  • 12834.406: 12834.406_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka/step3_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka.log

CUDA Comparison Summary

Summary:

cmsbuild avatar Jun 25 '25 20:06 cmsbuild

The ROCm failure looks unrelated.

For the other failures, could they be false positives? The LSTOutputConverter plugin was moved from plugins to plugins/alpaka, but kept the same name. So could it be finding the poisoned one first? Otherwise, I'm not sure what would need to be updated.

ariostas avatar Jun 25 '25 20:06 ariostas

assign heterogeneous

jfernan2 avatar Jun 26 '25 07:06 jfernan2

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Jun 26 '25 07:06 cmsbuild

(just to set expectations: I am about to leave until mid-July, and I already have a backlog of PRs to review, so I can only get to it probably by the end of July)

fwyzard avatar Jun 26 '25 08:06 fwyzard

(just to set expectations: I am about to leave until mid-July, and I already have a backlog of PRs to review, so I can only get to it probably by the end of July)

Thank you for clarifying your plans. @makortel is also in the heterogeneous category; is it possible to have Matti review?

slava77 avatar Jun 26 '25 12:06 slava77

@slava77 Matti is on vacation and will not be back till the 2nd week of July.

Dr15Jones avatar Jun 26 '25 12:06 Dr15Jones

@slava77 Matti is on vacation and will not be back till the 2nd week of July.

OK, summer time.

In the meantime (while we wait for the main reviewer(s)) it would be nice to get advice/clarification on the plugin issue https://github.com/cms-sw/cmssw/pull/48409#issuecomment-3006066567

slava77 avatar Jun 26 '25 12:06 slava77

For the other failures, could they be false positives? The LSTOutputConverter plugin was moved from plugins to plugins/alpaka, but kept the same name. So could it be finding the poisoned one first? Otherwise, I'm not sure what would need to be updated.

The plugin system uses the LD_LIBRARY_PATH environment variable to decide the order in which to look for plugins (this follows how the OS looks for shared libraries). The local work environment's lib directory should come before the local work environment's poison directory in LD_LIBRARY_PATH, so the new plugin should be found first.
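
To illustrate the ordering argument only, here is a minimal sketch of a first-match-wins lookup over a path list. It is not the actual plugin manager; `first_match`, the directory names, and `pluginExample.so` are hypothetical and exist just for this example.

```python
import os
import tempfile


def first_match(filename, search_path):
    """Return the first file named `filename` found while walking the
    directories of `search_path` in order, or None if nothing matches."""
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as area:
    lib_dir = os.path.join(area, "lib")
    poison_dir = os.path.join(area, "poisoned")
    for d in (lib_dir, poison_dir):
        os.makedirs(d)
        # Create an empty placeholder with the same name in both directories.
        open(os.path.join(d, "pluginExample.so"), "w").close()

    # lib listed before poisoned: the lib copy is picked up.
    normal = os.pathsep.join([lib_dir, poison_dir])
    found_in_lib = os.path.dirname(first_match("pluginExample.so", normal)) == lib_dir

    # Reversed order: the poisoned copy would win instead.
    flipped = os.pathsep.join([poison_dir, lib_dir])
    found_in_poison = os.path.dirname(first_match("pluginExample.so", flipped)) == poison_dir

print(found_in_lib, found_in_poison)
```

With this kind of lookup, a poisoned entry can only be reached when nothing earlier in the path provides the same name.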

Are you certain the new plugin is actually being built?

@smuzaffar any thoughts?

Dr15Jones avatar Jun 26 '25 13:06 Dr15Jones

So I see that the following libraries are built

  • libRecoTrackerLSTPluginsPortableCudaAsync.so
  • libRecoTrackerLSTPluginsPortableROCmAsync.so
  • libRecoTrackerLSTPluginsPortableSerialSync.so

None of these is an exact match for plugin-poisoned-RecoTrackerLSTPlugins.so. Previously, I believe, the LSTOutputConverter was just in the plugins directory, which would match the poisoned name.

In the conversion to an alpaka module, the module's type name would have been changed as well. It would no longer just be LSTOutputConverter. So in the configuration for these jobs, how is the type of the module specified?

Dr15Jones avatar Jun 26 '25 13:06 Dr15Jones

So you need to change https://github.com/cms-sw/cmssw/blob/3c9125e2eab1a559037a9bc56b190ad25a194c4d/HLTrigger/Configuration/python/HLT_75e33/modules/hltInitialStepTrackCandidates_cfi.py#L27

to

_hltInitialStepTrackCandidatesLST = cms.EDProducer('LSTOutputConverter@alpaka', 

and do that for all configurations.
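
For updating many configurations at once, a text-based rewrite along these lines might help. This is a sketch under assumptions: `retarget_to_alpaka` is a hypothetical helper, and the `before` line is assumed to match how the existing configuration spells the module type.

```python
import re

PLUGIN = "LSTOutputConverter"
# Match the plugin name only when it is a complete quoted string, so a name
# that already ends in '@alpaka' is left untouched.
PATTERN = re.compile(r"(['\"])" + re.escape(PLUGIN) + r"\1")


def retarget_to_alpaka(config_text):
    """Append '@alpaka' to every quoted occurrence of the plugin type name."""
    return PATTERN.sub(
        lambda m: f"{m.group(1)}{PLUGIN}@alpaka{m.group(1)}", config_text
    )


# Assumed form of the current configuration line (not copied from the file).
before = "_hltInitialStepTrackCandidatesLST = cms.EDProducer('LSTOutputConverter',\n"
after = retarget_to_alpaka(before)
print(after, end="")
```

Running the helper a second time leaves the text unchanged, so it can be applied to a whole directory of configuration files without tracking which ones were already converted.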

Dr15Jones avatar Jun 26 '25 13:06 Dr15Jones