
`hltIntegrationTests` tests failing randomly in IBs

Open missirol opened this issue 2 years ago • 18 comments

In recent IBs, there have been seemingly-random failures of the HLT-Validation tests, e.g.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-09-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-11-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-11-2300/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-16-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-16-1100/slc7_amd64_gcc10/HLT_Integration_PRef_MC.log

First occurrences of the issues were briefly discussed in

https://github.com/cms-sw/cmssw/pull/37304#issuecomment-1078234322
https://github.com/cms-sw/cmssw/pull/37524#issuecomment-1096751922

The cause of the issue is unclear. So far it has not been reproducible locally, and it seems to show up in IBs at random times. TSG also routinely runs these executables manually (i.e. not via IBs) during development, but I have yet to encounter this issue locally.

The error messages point to a failure to download the HLT config file correctly from the database, via the `hltListPaths` call and/or the `hltGetConfiguration` call made by the executable `hltIntegrationTests`.

Examples:

  1. this error [1] suggests that the `hlt.py` dumped via `hltGetConfiguration` was not a valid python config;
  2. this error [2] suggests that downloading the menu inside `hltListPaths` failed, and then the ensuing call to `hltGetConfiguration` failed as well, causing an error from `hltCompareResults` (which reads as input the invalid python config returned by `hltGetConfiguration`).
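
Both tracebacks end in `NameError: name 'cms' is not defined` while executing `hlt.py`, which would be consistent with the dump missing its usual `cms` import (that is a guess on my part, not something the logs prove). A minimal sketch of a sanity check that could be run on the dumped text before passing it on; the function name `looks_like_python_config` is hypothetical and not part of any existing HLT tool:

```python
import ast

# Module that a CMSSW python configuration normally imports as `cms`.
CMS_CONFIG_MODULE = "FWCore.ParameterSet.Config"


def looks_like_python_config(text):
    """Return True if `text` parses as Python and imports the cms config module.

    The text is only parsed, never executed, so CMSSW does not need to be set up.
    """
    if not text.strip():
        return False
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Matches `import FWCore.ParameterSet.Config as cms`
        if isinstance(node, ast.Import):
            if any(alias.name == CMS_CONFIG_MODULE for alias in node.names):
                return True
        # Matches `from FWCore.ParameterSet.Config import ...`
        elif isinstance(node, ast.ImportFrom):
            if node.module == CMS_CONFIG_MODULE:
                return True
    return False
```

A check like this would only turn the confusing downstream errors into one clear message; it would not fix the failed download itself.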

To my knowledge, the issue started to appear after the integration of #37283 (and its backport to 12_3_X) [3]. That PR updated `hltListPaths`, possibly making it a bit slower; on the other hand, it did not update `hltGetConfiguration` in any way. Curiously, the error has so far shown up only for the PIon and PRef HLT menus, which are the two smallest menus being tested (so their download from the database is generally much quicker than for other menus).

Given its non-reproducibility, it's unclear (to me) how to tackle this.

Could this somehow be related to how these tests are run in IBs (and/or how the database is queried in that case)? Are there timeouts of any kind?

[1]

stty: standard input: Inappropriate ioctl for device
Will run 6 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/PIon/V67 --full --offline --mc --input file:../RelVal_Raw_PIon_MC.root --unprescale --process TEST20220416171904 --max-events 100 --globaltag=auto:run3_mc_PIon --type=PIon
Traceback (most recent call last):
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02728/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-04-15-1100/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    Status_OnCPU
    Status_OnGPU
    HLTriggerFirstPath
    HLT_Physics_v7
make: *** [.makefile:23: Status_OnGPU.done] Error 90
    HLT_Random_v3
make: *** [.makefile:23: Status_OnCPU.done] Error 90
    HLT_ZeroBias_v6
make: *** [.makefile:23: HLT_Physics_v7.done] Error 90
make: Target 'Status_OnCPU' not remade because of errors.
make: Target 'Status_OnGPU' not remade because of errors.
make: Target 'HLT_Physics_v7' not remade because of errors.
make: *** [.makefile:23: HLTriggerFirstPath.done] Error 90
make: Target 'HLTriggerFirstPath' not remade because of errors.
make: *** [.makefile:23: HLT_Random_v3.done] Error 90
make: Target 'HLT_Random_v3' not remade because of errors.
make: *** [.makefile:23: HLT_ZeroBias_v6.done] Error 90
make: Target 'HLT_ZeroBias_v6' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

[2]

stty: standard input: Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/PRef/V67 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/PRef/V67 --full --offline --mc --input file:../RelVal_Raw_PRef_MC.root --unprescale --process TEST20220416171920 --max-events 100 --globaltag=auto:run3_mc_PRef --type=PRef
Traceback (most recent call last):
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02728/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-04-15-1100/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

[3] Reverting #37283 in full is not a good option, because that PR introduced functionalities needed to test the latest HLT menus.

missirol avatar Apr 17 '22 10:04 missirol

A new Issue was created by @missirol Marino Missiroli.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Apr 17 '22 10:04 cmsbuild

assign core, hlt

makortel avatar Apr 18 '22 18:04 makortel

New categories assigned: core,hlt

@missirol,@Dr15Jones,@smuzaffar,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Apr 18 '22 18:04 cmsbuild

I remember seeing this kind of error

NameError: name 'cms' is not defined

recently in other tests too (was unable to find those now though). I wonder if this could be e.g. a CVMFS issue on a worker node?

makortel avatar Apr 18 '22 18:04 makortel

Just noting here another occurrence of the issue in CMSSW_12_3_X_2022-04-25-2300.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-25-2300/slc7_amd64_gcc10/runIB.log

02:28:48 hltIntegrationTests /dev/CMSSW_12_3_0/HIon/V72 -d HLT_Integration_HIon_MC -i file:../RelVal_Raw_HIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_HIon -x --type=HIon >& HLT_Integration_HIon_MC.log
2.097u 1.186s 0:05.99 54.5%    0+0k 2373016+976io 49958pf+0w
02:28:54 exit status: 1

02:28:54 hltIntegrationTests /dev/CMSSW_12_3_0/PIon/V72 -d HLT_Integration_PIon_MC -i file:../RelVal_Raw_PIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PIon -x --type=PIon >& HLT_Integration_PIon_MC.log
2.131u 1.264s 0:10.79 31.4%    0+0k 4129304+1000io 21167pf+0w
02:29:05 exit status: 1

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-25-2300/slc7_amd64_gcc10/HLT_Integration_HIon_MC.log

stty: standard input: Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/HIon/V72 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/HIon/V72 --full --offline --mc --input file:../RelVal_Raw_HIon_MC.root --unprescale --process TEST20220426022851 --max-events 100 --globaltag=auto:run3_mc_HIon --type=HIon
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02730/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-04-24-0000/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-25-2300/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log

stty: standard input: Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/PIon/V72 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/PIon/V72 --full --offline --mc --input file:../RelVal_Raw_PIon_MC.root --unprescale --process TEST20220426022856 --max-events 100 --globaltag=auto:run3_mc_PIon --type=PIon
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02730/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-04-24-0000/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

missirol avatar Apr 26 '22 21:04 missirol

Another occurrence of the issue in CMSSW_12_3_X_2022-04-27-1100.

Errors are similar to https://github.com/cms-sw/cmssw/issues/37598#issuecomment-1110265799. Example:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-27-1100/slc7_amd64_gcc10/runIB.log

18:27:12 hltIntegrationTests /dev/CMSSW_12_3_0/PIon/V72 -d HLT_Integration_PIon_MC -i file:../RelVal_Raw_PIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PIon -x --type=PIon >& HLT_Integration_PIon_MC.log
11.007u 2.286s 0:38.04 34.9%	0+0k 1986120+920io 52369pf+0w
18:27:50 exit status: 1

18:27:50 hltIntegrationTests /dev/CMSSW_12_3_0/PRef/V72 -d HLT_Integration_PRef_MC -i file:../RelVal_Raw_PRef_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PRef -x --type=PRef >& HLT_Integration_PRef_MC.log
1.996u 0.622s 0:03.61 72.2%	0+0k 524288+944io 1778pf+0w
18:27:54 exit status: 1
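
To spot these failures without reading each `runIB.log` by hand, something like the sketch below could help. It assumes the log keeps the layout shown in the excerpts above (a timestamped `hltIntegrationTests` command line, later followed by a timestamped `exit status:` line); the function name `failed_tests` and the regular expressions are mine, not part of any existing tool.

```python
import re

# Timestamped command line, e.g.
# "18:27:12 hltIntegrationTests /dev/... -d ... >& HLT_Integration_PIon_MC.log"
CMD_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2} hltIntegrationTests (?P<menu>\S+) .*>& (?P<log>\S+)\s*$"
)
# Timestamped exit line, e.g. "18:27:50 exit status: 1"
EXIT_RE = re.compile(r"^\d{2}:\d{2}:\d{2} exit status: (?P<code>\d+)\s*$")


def failed_tests(lines):
    """Return (menu, log file, exit code) for every test with a non-zero exit status."""
    failures = []
    current = None  # (menu, log) of the most recent command still awaiting its exit line
    for line in lines:
        match = CMD_RE.match(line)
        if match:
            current = (match.group("menu"), match.group("log"))
            continue
        match = EXIT_RE.match(line)
        if match and current is not None:
            code = int(match.group("code"))
            if code != 0:
                failures.append(current + (code,))
            current = None
    return failures
```

Usage would be along the lines of `failed_tests(open("runIB.log"))`, which yields one tuple per failing menu.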

missirol avatar Apr 30 '22 11:04 missirol

Another occurrence of the issue in CMSSW_12_3_X_2022-05-02-2300. Errors are similar to https://github.com/cms-sw/cmssw/issues/37598#issuecomment-1110265799.

In the last 10 days, the issue has continued to appear in 12_3_X IBs, but not in 12_4_X IBs (maybe it is just a coincidence). The HLT menus in those releases are the same. Is there anything different in how IBs run for 12_3_X and 12_4_X? (generic question, but I'm trying to figure out if something could explain the apparent lack of issues in recent 12_4_X IBs)

missirol avatar May 03 '22 12:05 missirol

Another occurrence of the issue in CMSSW_12_4_X_2022-05-13-2300. Errors are similar to https://github.com/cms-sw/cmssw/issues/37598#issuecomment-1110265799, but this time only for the HIon menu.

> Is there anything different in how IBs run for 12_3_X and 12_4_X?

This latest failure was in 12_4_X (master), suggesting that there might be no difference between 12_3_X IBs and 12_4_X IBs as far as this particular problem is concerned.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-05-13-2300/slc7_amd64_gcc10/runIB.log

[..]
03:01:32 hltIntegrationTests /dev/CMSSW_12_3_0/GRun/V79 -d HLT_Integration_GRun_MC -i file:../RelVal_Raw_GRun_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_GRun -x --type=GRun >& HLT_Integration_GRun_MC.log
25416.330u 6401.524s 3:37:51.43 243.4%	0+0k 1954842544+1449568io 8562759pf+0w
06:39:23 exit status: 0

06:39:23 hltIntegrationTests /dev/CMSSW_12_3_0/HIon/V79 -d HLT_Integration_HIon_MC -i file:../RelVal_Raw_HIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_HIon -x --type=HIon >& HLT_Integration_HIon_MC.log
109.351u 51.588s 2:45.39 97.3%	0+0k 30597312+12816io 315924pf+0w
06:42:09 exit status: 1

06:42:09 hltIntegrationTests /dev/CMSSW_12_3_0/PIon/V79 -d HLT_Integration_PIon_MC -i file:../RelVal_Raw_PIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PIon -x --type=PIon >& HLT_Integration_PIon_MC.log
105.738u 15.689s 1:35.85 126.6%	0+0k 13181608+73944io 74766pf+0w
06:43:45 exit status: 0

06:43:45 hltIntegrationTests /dev/CMSSW_12_3_0/PRef/V79 -d HLT_Integration_PRef_MC -i file:../RelVal_Raw_PRef_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PRef -x --type=PRef >& HLT_Integration_PRef_MC.log
2582.033u 907.994s 51:46.90 112.3%	0+0k 97426000+429728io 462013pf+0w
07:35:32 exit status: 0
[..]

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-05-13-2300/slc7_amd64_gcc10/HLT_Integration_HIon_MC.log

stty: standard input: Inappropriate ioctl for device
Will run 429 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/HIon/V79 --full --offline --mc --input file:../RelVal_Raw_HIon_MC.root --unprescale --process TEST20220514064003 --max-events 100 --globaltag=auto:run3_mc_HIon --type=HIon
Traceback (most recent call last):
  File "/pool/condor/dir_18534/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-05-13-2300/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_4_X_2022-05-13-2300/bin/slc7_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
[..]

missirol avatar May 15 '22 12:05 missirol

Another occurrence of the issue in CMSSW_12_4_X_2022-05-17-2300.

This intermittent issue keeps appearing, so it might be useful to start thinking about a way to solve it via software (e.g. retrying the query).
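
A minimal sketch of what retrying the query could look like, assuming that a failed download shows up as empty output or a non-zero exit code. The names `run_with_retries` and `query_confdb` and all their parameters are hypothetical; they are not part of `hltListPaths` or `hltGetConfiguration`.

```python
import shlex
import subprocess
import time


def run_with_retries(run_query, is_valid, max_attempts=3, delay=5.0,
                     backoff=2.0, sleep=time.sleep):
    """Call `run_query` until `is_valid(result)` is true, waiting between attempts.

    The wait starts at `delay` seconds and is multiplied by `backoff` after each
    failed attempt. Raises RuntimeError if no attempt returns a valid result.
    """
    for attempt in range(1, max_attempts + 1):
        result = run_query()
        if is_valid(result):
            return result
        if attempt < max_attempts:
            sleep(delay)
            delay *= backoff
    raise RuntimeError(
        f"query did not return a valid result after {max_attempts} attempts"
    )


def query_confdb(cmdline):
    """Run a query command and return its stdout, or '' if the command failed."""
    proc = subprocess.run(shlex.split(cmdline), capture_output=True, text=True)
    return proc.stdout if proc.returncode == 0 else ""
```

With the command line taken from the error messages above, the call might look like `run_with_retries(lambda: query_confdb("hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/PRef/V67 --noedsources --noes --noservices"), lambda out: bool(out.strip()))`. Whether three attempts and a few seconds of delay are enough is unknown, since the root cause is still unclear.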

missirol avatar May 18 '22 15:05 missirol

Other occurrences of this issue: CMSSW_12_5_X_2022-05-23-2300 and CMSSW_12_5_X_2022-05-27-1100.

missirol avatar May 29 '22 10:05 missirol

The problem hasn't shown up in the IBs of the last ten days or so.

I don't know why; I just wonder if anything related to the DB (and/or the queries to it) has changed.

missirol avatar Jun 13 '22 16:06 missirol

As far as I can see, this problem has not re-appeared, so something must have improved. :)

missirol avatar Jul 12 '22 15:07 missirol

The issue re-appeared in CMSSW_12_4_X_2022-08-12-1100.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-08-12-1100/el8_amd64_gcc10/HLT_Integration_PIon_DATA.log

stty: 'standard input': Inappropriate ioctl for device
Will run 6 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/PIon/V94 --full --offline --data --input file:../RelVal_Raw_PIon_DATA.root --unprescale --process TEST20220812172737 --max-events 100 --globaltag=auto:run3_hlt_PIon --type=PIon
Traceback (most recent call last):
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02745/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-08-11-2300/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	HLTriggerFirstPath
	Status_OnGPU
	HLT_Physics_v8
	Status_OnCPU
make: *** [.makefile:23: HLT_Physics_v8.done] Error 90
make: *** [.makefile:23: HLTriggerFirstPath.done] Error 90
	HLT_Random_v3
	HLT_ZeroBias_v7
make: *** [.makefile:23: Status_OnCPU.done] Error 90
make: Target 'HLTriggerFirstPath' not remade because of errors.
make: Target 'Status_OnCPU' not remade because of errors.
make: Target 'HLT_Physics_v8' not remade because of errors.
make: *** [.makefile:23: Status_OnGPU.done] Error 90
make: Target 'Status_OnGPU' not remade because of errors.
make: *** [.makefile:23: HLT_Random_v3.done] Error 90
make: *** [.makefile:23: HLT_ZeroBias_v7.done] Error 90
make: Target 'HLT_Random_v3' not remade because of errors.
make: Target 'HLT_ZeroBias_v7' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-08-12-1100/el8_amd64_gcc10/HLT_Integration_PRef_DATA.log

stty: 'standard input': Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_4_0/PRef/V94 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/PRef/V94 --full --offline --data --input file:../RelVal_Raw_PRef_DATA.root --unprescale --process TEST20220812172745 --max-events 100 --globaltag=auto:run3_hlt_PRef --type=PRef
Traceback (most recent call last):
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02745/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-08-11-2300/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

missirol avatar Aug 17 '22 06:08 missirol

Another instance of the issue was in CMSSW_12_5_X_2022-08-17-1100. The errors are virtually identical to https://github.com/cms-sw/cmssw/issues/37598#issuecomment-1217514871.

missirol avatar Aug 19 '22 13:08 missirol

Another instance of the issue was in CMSSW_12_5_X_2022-08-24-2300.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_5_X_2022-08-24-2300/el8_amd64_gcc10/HLT_Integration_HIon_DATA.log

stty: 'standard input': Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_60795/jenkins/workspace/ib-run-HLT/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_60795/jenkins/workspace/ib-run-HLT/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_4_0/HIon/V110 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/HIon/V110 --full --offline --data --input file:../RelVal_Raw_HIon_DATA.root --unprescale --process TEST20220825045934 --max-events 100 --globaltag=auto:run3_hlt_HIon --type=HIon
Traceback (most recent call last):
  File "/pool/condor/dir_60795/jenkins/workspace/ib-run-HLT/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 5, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 5, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

missirol avatar Aug 25 '22 13:08 missirol

Another instance of the issue was in CMSSW_12_5_X_2022-08-30-2300.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_5_X_2022-08-30-2300/el8_amd64_gcc10/HLT_Integration_PRef_MC.log

missirol avatar Sep 04 '22 12:09 missirol

Another instance of this issue was in CMSSW_12_4_X_2022-10-07-1100.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-10-07-1100/el8_amd64_gcc10/HLT_Integration_PIon_MC.log

missirol avatar Oct 09 '22 10:10 missirol

Another instance of this issue was in CMSSW_12_4_X_2022-10-11-1100, albeit with a somewhat new error message [*].

(I know I sound like a broken record; I just mean to highlight that the issue persists; when there are less urgent matters, I will try to come up with a solution, e.g. https://github.com/cms-sw/cmssw/issues/39345#issuecomment-1244964477; ETA: EOY).

[*] https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-10-11-1100/el8_amd64_gcc10/HLT_Integration_GRun_MC.log

stty: 'standard input': Inappropriate ioctl for device
Will run 674 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 1349, in getresponse
    response.begin()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 1349, in getresponse
    response.begin()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/bin/el8_amd64_gcc10/hltGetConfiguration", line 251, in <module>
    print(confdb.HLTProcess(config).dump())
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/python/HLTrigger/Configuration/Tools/confdb.py", line 53, in __init__
    self.converter = OfflineConverter(version = self.config.menu.version, database = self.config.menu.database, proxy = self.config.proxy, proxyHost = self.config.proxy_host, proxyPort = self.config.proxy_port)
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/python/HLTrigger/Configuration/Tools/confdbOfflineConverter.py", line 131, in __init__
    version_website = requests.get(self.baseUrl+"/../confdb.version").text
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/GRun/V145 --full --offline --mc --input file:../RelVal_Raw_GRun_MC.root --unprescale --process TEST20221011111238 --max-events 100 --globaltag=auto:run3_mc_GRun --type=GRun
Traceback (most recent call last):
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 4, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-10-09-0000/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 4, in <module>
    process.hltTriggerSummaryAOD = cms.EDProducer( "TriggerSummaryProducerAOD",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	HLT_AK8PFJet360_TrimMass30_v20
	Status_OnGPU
	Status_OnCPU
	HLTriggerFirstPath
make: *** [.makefile:23: Status_OnCPU.done] Error 90
	HLT_AK8PFJet380_TrimMass30_v13

[..]

Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done
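
Both `NameError` tracebacks in the log above fail at line 4 of `hlt.py` because `cms` is undefined. This suggests the file never received the usual `import FWCore.ParameterSet.Config as cms` header after `hltGetConfiguration` raised. As a minimal sketch, such a file could be rejected before `hltCheckPrescaleModules` or `edmConfigDump` run on it. The function `config_problems` is hypothetical and not part of `hltIntegrationTests`. It only parses the text, never executes it, and does not need CMSSW to be importable.

```python
import ast


def config_problems(source):
    """Return a list of reasons why `source` is unlikely to be a usable dump.

    Hypothetical pre-check (an assumption, not existing CMSSW code).
    An empty list means no problem was detected by these simple checks.
    """
    if not source.strip():
        return ["file is empty"]
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"not valid Python: {err.msg} (line {err.lineno})"]

    # Collect the names bound by top-level import statements.
    bound = set()
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add(alias.asname or alias.name.split(".")[0])

    problems = []
    if "cms" not in bound:
        problems.append("no top-level import binds the name 'cms'")
    return problems


if __name__ == "__main__":
    # Mimics the broken file from the log: customisation lines without a header.
    broken = 'process.hltTriggerSummaryAOD = cms.EDProducer("TriggerSummaryProducerAOD")\n'
    print(config_problems(broken))
```

A non-empty result is a cheap signal to stop early and report the download failure itself, rather than the downstream `NameError`.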

Oct 12 '22 07:10 missirol

Another instance of this issue was in CMSSW_12_4_X_2022-11-01-1100.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-11-01-1100/el8_amd64_gcc10/HLT_Integration_PIon_MC.log

Nov 02 '22 08:11 missirol

+hlt

I will try to come up with a solution, e.g. https://github.com/cms-sw/cmssw/issues/39345#issuecomment-1244964477; ETA: EOY

#40004 and its backports have removed queries to ConfDB in IB tests. This should, by construction, remove occurrences of this issue for 12_4_X and higher, so I'm signing this.

Having said that, the root cause of these failures (see also #39345) still escapes me. The symptom is a failure to download configurations from ConfDB (usually only some of them within a given IB), which leads to invalid cfg files. The nodes running tests in IBs don't have /afs access, so the ConfDB .jar files are downloaded locally, but it's unclear (to me) whether this is part of the issue. The code seems to account for the fact that multiple downloads can happen simultaneously, but I didn't stress-test this.
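
For reference, the traceback earlier in this thread ends in a `requests.exceptions.ConnectionError` raised by the `requests.get(...)` call in `confdbOfflineConverter.py`. One generic way to tolerate such transient disconnects is to retry with exponential backoff. The sketch below is only an illustration of that idea: it is not what #40004 did, and `fetch_with_retries` and its parameters are hypothetical names. It relies on `requests.exceptions.ConnectionError` deriving from `OSError` (via `RequestException`), so catching `OSError` covers it without importing `requests` here.

```python
import time


def fetch_with_retries(fetch, attempts=4, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper (an assumption, not existing CMSSW code).
    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Possible use around the call seen in the traceback (the timeout is an assumption):
#   import requests
#   version_website = fetch_with_retries(
#       lambda: requests.get(url, timeout=30).text)
```

Whether retries would have helped here is not known, since the root cause was never identified. They would only mask failures that are genuinely transient.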

Nov 24 '22 16:11 missirol

+core

Although in the end there wasn't much (anything?) for core.

Nov 28 '22 14:11 makortel

@cmsbuild, please close

Nov 28 '22 14:11 makortel

This issue is fully signed and ready to be closed.

Nov 28 '22 14:11 cmsbuild