
DPL: Get stuck in some pathological cases of incorrect EoS, instead of quitting without error

Open davidrohr opened this issue 1 year ago • 8 comments

@ktf : This is the patch we discussed this morning. For me the FST runs through with it. I think in principle we don't need it; on the other hand, it would be slightly better behavior in case of this error condition.

I am slightly concerned whether we might break something with it. I.e., in case we remove e.g. a timer channel with info.state != InputChannelState::Pull and then set it to false, could we break something?

davidrohr avatar Aug 25 '22 11:08 davidrohr

Indeed we should check that distance(begin, newEnd) + fake channels != distance(begin, pollOrder.end()), no?
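
One possible reading of that check, as a rough sketch only: the types below (`ChannelInfo`, its `fake` flag, `PollCheck`, `checkPollOrder`) are hypothetical stand-ins and not the actual DPL code, and the removal predicate is assumed to be the `info.state != InputChannelState::Pull` condition mentioned above. Since `std::remove_if` does not shrink the vector, `distance(begin, pollOrder.end())` still gives the original number of channels after the call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for the types discussed in this thread.
enum class InputChannelState { Running, Pull, Completed };

struct ChannelInfo {
  InputChannelState state;
  bool fake; // assumption: marks channels such as timers that carry no real data
};

struct PollCheck {
  std::ptrdiff_t kept;  // distance(begin, newEnd)
  std::ptrdiff_t fake;  // number of fake channels in pollOrder
  std::ptrdiff_t total; // distance(begin, pollOrder.end())
  bool mismatch() const { return kept + fake != total; }
};

PollCheck checkPollOrder(std::vector<int>& pollOrder, std::vector<ChannelInfo> const& infos)
{
  auto begin = pollOrder.begin();
  // Precondition of this sketch: every entry of pollOrder indexes into infos.
  assert(std::all_of(begin, pollOrder.end(), [&infos](int i) {
    return i >= 0 && static_cast<std::size_t>(i) < infos.size();
  }));
  // Count fake channels first: remove_if leaves the tail in an unspecified state.
  auto fake = std::count_if(begin, pollOrder.end(),
                            [&infos](int i) { return infos[i].fake; });
  // Assumed removal predicate: drop every channel that is not in the Pull state.
  auto newEnd = std::remove_if(begin, pollOrder.end(), [&infos](int i) {
    return infos[i].state != InputChannelState::Pull;
  });
  return {std::distance(begin, newEnd), fake, std::distance(begin, pollOrder.end())};
}
```

In this sketch `mismatch()` is true whenever the kept channels plus the fake ones do not add up to the full poll order, for example when a non-fake channel was dropped. How the device should react in that case is left open here.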

ktf avatar Aug 25 '22 14:08 ktf

Error while checking build/O2/fullCI for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-09-29 08:50:

## sw/BUILD/o2checkcode-latest/log
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/Common/test/testGPUsortCUDA.cu:22:10: error: 'boost/test/unit_test.hpp' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUTracking/TRDTracking/GPUTRDTracker.cxx:37:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUTracking/Base/GPUReconstruction.cxx:37:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUTracking/Base/GPUReconstructionCPU.cxx:45:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUTracking/display/GPUDisplay.cxx:36:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUTracking/Base/cuda/GPUReconstructionCUDAGenRTC.cu:16:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/cuda/../Shared/Utils.h:26:10: error: 'boost/program_options.hpp' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/cuda/../Shared/Utils.h:26:10: error: 'boost/program_options.hpp' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/Framework/Logger/include/Framework/Logger.h:14:10: error: 'fairlogger/Logger.h' file not found [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:520:12: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:1059:69: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:1175:5: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:1852:16: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:2276:18: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:4336:16: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/sm_20_atomic_functions.h:89:39: error: redefinition of 'atomicAdd' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/Framework/Logger/include/Framework/Logger.h:14:10: error: 'fairlogger/Logger.h' file not found [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:520:12: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:1059:69: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:1175:5: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:1852:16: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:2276:18: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/cuda/std/detail/libcxx/include/type_traits:4336:16: error: CUDA device code does not support variadic functions [clang-diagnostic-error]
/usr/local/cuda-11.7/include/sm_20_atomic_functions.h:89:39: error: redefinition of 'atomicAdd' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/../Shared/Utils.h:146:34: error: use of undeclared identifier 'int4'; did you mean 'int'? [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:64:8: error: unknown type name '__host__' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:64:27: error: expected ';' after top level declarator [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:64:33: error: overloaded 'operator+=' must have at least one parameter of class or enumeration type [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:64:44: error: unknown type name 'int4'; did you mean 'int'? [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:64:53: error: unknown type name 'int4'; did you mean 'int'? [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:66:4: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:66:11: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:67:4: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:67:11: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:68:4: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:68:11: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:69:4: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:69:11: error: member reference base type 'int' is not a structure or union [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:85:1: error: unknown type name '__global__' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:90:19: error: use of undeclared identifier 'blockIdx' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:90:32: error: use of undeclared identifier 'blockDim' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:90:45: error: use of undeclared identifier 'threadIdx' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/Kernels.hip.cxx:90:78: error: use of undeclared identifier 'blockDim' [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/../Shared/Utils.h:146:34: error: use of undeclared identifier 'int4'; did you mean 'int'? [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/GPU/GPUbenchmark/hip/benchmark.hip.cxx:199:35: error: use of undeclared identifier 'int4'; did you mean 'int'? [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/Detectors/EMCAL/calibration/include/EMCALCalibration/EMCALCalibExtractor.h:36:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/Detectors/EMCAL/calibration/include/EMCALCalibration/EMCALCalibExtractor.h:36:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/Detectors/EMCAL/calibration/include/EMCALCalibration/EMCALCalibExtractor.h:36:10: error: 'omp.h' file not found [clang-diagnostic-error]
/sw/SOURCES/O2/9694-slc8_x86-64/0/Detectors/TOF/calibration/src/TOFChannelCalibrator.cxx:23:10: error: 'omp.h' file not found [clang-diagnostic-error]

Full log here.

alibuild avatar Aug 30 '22 17:08 alibuild

Error while checking build/O2/o2-dataflow-cs8 for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-10-01 07:34:

## sw/BUILD/O2-latest/log
100% tests passed, 0 tests failed out of 434
 94/106 Test #108: test_Framework_test_ExternalFairMQDeviceWorkflow ........***Timeout  30.05 sec
99% tests passed, 1 tests failed out of 102

Full log here.

alibuild avatar Aug 31 '22 01:08 alibuild

Error while checking build/O2/o2-cs8 for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-09-26 06:46:

## sw/BUILD/O2-latest/log
100% tests passed, 0 tests failed out of 457
 95/106 Test #109: test_Framework_test_ExternalFairMQDeviceWorkflow ........***Timeout  30.05 sec
99% tests passed, 1 tests failed out of 102

Full log here.

alibuild avatar Aug 31 '22 04:08 alibuild

Error while checking build/AliceO2/O2/o2/macOS-arm for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-09-29 18:28:

## sw/BUILD/O2-latest/log
100% tests passed, 0 tests failed out of 456
 95/103 Test #109: test_Framework_test_ExternalFairMQDeviceWorkflow ........***Timeout  30.05 sec
99% tests passed, 1 tests failed out of 102

Full log here.

alibuild avatar Aug 31 '22 07:08 alibuild

Error while checking build/O2/o2-dataflow for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-09-28 23:28:

## sw/BUILD/O2-latest/log
100% tests passed, 0 tests failed out of 434
 95/106 Test #109: test_Framework_test_ExternalFairMQDeviceWorkflow ........***Timeout  30.04 sec
99% tests passed, 1 tests failed out of 102

Full log here.

alibuild avatar Aug 31 '22 10:08 alibuild

Error while checking build/O2/o2 for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-09-30 17:54:

## sw/BUILD/O2-latest/log
100% tests passed, 0 tests failed out of 457
 95/106 Test #109: test_Framework_test_ExternalFairMQDeviceWorkflow ........***Timeout  30.04 sec
99% tests passed, 1 tests failed out of 102

Full log here.

alibuild avatar Aug 31 '22 12:08 alibuild

Error while checking build/AliceO2/O2/o2/macOS for 76c1d20088a6f532a72b89aff8d15b2261d6a1d3 at 2022-09-29 21:17:

## sw/BUILD/O2-latest/log
100% tests passed, 0 tests failed out of 456
 95/103 Test #109: test_Framework_test_ExternalFairMQDeviceWorkflow ........***Timeout  30.08 sec
99% tests passed, 1 tests failed out of 102

Full log here.

alibuild avatar Sep 03 '22 23:09 alibuild

@ktf : getting back to this after my vacation. I don't really see how I can implement the check you suggested:

Indeed we should check that distance(begin, newEnd) + fake channels != distance(begin, pollOrder.end()), no?

Perhaps I'd just close the PR and we leave it as it is, hoping it won't happen again. It is anyway only error handling for a case that should never appear.

davidrohr avatar Sep 27 '22 09:09 davidrohr