[UBSAN] Array index out of bounds in RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc
Compilation of RecoPixelVertexing/PixelTriplets failed in UBSAN:
>> Building edm plugin tmp/el8_amd64_gcc11/src/RecoPixelVertexing/PixelTriplets/plugins/RecoPixelVertexingPixelTripletsPlugins/libRecoPixelVertexingPixelTripletsPlugins.so
(...)
.../src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc: In member function 'launchKernels':
.../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:14:50: error: array subscript 262143 is above array bounds of 'unsigned int[32769]' [-Werror=array-bounds]
14 | constexpr I& operator[](int i) { return m_v[i]; }
| ^
.../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:21:9: note: while referencing 'm_v'
21 | I m_v[S];
| ^
.../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:14:50: error: array subscript 262144 is above array bounds of 'unsigned int[32769]' [-Werror=array-bounds]
14 | constexpr I& operator[](int i) { return m_v[i]; }
| ^
.../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:21:9: note: while referencing 'm_v'
21 | I m_v[S];
| ^
lto1: some warnings being treated as errors
Full log: link
A new Issue was created by @iarspider.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign reconstruction, heterogeneous
New categories assigned: heterogeneous,reconstruction
@mandrenguyen,@fwyzard,@clacaputo,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
The function in question is this one: https://github.com/cms-sw/cmssw/blob/3d761d84ee43f5ab61cf104d5081e09b074159b1/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc#L79-L141 The compiler error message does not help narrow the problem down further, partly because of the heavy inlining.
@VinInn Do the numbers 32769 and 26214[34] ring any bell that could help narrow down the code causing the warning?
Tagging also @AdrianoDee because he touched some of the involved data structures recently
32768 is the max number of tuples for Phase1 (here).
And 262144 for Phase2 (here).
Thanks @AdrianoDee. So if we believe the compiler's error message, it looks like a Phase1 data structure is accessed somewhere in a loop over pixelTopology::Phase2::maxNumberOfTuples elements. Any ideas where such a pattern might occur?
Note that the file compiles fine by itself; the error comes from LTO in the link phase. That suggests that, if real, the error somehow involves crossing file boundaries in LTO.
I tried stubbing out the routines called by launchKernels(), and found that there are compilation errors in four routines in RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernelsImpl.h, specifically kernel_connect(), kernel_countMultiplicity(), kernel_fillMultiplicity(), and kernel_fillHitDetIndices(). For (at least) three of these, the compilation error is associated with calls into cms::cuda::OneToManyAssoc like tracks_view.hitIndices().nOnes().
type tracking
If I comment out either of these two lines: https://github.com/cms-sw/cmssw/blob/df27e39abd260bb8bebfc1cf5fb66be5a88fbb0e/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc#L228-L229 then compilation succeeds through the LTO intermediate steps. The link ultimately fails due to missing symbols, but it gets past the LTO re-compiles. I think that makes it very likely a compiler bug.
We moved to non-LTO ASAN and UBSAN IBs, so this build no longer exists.
Closing this issue, as we no longer get this build error for UBSAN. Please reopen if needed.