
[UBSAN] Array index out of bounds in RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc

Open iarspider opened this issue 2 years ago • 13 comments

Compilation of RecoPixelVertexing/PixelTriplets failed in UBSAN:

>> Building edm plugin tmp/el8_amd64_gcc11/src/RecoPixelVertexing/PixelTriplets/plugins/RecoPixelVertexingPixelTripletsPlugins/libRecoPixelVertexingPixelTripletsPlugins.so
(...)
.../src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc: In member function 'launchKernels':
 .../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:14:50: error: array subscript 262143 is above array bounds of 'unsigned int[32769]' [-Werror=array-bounds]
    14 |       constexpr I& operator[](int i) { return m_v[i]; }
      |                                                  ^
.../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:21:9: note: while referencing 'm_v'
   21 |       I m_v[S];
      |         ^
  .../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:14:50: error: array subscript 262144 is above array bounds of 'unsigned int[32769]' [-Werror=array-bounds]
    14 |       constexpr I& operator[](int i) { return m_v[i]; }
      |                                                  ^
.../src/HeterogeneousCore/CUDAUtilities/interface/FlexiStorage.h:21:9: note: while referencing 'm_v'
   21 |       I m_v[S];
      |         ^
lto1: some warnings being treated as errors

Full log: link

iarspider avatar Mar 02 '23 08:03 iarspider

A new Issue was created by @iarspider .

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Mar 02 '23 08:03 cmsbuild

assign reconstruction, heterogeneous

iarspider avatar Mar 02 '23 08:03 iarspider

New categories assigned: heterogeneous,reconstruction

@mandrenguyen,@fwyzard,@clacaputo,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Mar 02 '23 08:03 cmsbuild

The function in question is this one: https://github.com/cms-sw/cmssw/blob/3d761d84ee43f5ab61cf104d5081e09b074159b1/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc#L79-L141 The compiler error message does not help narrow it down further, partly because of the heavy inlining.

@VinInn Would the numbers 32769 and 26214[34] ring any bell towards narrowing down the code causing the warning?

Tagging also @AdrianoDee because he touched some of the involved data structures recently

makortel avatar Mar 02 '23 14:03 makortel

32768 is the max number of tuples for Phase1 (here).

AdrianoDee avatar Mar 02 '23 14:03 AdrianoDee

And 262144 for Phase2 (here).

AdrianoDee avatar Mar 02 '23 14:03 AdrianoDee

Thanks @AdrianoDee. So if we believe the compiler's error message, it looks like a Phase1 data structure is being accessed somewhere in a loop over pixelTopology::Phase2::maxNumberOfTuples elements. Any ideas where such a pattern might occur?
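For illustration only, here is a minimal sketch of the mismatch being hypothesized. It is not taken from CMSSW: `FixedStorage`, `kPhase1MaxTuples`, `kPhase2MaxTuples` and `Phase1Offsets` are hypothetical stand-ins, and sizing the array as "Phase1 maximum plus one" is an assumption made because 32768 + 1 matches the `unsigned int[32769]` in the log.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
  // Values quoted in the thread; the names are hypothetical stand-ins.
  constexpr int kPhase1MaxTuples = 32 * 1024;   // 32768
  constexpr int kPhase2MaxTuples = 256 * 1024;  // 262144

  // Fixed-capacity storage in the spirit of the m_v[S] array from the log.
  template <typename I, int S>
  class FixedStorage {
  public:
    static constexpr int capacity() { return S; }
    static constexpr bool inBounds(int i) { return i >= 0 && i < S; }

    // Unlike a bare m_v[i], this accessor refuses out-of-range indices.
    I& at(int i) {
      assert(inBounds(i));
      return m_v[i];
    }

  private:
    I m_v[S] = {};
  };

  // Assumption: the storage holds one more entry than the Phase1 maximum.
  using Phase1Offsets = FixedStorage<unsigned int, kPhase1MaxTuples + 1>;

  static_assert(Phase1Offsets::capacity() == 32769, "matches the array size in the log");
  static_assert(Phase1Offsets::inBounds(kPhase1MaxTuples), "last Phase1 slot is valid");
  static_assert(!Phase1Offsets::inBounds(kPhase2MaxTuples - 1), "262143 is out of range");
  static_assert(!Phase1Offsets::inBounds(kPhase2MaxTuples), "262144 is out of range");
}  // namespace sketch
```

Under these assumptions, the two subscripts in the error (262143 and 262144) are exactly the Phase2 maximum minus one and the Phase2 maximum, which is what one would expect if a Phase2-sized index reached Phase1-sized storage.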

makortel avatar Mar 02 '23 22:03 makortel

Note that the file compiles fine by itself, the error is from LTO in the link phase. That suggests that, if real, the error somehow involves crossing file boundaries in LTO.

dan131riley avatar Mar 03 '23 19:03 dan131riley

I tried stubbing out the routines called by launchKernels(), and found that there are compilation errors in four routines in RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernelsImpl.h, specifically kernel_connect(), kernel_countMultiplicity(), kernel_fillMultiplicity(), and kernel_fillHitDetIndices(). For (at least) three of these, the compilation error is associated with calls into cms::cuda::OneToManyAssoc like tracks_view.hitIndices().nOnes().
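As background for readers unfamiliar with this kind of container, a one-to-many association is commonly stored as an offsets array plus a flat content array. The sketch below is a simplified, hypothetical version (`OneToMany`, `setCounts`, `size` and `totalSize` are illustrative names, not the cms::cuda::OneToManyAssoc API); it only shows why such a structure typically needs one more offset than it has keys.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace assoc_sketch {
  // Hypothetical, simplified association: key i owns the contiguous
  // slice content[off[i] .. off[i + 1]).
  template <int ONES, int SIZE>
  struct OneToMany {
    std::array<uint32_t, ONES + 1> off{};  // extra entry closes the last slice
    std::array<uint32_t, SIZE> content{};

    static constexpr int nOnes() { return ONES; }
    uint32_t size(int i) const { return off[i + 1] - off[i]; }
    uint32_t totalSize() const { return off[ONES]; }

    // Build the offsets from per-key counts with a prefix sum.
    void setCounts(const std::array<uint32_t, ONES>& counts) {
      off[0] = 0;
      for (int i = 0; i < ONES; ++i) {
        off[i + 1] = off[i] + counts[i];
      }
      assert(off[ONES] <= static_cast<uint32_t>(SIZE));
    }
  };
}  // namespace assoc_sketch
```

With counts of 2, 0 and 3 for three keys, the offsets become 0, 2, 2, 5. A loop bounded by a key count from a different instantiation would index past `off`, which is the kind of access the array-bounds diagnostic describes.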

dan131riley avatar Mar 13 '23 15:03 dan131riley

type tracking

slava77 avatar Mar 13 '23 16:03 slava77

If I comment out either of these two lines: https://github.com/cms-sw/cmssw/blob/df27e39abd260bb8bebfc1cf5fb66be5a88fbb0e/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc#L228-L229 then compilation succeeds through the LTO intermediate steps. The link ultimately fails due to missing symbols, but it gets past the LTO re-compiles. I think that makes it very likely a compiler bug.

dan131riley avatar Mar 13 '23 20:03 dan131riley

We moved to non-LTO ASAN and UBSAN IBs, so this build is not there any more.

smuzaffar avatar Oct 16 '24 10:10 smuzaffar

cms-bot internal usage

cmsbuild avatar Oct 16 '24 10:10 cmsbuild

Closing this issue, as we no longer get this build error for UBSAN. Please reopen if needed.

smuzaffar avatar Nov 01 '24 17:11 smuzaffar