Expensive allocations of a `std::vector<unsigned int>` in `CellularAutomaton::createAndConnectCells`
This report seems to hint that CellularAutomaton::createAndConnectCells is responsible for heavy allocations triggered by the resize of a std::vector<unsigned int>. I cannot really pinpoint where this happens, but it would be nice to get rid of this 0.2%.
Regarding the same method, perhaps using a std::bitset might be preferrable to the std::vector<bool>.
A new Issue was created by @dpiparo Danilo Piparo.
@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign reconstruction
FYI @cms-sw/tracking-pog-l2
New categories assigned: reconstruction
@jpata,@slava77,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks
type performance-improvements
type tracking
(I started looking into this - just to prevent potential overlap)
This is old CPU code. It should be replaced by the new GPU implementation
@AdrianoDee maybe do you know how to tackle this?
Hi @dpiparo, @jpata, indeed I may have some insight but I'm not sure how much this could help. I think the place where these allocations happens is
https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/RecoPixelVertexing/PixelTriplets/src/CellularAutomaton.cc#L59-L61
for the CAHitQuadrupletGenerator
and here
https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/RecoPixelVertexing/PixelTriplets/src/CellularAutomaton.cc#L195-L197
for the CAHitTripletGenerator, where the vector keeping the index of the cells an hit is outer cell of gets expanded. For Phase II these sizes are not negligible (while for Run3 they are). See e.g.

So here https://github.com/cms-sw/cmssw/compare/CMSSW_12_4_X...AdrianoDee:cmssw:fix_to_ca_124 I had added a parameter (CAcellsPerOuterHit ) to reserve the maximum size of the vector when the CALayer is created (changing it for PhaseII/Run3 with the era modifiers). My preliminary results (run on a local machine) seems to show an improvement in the allocations but do not bring an improvement in general for createAndConnectCells. See:
Next I need to do three things: u
- se a proper machine for timing (e.g.
vocms011) - moving the vector to a fixed size array
- or checking the sizes for the different steps and change the parameter accordingly (for some of them, note the two peak, the reserve would most probably be oversizes)
Even though on the medium/long term I tend to agree with Vincenzo about having it replaced with the GPU version.
This issue came up in https://github.com/cms-sw/cmssw/issues/46040#issuecomment-2420404065 . These allocations are about 2.6 % of all allocations within the event processing in a Run 3 prompt reco job.
(@cms-sw/reconstruction-l2 @cms-sw/tracking-pog-l2 to refresh recipients)
cms-bot internal usage