Improve performance by pre-allocating vectors
In some cases we create vectors for the response, e.g. `OpResult<StringVec> dfly::OpInter(const dfly::Transaction* t, dfly::EngineShard* es, bool remove_first)`, and push elements into them while iterating over a set.
These vectors can end up being multiple hundreds of MiB in size (250 MiB vectors have been observed).
This results in many reallocations over time as the vector grows, potentially moving it to new memory pages each time.
It can be more efficient (to be tested whether it actually is) to first determine the result size, allocate a vector with that capacity, and then add the results to it. Alternatively, we could use a chunked vector, similar to a deque but with a configurable chunk size, which chains together contiguous blocks of memory and allocates a new block when the current one is full.
We can use a vector pool -- we do something similar at the connection level, specifically for pipeline messages. The more the pool is used, the more we allow it to grow (and similarly, we shrink it if it remains unused).
@kostasrim the connection vectors are KBs in size; here @abhijat tells us about 250 MB, so I'd prefer to see a deque
- It's a minor issue.
"It can be more efficient (to be tested if it actually is)" is the key question. A reproducible example with proof that preallocating vectors indeed moves the needle is required before we talk about the implementation.