Improve memory allocation for SWAP* insertion ranges
In the SWAP* operator, we have to check validity for the addition of a range that consists of a portion of a route plus a single job that is appended either at the front or at the back of that range.
When sketching the implementation in #580, I couldn't come up with anything more efficient than just copying the initial range to account for both situations with the added job before and after:
https://github.com/VROOM-Project/vroom/blob/ae413af69663c3ce1c67b134f48b647920f97837/src/algorithms/local_search/swap_star_utils.h#L101-L115
This uses std::copy for the initial range and a push_back either before or after copying, which is of course very expensive, especially since we have quite a lot of swap options to go through. I suspect this is responsible for most of the computing time devoted to this local search operator.
Now the initial need is "simply" to pass iterators for the candidate range down the line to the various is_valid_for_* validity check functions, so it feels wrong to have to copy everything for that purpose. On the other hand, I have not found a more efficient way to describe that range and avoid the copies.
Happy to receive any idea or suggestion on that one, maybe the new range library can help here?
The same kind of same-range-except-begin-or-end situation happens in IntraRelocate:
https://github.com/VROOM-Project/vroom/blob/e3938092e60e1b2d2c5da58feef0b5fe819cd5f4/src/problems/cvrp/operators/intra_relocate.cpp#L39-L49
and in various flavors across the Intra* operators. Based on some naïve tests, this accounts for most of the computing time related to those operators...
The more I think about this, the more I'm convinced this would be a big step forward to easily reduce computing times further. Taking a look at how ranges work, it seems we could achieve the desired effect using views::concat from the range-v3 implementation. This library prefigured the standard ranges in C++20, but for some reason it looks like concat didn't make it into the standard.
On the other hand, we don't really need to concat any arbitrary views, simply being able to iterate over an existing vector + a single value before or after. Probably we can come up with some custom structure owning a reference to the vector and the value and offering the required iteration interface.
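A minimal sketch of what such a custom structure could look like (the `ExtendedRange` name and interface are hypothetical, not from the vroom codebase): a non-owning view over an existing vector plus one extra value, placed either before or after the vector's elements, so iteration never requires a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a lightweight view over an existing std::vector plus
// one extra value, placed either before or after the vector's elements.
// Iterating the view touches the original vector in place, no copy involved.
template <typename T> class ExtendedRange {
  const std::vector<T>& _vec;
  T _extra;
  bool _extra_first;

public:
  ExtendedRange(const std::vector<T>& vec, T extra, bool extra_first)
      : _vec(vec), _extra(extra), _extra_first(extra_first) {}

  class Iterator {
    const ExtendedRange* _range;
    std::size_t _pos; // logical position in the view: 0 .. vec.size()

  public:
    Iterator(const ExtendedRange* range, std::size_t pos)
        : _range(range), _pos(pos) {}

    const T& operator*() const {
      if (_range->_extra_first) {
        return (_pos == 0) ? _range->_extra : _range->_vec[_pos - 1];
      }
      return (_pos == _range->_vec.size()) ? _range->_extra
                                           : _range->_vec[_pos];
    }
    Iterator& operator++() {
      ++_pos;
      return *this;
    }
    bool operator!=(const Iterator& other) const { return _pos != other._pos; }
  };

  Iterator begin() const { return Iterator(this, 0); }
  Iterator end() const { return Iterator(this, _vec.size() + 1); }
};
```

For the validity checks, the functions down the line would then take a pair of `ExtendedRange::Iterator` (or be templated on the iterator type) instead of `std::vector` iterators.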
@jcoupey do you think this is still relevant and worth pursuing?
Copying some values into a std::vector should be very fast, especially for primitive types where it can boil down to a memcpy operation. It might be easier for the compiler to optimise if we used std::vector::insert instead of std::copy. The only somewhat expensive operation is memory allocation. This could be mitigated in several ways. We could reuse the vector, so once some memory is allocated for one copy, it will be reused for subsequent ones (unless more memory is needed). The vector might need to be a static or thread-local variable. We could also preallocate some memory to minimise the need for subsequent reallocations.
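The reuse idea above could be sketched roughly like this (the `range_with_extra` helper is purely illustrative, not an actual vroom function): a thread_local scratch vector keeps its capacity across calls, so after the first few evaluations no further allocation is needed.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch of the vector-reuse idea: a thread_local scratch
// vector retains its allocated capacity across calls, so only the very
// first calls pay for allocation. The returned reference stays valid
// until the next call on the same thread.
template <typename Iter, typename T>
const std::vector<T>& range_with_extra(Iter first, Iter last, const T& extra,
                                       bool extra_first) {
  thread_local std::vector<T> scratch;
  scratch.clear(); // keeps capacity, so reuse avoids reallocation

  if (extra_first) {
    scratch.push_back(extra);
  }
  scratch.insert(scratch.end(), first, last); // bulk copy, memcpy-friendly
  if (!extra_first) {
    scratch.push_back(extra);
  }
  return scratch;
}
```

The usual caveat applies: callers must consume (or copy) the result before the next call, since every call reuses the same underlying storage.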
On the other hand, I did prepare a small class that behaves like a view allowing iteration over a vector with an additional element in front or at the back, but I have yet to integrate it with the rest of the code.
> do you think this is still relevant and worth pursuing?
@kkarbowiak I don't have a definitive answer here. Probably the impact is lower than when I filed this ticket due to various changes (e.g. local search operator pruning). Yet this still feels like it could clearly be improved. Even a small efficiency gain here could make a significant difference overall, as those situations are in very hot parts of the codebase.
Thanks for the ideas, if you have anything that can be tested I'd be happy to do some benchmarking/profiling.
Understood. Please have a look at #1106. This is a draft of a solution that should work. If it looks promising I will turn it into a proper PR.