PyVRP
PyVRP copied to clipboard
BrokenPairsDistance for different problem variants.
For VRPTW, routes are directed so the current brokenPairsDistance implementation compares directed routes (through directed neighbours).
- For CVRP, we should probably implement a a version that considers undirected routes (undirected neighbours) as in HGS-CVRP.
- For heterogeneous vehicle capacities (#203), we should probably take into account vehicle assignments in the distance computation, but then the name
brokenPairsDistanceis no longer accurate.
We could implement different versions (directed, undirected, heterogeneous) but that would put a responsibility for the user to pick the right version for the right problem/variant. I suggest we implement something to make picking the right option the default.
Note that always considering vehicle assignments (in something named more generally like solution_distance) will not change behavior for the homogeneous case (may just be a bit slower), so we only need to distinguish directed vs undirected which could be done based on a property of ProblemData (like ProblemData.is_undirected which holds true for CVRP/TSP with symmetric distances), then using that to select the right distance function when initializing Population.
Out of the box: do we need this function to be on the c++ side? If we bind the Individual::neighbours (/assignments) as numpy arrays we can compute the distance efficiently through numpy (but we should avoid repeated copying of the neighbours, so maybe wrap the individual object in Python). Not sure though.
@N-Wouda, do you have any specific thoughts on how to best handle this, e.g. use the appropriate distance function for the right problem variants?
Our current BPD doesn't take all aspects of an instance/solution into account, but it seems to work fairly well for the variants we support so far and it's really easy conceptually. Do you know if there is actually an issue here that we must address?
I had some discussions with a student who investigated the parent selection for (undirected) CVRP and found a lot of duplicate solutions are in the population which aren't recognized and removed as duplicates due to the direction of the routes... he suggested this was a problem and wanted to implement undirected BPD for this... so I wanted to hear your thoughts.
Is that student currently working on undirected BPD? I think Thibaut's version can be adapted pretty easily to fit our function signature, so it need not be too hard to do. We could implement an undirected variant of BPD if the performance difference between the two is really significant on their benchmark, alongside the current directed implementation.
If the difference matters we should probably include some of their benchmark instances into our benchmark set going forward, because they then have properties our current instances do not yet test for.
I think we first have to figure out if having symmetric BPD matters or not. I suggest implementing the symmetric BPD, which should not be too hard, and then we can run a CVRP benchmark. If this matters, then I'd say that we keep the symmetric BPD as an alternative diversity measure. This already partially solves the problem.
Then we also have #311. I think that with the symmetric BPD, we modify SubPopulation.add to only add non-duplicate solutions, where two solutions A and B are considered to be duplicate when BPD(A, B) = 0.
I think we first have to figure out if having symmetric BPD matters or not. I suggest implementing the symmetric BPD, which should not be too hard, and then we can run a CVRP benchmark. If this matters, then I'd say that we keep the symmetric BPD as an alternative diversity measure. This already partially solves the problem.
Sounds good :)
Then we also have #311. I think that with the symmetric BPD, we modify
SubPopulation.addto only add non-duplicate solutions, where two solutions A and B are considered to be duplicate when BPD(A, B) = 0.
Ok, sounds good too, but here are 2 things: 1) using BPD instead of == in SubPopulation to determine duplicates (currently in purge) and 2) checking duplicate solutions upon insertion which I think is a good idea but is independent from this, maybe consider it in #283?
Ok, sounds good too, but here are 2 things: 1) using BPD instead of == in SubPopulation to determine duplicates (currently in purge) and 2) checking duplicate solutions upon insertion which I think is a good idea but is independent from this, maybe consider it in https://github.com/PyVRP/PyVRP/issues/283?
You're right. I was thinking of a situation where we don't add duplicate solutions. For the current setup (if I'm not mistaken) we can change equality checking in purge to checking if the proximity/diversity value is zero.
For the current setup (if I'm not mistaken) we can change equality checking in purge to checking if the proximity/diversity value is zero.
BPD does not guarantee that BPD(X, Y) == 0 means X and Y are duplicates. They could have different vehicle assignments, for example. If we want "diversity 0 means solutions are duplicates" we would need to work on supporting that somehow.
Something like set semantics for Population seems easier to me (see also #87 for that). We can reject inserting if either duplicates exist (using __eq__), or if the solution has diversity 0 to some solution already in the population. That we could explain fairly well because checking some diversity things when adding new solutions is completely reasonable.
I ran a benchmark last night for CVRP with symmetric BPD (see https://github.com/PyVRP/PyVRP/compare/main...bpd-symmetric). Not sure if it's 100% correct but it captures the same idea.
The average gap was 0.17%, slightly lower than the current main (0.22%). Not a huge difference, but it's nice to know it's better.
I ran a benchmark last night for CVRP with symmetric BPD. The average gap was 0.17%, slightly lower than the current main (0.22%). Not a huge difference, but it's nice to know it's better.
This might be worth repeating for 0.7.0. If the improvement is large enough (say, >0.05%?) we could consider implementing undirected BPD. It's easy enough to do so, and could be worthwhile for users solving problem variants where this matters.
@leonlan can I assign you to this issue?
Yes I assigned myself to this!
Generalizing BPD
A more generalized diversity measure computes diversity as (# broken pairs + # broken assignments) / (# pairs + # assignments). A "broken assignment" is whenever client 1 has vehicle type $i$ in solution 1 and vehicle type $j$ in solution 2 with $i \neq j$. This results in zero diversity when two solutions are identical to each other. It also behaves like BrokenPairsDistance whenever the fleet is homogeneous (as Wouter stated in the initial message).
So we can add broken assignment counts to BrokenPairsDistance, but we probably might have to rename it.
Undirected BPD
The changes to make BPD undirected are very small:
# in broken_pairs_distance.cpp
- numBrokenPairs += fSucc != sSucc;
- numBrokenPairs += fPred != sPred;
+ numBrokenPairs += fSucc != sSucc && fSucc != sPred;
+ numBrokenPairs += fPred != sPred && fPred != sSucc;
I went digging in my "experiment" (https://github.com/PyVRP/PyVRP/issues/101#issuecomment-1662515714) and also changed purge to delete solutions with zero diversity.
# in SubPopulation::purge
- return !iterator.proximity.empty()
- && *iterator.proximity[0].second == *iterator.solution;
+ return !iterator.proximity.empty() // duplicate when
+ && iterator.proximity[0].first == 0; // zero diversity
I don't know if only adding BPD undirected will result in any performance improvements, since "duplicates" are still not removed in that case.
If better performance is our objective, this needs to become part of a larger change, including #87 and #311.
I'll proceed as follows:
- I'll implement generalized BPD first with broken assignments. This can be a stand-alone PR and should not influence any performance. Open question: how to rename this?
- Then I'll implement undirected BPD. I'll check for performance improvements when using undirected BPD on CVRP.
- If there are improvements, this will be a stand-alone PR.
- If there are no improvements, then I'll implement set semantics for Population.
- I'll compare Population set semantics + general BPD vs. Population set semantics + undirected BPD. If there is performance difference, then there's reason to keep undirected BPD. If there is no difference, I propose to not keep undirected BPD.
Will start working on this tomorrow.
I hope to pick this up next week. I got busy with the backhaul benchmarking stuff last week, and this week I have other stuff to do.
@N-Wouda, Do we still care about the undirected BPD variant? I personally don't care about a 0.05% performance improvement by specializing on directed/undirected variants.
I think this'd be nice to have for undirected problems, but since we do not yet have an algorithmic way to detect that we can postpone such things for now. I want to work on automatically detecting problem attributes (and using that information to e.g. skip computations if possible) once we have streamlined the operators a bit so that we'd not have to implement this four or more times.
@leonlan is something like this also used in ILS? I'm not too familiar with the specifics.
Yea, there is a concept of "distances between solutions" in Maximo's AILS. It's the same as the broken pairs distance (not sure if directed or not). Regardless, I think it will be useful to add an is_directed flag to the current BPD that relies on #525.
Closing this because I don't think we need this.