Instances for testing
Issue #105 has demonstrated that we need better testing of the algorithms.
One approach is to hand-craft tests. But that's pretty slow and limiting.
It'd be better if we could have a set of instances in a file format with a corresponding set of efficient solutions that we could check against.
Here's a library of instances: https://github.com/vOptSolver/vOptLib
It doesn't use a standardized file format, and it doesn't have solutions. But we could translate everything into JuMP and then into .mof.json (https://jump.dev/MathOptFormat), and then compare the various algorithms against each other.
The output would be: a standardized set of instances and a set of solutions :smile: which seems like a useful thing for the community.
I'm very open to discussions for ideas on this...
We have many 2-objective and several 3-objective instances on our archives. All of them have been used for numerical experiments in our papers and projects since the last 30 years.
Several instances are indeed already online on https://github.com/vOptSolver/vOptLib (or have been on MCDMlib, MOCOLib and GuepardLib, all maintained also by us in the past).
And for some instances,
- $Y_N$ , the set of non-dominated points
- $X_{E}$, a corresponding set of efficient solutions (or $X_{Em}$, a set of minimal efficient solutions)
are available, e.g.:
- knapsack problems: https://github.com/vOptSolver/vOptLib/blob/master/UKP/uncorrelated.md
- partitioning problems: https://github.com/vOptSolver/vOptLib/tree/master/SPA
- packing problems: https://github.com/vOptSolver/vOptLib/blob/master/SPP/readme.md
- Truck-to-dock Door Assignment Problem: https://github.com/xgandibleux/TDAP/tree/main/res
For
-
2-objective problems: codes for computing $Y_N$ and $X_{E}$ / $X_{Em}$ are available into
vOptSpecific.jlfor knapsack (jl & C), assignment(jl & C), and facility location problems (jl & C++). ps: this package is not evolving since a long time and it was developed when we were "beginners" in the interface jl & C/C++ (techniques we used for interfacing languages and embed C/C++ binary into julia are probably outdated now); we have in mind the question of moving one day these specific algorithms to MOA as a kind of "global constraint" (one other topic to discuss with interested persons). -
3-objective problems: we have in our archives all the original codes for computing $Y_N$ and $X_{Em}$ for instances of 3-objective linear assignment problems.
Almost all instances are extensions of single objective instances available on the OR-Library. It is the reason why the format is not standardized and mainly follow the original description provided in the OR-Library. Also the MCDM community who is the main user of these instances was not in favor of using others format/descriptors(as json). Reasons were (1) having files readable without any specific software and (2) having files size smaller as possible. I was a long time from now (I initated it in 1998...), adopting "modern" formats would be more acceptable by users in 2025 (I have a talk in next October on MOP software at the RAMOO meeting, I will probe the audience).
Today, when I come back on these instances, I provide online the parser in julia e.g. for 2-partitioning instances.
I am very interested to follow this discussion; and if they can help the MO-JuMP community, I am ready to dedicate time and to unarchive our ton of files and codes.
I started looking into this in #115
One thing I came across is that there might be multiple points in X space for the same Y:
https://github.com/vOptSolver/vOptLib/blob/16ecd822af03c3590db44026a4e4160c31d2a996/UKP/X/2KP100-50.max#L24-L25
Great!
Indeed, for each non-dominated point $y' \in Y_N$ we are talking about the set of efficient solutions in term of
- the maximum complete set $X_{E_M}$: contains all solutions $x \in X$ corresponding to $y'$ (all equivalent solutions)
- a complete set $X_E$: contains at least one solution $x \in X$ corresponding to $y'$ (some equivalent solutions)
- a minimum complete set $X_{E_m}$: contains only one solution $x \in X$ corresponding to $y'$ (without equivalent solution)
ps: This distinction is important for characterising properly a given algorithm (maintaining solutions is costly for several reasons). Unfortunately papers are not rigourous on this, making a comparison of algorithms not straightforward.
So far there are only objective space algorithms and I do not know how likely it is to add multi-objective branch and bound etc.
The algorithms update the search whenever a new nondominated point is found and I believe in all implementations we have something like y = compute_point(f, x) so I think for every nondominated point there will be an efficient solution. It should suffice to check for the equivelance of Y_N in my opinion.
Personally, I care only about the minimum complete set. We don't benchmark LP solvers against their ability to find multiple solutions.
I've added the UKP to: https://github.com/jump-dev/MultiObjectiveAlgorithms.jl/tree/master/instances
Model
The input is a MathOptFormat model.
Here's a 2-variable knapsack example:
{
"name": "MathOptFormat Model",
"version": {"major": 1, "minor": 7},
"variables": [{"name": "x[1]"}, {"name": "x[2]"}],
"objective": {
"sense": "max",
"function": {
"type": "VectorAffineFunction",
"terms": [{
"output_index": 1,
"scalar_term": {"coefficient": 80.0, "variable": "x[1]"}
}, {
"output_index": 1,
"scalar_term": {"coefficient": 59.0, "variable": "x[2]"}
}, {
"output_index": 2,
"scalar_term": {"coefficient": 62.0, "variable": "x[1]"}
}, {
"output_index": 2,
"scalar_term": {"coefficient": 80.0, "variable": "x[2]"}
}],
"constants":[0.0,0.0]
}
},
"constraints":[{
"function": {
"type": "ScalarAffineFunction",
"terms": [
{"coefficient": 25.0, "variable": "x[1]"},
{"coefficient": 26.0, "variable": "x[2]"}
],
"constant":0.0
},
"set": {"type": "LessThan", "upper": 187.0}
}, {
"function": {"type": "Variable", "name": "x[1]"},
"set": {"type": "ZeroOne"}
}, {
"function": {"type": "Variable", "name": "x[2]"},
"set": {"type": "ZeroOne"}
}]
}
Solution
The output is a finite list of (X, Y) solutions: https://github.com/jump-dev/MultiObjectiveAlgorithms.jl/tree/master/instances/solutions
The current solution schema is:
{
"$schema": "https://json-schema.org/schema#",
"$id": "https://jump.dev/MultiObjectiveAlgorithms/schemas/solution.schema.json",
"title": "The solution schema for MultiObjectiveAlgorithms",
"type": "array",
"items": {
"type": "object",
"required": ["X", "Y"],
"properties": {
"X": {
"type": "array",
"items": {"type": "number"}
},
"Y": {
"type": "array",
"items": {"type": "number"}
}
}
}
}
where X is a vector in the same order as variables in the .mof.json:
[
{"X": [1, 0], "Y": [1.0, 2.0]},
{"X": [0, 1], "Y": [1.0, 2.0]}
]
Another option would be to use explicit names:
[
{"X": {"x[1]": 1, "x[2]": 0}, "Y": [1.0, 2.0]},
{"X": {"x[1]": 0, "x[2]": 1}, "Y": [1.0, 2.0]}
]
{
"$schema": "https://json-schema.org/schema#",
"$id": "https://jump.dev/MultiObjectiveAlgorithms/schemas/solution.schema.json",
"title": "The solution schema for MultiObjectiveAlgorithms",
"type": "array",
"items": {
"type": "object",
"required": ["X", "Y"],
"properties": {
"X": {
"type": "object",
},
"Y": {
"type": "array",
"items": {"type": "number"}
}
}
}
}
or, if we think solutions are often very sparse. To assume a default 0.0:
[
{"X": {"x[1]": 1}, "Y": [1.0, 2.0]},
{"X": {"x[2]": 1}, "Y": [1.0, 2.0]}
]
Another option would be to use explicit names:
I think this is more appropriate. We might have different variables with different names and they can be mixed up when you put them all together in a matrix like format.
if we think solutions are often very sparse. To assume a default 0.0:
I think this will be true for most problems so I agree.
Some remarks:
1)
To base tests to the minimum set of non-dominated points is a reasonable choice for algorithms aiming to compute $Y_N$.
For the Dichotomic algorithm, a similar question arises (to consider only extreme non-dominated points knowing that non extreme non-dominated points may exist and may be obtained by the algorithm). To base tests on the minimum set of extreme non-dominated points $Y_{SN_{extreme}}$ seems reasonable too.
2)
Files describing a model and a result are on a single line (carriage returns are missing).
3)
This MathOptFormat is great for describing an instance corresponding to a generic problem (a model without a structure a priori). It reports everything in detail. For specific problems (a model with a structure, and simply defined by the name of the problem), this format comes with a lot of extra information not necessary in fact.
Furthermore, instances e.g. packing/covering/partitioning are often large (some are huge on the OR-library, see rail4284 a single objective instance with 4284 rows and 1,092,610 columns => 5.5M) even with a sparse description as it is on the OR-Library (and adopted in vOptLib). I wonder what will be the size of such an instance with all additional characters accompanying the json description.
Same though for the result file. The number of non-dominated points may be very large, and for problems with a significant number of variables (500...10000), the description of the efficient solution set looks like a huge 01 matrix. Here again the size of the corresponding json file will be impressive with all the added characters.
4)
Having a sort of traceability of an instance (as Beasley did for the OR-library, and I tried to follow this) is something important from my point of view, especially for papers who discus about numerical performances of algorithms using instances found online. Some instances that we provided online with a given name, have been used on others places (example of tri-objective linear assignment instances of G. Kirkik used and refered by N. Boland but these instances come from the PhD thesis of Przybylski; same for bi-objective set covering instances of T. Lust in 2017 are instances provided online by us in 1998 and avalible in vOptLib). In the best case, the origin is mentioned, until the worst case where instances are renamed making hard any tracing to the origin.
It will be valuable for a new contributor to the field to find such a trace to the origin with instances. Two informations seems relevant for this: (1) the url of the original repository and/or (2) the reference to the talk/paper where these instances have been considered for the first time. Having such trace, it will be helpful for following progress on considered algorithms and solutions. I was wondering if the field 'author" aims to carry such info (the source i.e. (1) a reference and (2) a URL) in the MathOptFormat?
For the Dichotomic algorithm, a similar question arises (to consider only extreme non-dominated points knowing that non extreme non-dominated points may exist and may be obtained by the algorithm). To base tests on the minimum set of extreme non-dominated points seems reasonable too.
Agree that this is a very open question with no clear answer.
are on a single line (carriage returns are missing).
They can be read with JSON.parsefile(filename). New lines are optional, although I agree the current reduces human readability.
For specific problems (a model with a structure, and simply defined by the name of the problem), this format comes with a lot of extra information not necessary in fact.
I see this as a positive. I'd like people to come up with generic solution algorithms instead of problem-specific ones, or for us to detect problem-specific structure in presolve like MIP solvers.
I wonder what will be the size of such an instance with all additional characters accompanying the json description
Files can be trivially compressed as .mof.json.gz files (write_to_file(model, "filename.mof.json.gz")). We updated all the MIPLIB models to it without problem. That file might be 55 mb instead of 5.5 mb, but I don't see that as an issue. Storage is cheap.
Here again the size of the corresponding json file will be impressive with all the added characters.
This is somewhat an argument instead of the current vector solution instead of object name: value. But see also the compression.
Having such trace, it will be helpful for following progress on considered algorithms and solutions. I was wondering if the field 'author" aims to carry such info (the source i.e. (1) a reference and (2) a URL) in the MathOptFormat?
Yes, we could easily add this, and it's what we added those fields to MathOptFormat for. Once fixed, I see this as another positive, since every model would carry around citation information, so we don't rely on a README or similar in a nearby directory.