Cbc icon indicating copy to clipboard operation
Cbc copied to clipboard

Crash depending on order of ROWS in MPS File

Open chriswasser opened this issue 2 years ago • 3 comments

Hi everyone,

While playing around with CBC through the interface of the PuLP package, I noticed some unexpected behaviour when CBC receives an MPS input file. Since PuLP does not enforce any specific ordering on the constraints in the ROWS section (or any other section for that matter) of the generated MPS file, different MPS files are produced for the same instance. In this issue, I have attached two instances of a MIP scheduling problem (*.txt files need to be renamed back to *.mps), which work as minimal reproducers for the observed issue:

They are both feasible instances (as confirmed by Gurobi and SCIP). The only difference between these files: Three lines in the ROWS section are swapped

$ diff --unified correct-result.mps incorrect-result.mps 
--- correct-result.mps  2022-09-28 16:20:00.479303154 +0200
+++ incorrect-result.mps        2022-09-28 12:31:07.693920473 +0200
@@ -1972,9 +1972,6 @@
  E  MustScheduleJobSomewhereConstraint_ueIOJ2gMjs
  E  MustScheduleJobSomewhereConstraint_hpGPSGljDO
  E  MustScheduleJobSomewhereConstraint_2WuMzKPxKJ
- L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket1_Node2CPU1Socket1
- L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket1_Node2CPU1Socket2
- L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket2_Node1CPU1Socket1
  L  DataTransferForDependencyConstraint_AP8tLyAnxt_oj8kVdk68o_Node1CPU1Socket1_Node1CPU1Socket2
  L  DataTransferForDependencyConstraint_AP8tLyAnxt_oj8kVdk68o_Node1CPU1Socket1_Node2CPU1Socket1
  L  DataTransferForDependencyConstraint_AP8tLyAnxt_oj8kVdk68o_Node1CPU1Socket1_Node2CPU1Socket2
@@ -2024,6 +2021,9 @@
  L  DataTransferForDependencyConstraint_oj8kVdk68o_XKN9FnoZHH_Node2CPU1Socket2_Node1CPU1Socket2
  L  DataTransferForDependencyConstraint_oj8kVdk68o_XKN9FnoZHH_Node2CPU1Socket2_Node2CPU1Socket1
  L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket1_Node1CPU1Socket2
+ L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket1_Node2CPU1Socket1
+ L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket1_Node2CPU1Socket2
+ L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket2_Node1CPU1Socket1
  L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket2_Node2CPU1Socket1
  L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node1CPU1Socket2_Node2CPU1Socket2
  L  DataTransferForDependencyConstraint_tMQYOOXFMk_Z7IsibK9ce_Node2CPU1Socket1_Node1CPU1Socket1

CBC is able to correctly solve the first instance correct-result.mps while it crashes with a segmentation fault for the second instance incorrect-result.mps. The full output for both executions with CBC version 2.10 is given below:

$ cbc correct-result.mps    
Welcome to the CBC MILP Solver 
Version: 2.10 
Build Date: Aug 27 2022 

command line - cbc correct-result.mps (default strategy 1)
At line 2 NAME          CompactModel
At line 3 ROWS
At line 2459 COLUMNS
At line 12617 RHS
At line 15072 BOUNDS
At line 15582 ENDATA
Problem CompactModel has 2454 rows, 541 columns and 9140 elements
Coin0008I CompactModel read with 0 errors
Continuous objective value is 0.029523 - 0.02 seconds
Cgl0003I 0 fixed, 0 tightened bounds, 336 strengthened rows, 128 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 336 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 331 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 335 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 174 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 168 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 168 strengthened rows, 0 substitutions
Cgl0004I processed model has 2306 rows, 477 columns (444 integer (444 of which binary)) and 10608 elements
Cbc0038I Initial state - 32 integers unsatisfied sum - 16
Cbc0038I Pass   1: suminf.   14.50000 (47) obj. 0.029523 iterations 88
Cbc0038I Pass   2: suminf.   14.14286 (61) obj. 0.0295235 iterations 30
Cbc0038I Pass   3: suminf.   13.83333 (58) obj. 0.0295243 iterations 18
Cbc0038I Pass   4: suminf.   13.50000 (57) obj. 0.0295277 iterations 38
Cbc0038I Pass   5: suminf.   13.23077 (61) obj. 0.0295262 iterations 21
Cbc0038I Pass   6: suminf.   13.23077 (61) obj. 0.0295262 iterations 42
Cbc0038I Pass   7: suminf.   13.23077 (61) obj. 0.0295262 iterations 31
Cbc0038I Pass   8: suminf.   12.88889 (57) obj. 0.029526 iterations 15
Cbc0038I Pass   9: suminf.   14.11111 (58) obj. 0.0295259 iterations 55
Cbc0038I Pass  10: suminf.   13.11111 (57) obj. 0.0295254 iterations 29
Cbc0038I Pass  11: suminf.   12.44444 (57) obj. 0.029526 iterations 15
Cbc0038I Pass  12: suminf.   14.71429 (56) obj. 0.0295269 iterations 95
Cbc0038I Pass  13: suminf.   13.43750 (60) obj. 0.0295264 iterations 60
Cbc0038I Pass  14: suminf.   13.18750 (62) obj. 0.0295264 iterations 21
Cbc0038I Pass  15: suminf.   13.83135 (30) obj. 0.0295294 iterations 154
Cbc0038I Pass  16: suminf.   13.68133 (63) obj. 0.0295293 iterations 131
Cbc0038I Pass  17: suminf.   12.74310 (58) obj. 0.0295291 iterations 45
Cbc0038I Pass  18: suminf.   13.83135 (30) obj. 0.0295294 iterations 102
Cbc0038I Pass  19: suminf.   13.83135 (30) obj. 0.0295294 iterations 68
Cbc0038I Pass  20: suminf.   13.48133 (61) obj. 0.0295308 iterations 132
Cbc0038I Pass  21: suminf.   13.08133 (62) obj. 0.0295308 iterations 31
Cbc0038I Pass  22: suminf.   13.01883 (56) obj. 0.0295299 iterations 39
Cbc0038I Pass  23: suminf.   13.83135 (30) obj. 0.0295294 iterations 114
Cbc0038I Pass  24: suminf.   13.68133 (60) obj. 0.0299313 iterations 98
Cbc0038I Pass  25: suminf.   12.51883 (58) obj. 0.0295264 iterations 37
Cbc0038I Pass  26: suminf.   12.33133 (54) obj. 0.029526 iterations 29
Cbc0038I Pass  27: suminf.   14.90000 (49) obj. 0.0295261 iterations 79
Cbc0038I Pass  28: suminf.   12.93750 (58) obj. 0.0295264 iterations 64
Cbc0038I Pass  29: suminf.   12.74310 (59) obj. 0.0295263 iterations 25
Cbc0038I Pass  30: suminf.   13.83135 (30) obj. 0.0295294 iterations 89
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 346 integers at bound fixed and 6 continuous
Cbc0038I Full problem 2306 rows 477 columns, reduced to 964 rows 121 columns
Cbc0038I Mini branch and bound did not improve solution (1.54 seconds)
Cbc0038I After 1.54 seconds - Feasibility pump exiting - took 0.75 seconds
Cbc0031I 23 added rows had average density of 50.26087
Cbc0013I At root node, 23 cuts changed objective from 0.02952299 to 0.02952299 in 100 passes
Cbc0014I Cut generator 0 (Probing) - 26 row cuts average 2.2 elements, 0 column cuts (0 active)  in 3.132 seconds - new frequency is -100
Cbc0014I Cut generator 1 (Gomory) - 1065 row cuts average 223.9 elements, 0 column cuts (0 active)  in 0.867 seconds - new frequency is 1
Cbc0014I Cut generator 2 (Knapsack) - 1 row cuts average 21.0 elements, 0 column cuts (0 active)  in 0.149 seconds - new frequency is -100
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.022 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.494 seconds - new frequency is -100
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.097 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 109 row cuts average 20.4 elements, 0 column cuts (0 active)  in 0.162 seconds - new frequency is -100
Cbc0010I After 0 nodes, 1 on tree, 1e+50 best solution, best possible 0.02952299 (10.78 seconds)
Cbc0010I After 100 nodes, 52 on tree, 1e+50 best solution, best possible 0.02952299 (22.08 seconds)
Cbc0016I Integer solution of 0.029523161 found by strong branching after 13100 iterations and 147 nodes (23.54 seconds)
Cbc0001I Search completed - best objective 0.02952316135112162, took 13100 iterations and 147 nodes (23.55 seconds)
Cbc0032I Strong branching done 2462 times (31309 iterations), fathomed 1 nodes and fixed 95 variables
Cbc0035I Maximum depth 29, 0 variables fixed on reduced cost
Cuts at root node changed objective from 0.029523 to 0.029523
Probing was tried 100 times and created 26 cuts of which 0 were active after adding rounds of cuts (3.132 seconds)
Gomory was tried 139 times and created 1099 cuts of which 0 were active after adding rounds of cuts (0.999 seconds)
Knapsack was tried 100 times and created 1 cuts of which 0 were active after adding rounds of cuts (0.149 seconds)
Clique was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.022 seconds)
MixedIntegerRounding2 was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.494 seconds)
FlowCover was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.097 seconds)
TwoMirCuts was tried 100 times and created 109 cuts of which 0 were active after adding rounds of cuts (0.162 seconds)
ZeroHalf was tried 1 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.013 seconds)

Result - Optimal solution found

Objective value:                0.02952316
Enumerated nodes:               147
Total iterations:               13100
Time (CPU seconds):             23.62
Time (Wallclock seconds):       23.76

Total time (CPU seconds):       23.66   (Wallclock seconds):       23.82

$ cbc incorrect-result.mps
Welcome to the CBC MILP Solver 
Version: 2.10 
Build Date: Aug 27 2022 

command line - cbc incorrect-result.mps (default strategy 1)
At line 2 NAME          CompactModel
At line 3 ROWS
At line 2459 COLUMNS
At line 12617 RHS
At line 15072 BOUNDS
At line 15582 ENDATA
Problem CompactModel has 2454 rows, 541 columns and 9140 elements
Coin0008I CompactModel read with 0 errors
Continuous objective value is 0.029523 - 0.03 seconds
Cgl0003I 0 fixed, 0 tightened bounds, 336 strengthened rows, 128 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 336 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 331 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 335 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 174 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 168 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 168 strengthened rows, 0 substitutions
Cgl0004I processed model has 2306 rows, 477 columns (444 integer (444 of which binary)) and 10608 elements
Cbc0038I Initial state - 32 integers unsatisfied sum - 16
Cbc0038I Pass   1: suminf.   14.50000 (47) obj. 0.029523 iterations 90
Cbc0038I Pass   2: suminf.   14.14286 (61) obj. 0.0295235 iterations 28
Cbc0038I Pass   3: suminf.   13.25000 (52) obj. 0.029523 iterations 53
Cbc0038I Pass   4: suminf.   12.75000 (51) obj. 0.029523 iterations 7
Cbc0038I Pass   5: suminf.   16.00000 (42) obj. 0.029523 iterations 113
Cbc0038I Pass   6: suminf.   13.75000 (56) obj. 0.029523 iterations 68
Cbc0038I Pass   7: suminf.   13.50000 (49) obj. 0.029523 iterations 17
Cbc0038I Pass   8: suminf.   15.11111 (36) obj. 0.029523 iterations 102
Cbc0038I Pass   9: suminf.   13.60000 (57) obj. 0.029523 iterations 75
Cbc0038I Pass  10: suminf.   13.00000 (52) obj. 0.029523 iterations 24
Cbc0038I Pass  11: suminf.   14.41667 (53) obj. 0.0295236 iterations 71
Cbc0038I Pass  12: suminf.   13.50000 (51) obj. 0.029523 iterations 23
Cbc0038I Pass  13: suminf.   16.53866 (49) obj. 0.029523 iterations 90
Cbc0038I Pass  14: suminf.   13.71128 (58) obj. 0.029523 iterations 60
Cbc0038I Pass  15: suminf.   13.24699 (54) obj. 0.029523 iterations 41
Cbc0038I Pass  16: suminf.   16.50000 (35) obj. 0.029523 iterations 103
Cbc0038I Pass  17: suminf.   16.00000 (32) obj. 0.029523 iterations 78
Cbc0038I Pass  18: suminf.   13.80000 (55) obj. 0.029523 iterations 85
Cbc0038I Pass  19: suminf.   13.00000 (51) obj. 0.029523 iterations 30
Cbc0038I Pass  20: suminf.   12.50000 (50) obj. 0.029523 iterations 12
Cbc0038I Pass  21: suminf.   16.00000 (32) obj. 0.029523 iterations 112
Cbc0038I Pass  22: suminf.   13.50000 (48) obj. 0.029523 iterations 77
Cbc0038I Pass  23: suminf.   11.50000 (45) obj. 0.0295231 iterations 43
Cbc0038I Pass  24: suminf.   15.00000 (30) obj. 0.0295249 iterations 135
Cbc0038I Pass  25: suminf.   13.68750 (57) obj. 0.0295259 iterations 135
Cbc0038I Pass  26: suminf.   13.05000 (55) obj. 0.0295259 iterations 43
Cbc0038I Pass  27: suminf.   15.00000 (37) obj. 0.0295253 iterations 109
Cbc0038I Pass  28: suminf.   13.89286 (52) obj. 0.0295262 iterations 62
Cbc0038I Pass  29: suminf.   13.05000 (55) obj. 0.0295259 iterations 41
Cbc0038I Pass  30: suminf.   13.05000 (55) obj. 0.0295259 iterations 23
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 356 integers at bound fixed and 4 continuous
Cbc0038I Full problem 2306 rows 477 columns, reduced to 885 rows 112 columns
Cbc0038I Mini branch and bound did not improve solution (1.26 seconds)
Cbc0038I After 1.26 seconds - Feasibility pump exiting - took 0.57 seconds
Cbc0031I 23 added rows had average density of 84.130435
Cbc0013I At root node, 23 cuts changed objective from 0.02952299 to 0.02952299 in 100 passes
Cbc0014I Cut generator 0 (Probing) - 26 row cuts average 2.0 elements, 0 column cuts (0 active)  in 2.818 seconds - new frequency is -100
Cbc0014I Cut generator 1 (Gomory) - 1049 row cuts average 258.2 elements, 0 column cuts (0 active)  in 0.938 seconds - new frequency is 1
Cbc0014I Cut generator 2 (Knapsack) - 2 row cuts average 5.0 elements, 0 column cuts (0 active)  in 0.135 seconds - new frequency is -100
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.021 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 1 row cuts average 9.0 elements, 0 column cuts (0 active)  in 0.555 seconds - new frequency is -100
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.104 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 110 row cuts average 20.2 elements, 0 column cuts (0 active)  in 0.166 seconds - new frequency is -100
Cbc0010I After 0 nodes, 1 on tree, 1e+50 best solution, best possible 0.02952299 (10.61 seconds)
cbc: Clp/Clp/src/ClpSimplexDual.cpp:7216: int ClpSimplexDual::fastDual(bool): Assertion `problemStatus_ < 0' failed.
zsh: abort (core dumped)  cbc incorrect-result.mps

As far as I could tell from the online documentation of the MPS file format here, the order of constraints in the ROWS section should not matter:

The NAME card can have anything you want, starting in column 15.  The
ROWS section defines the names of all the constraints; entries in
column 2 or 3 are E for equality rows, L for less-than ( <= ) rows, G
for greater-than ( >= ) rows, and N for non-constraining rows (the
first of which would be interpreted as the objective function).  The
order of the rows named in this section is unimportant.

However, as there does not seem to be a formal specification of the MPS file format, I am unsure whether CBC follows this specification as well. Is this behaviour a bug in CBC or actually expected? If the constraints in the ROWS section need to be ordered, in which way do they need to be arranged such that tools like PuLP can respect this?

I am happy to provide whatever information is missing. Thanks for any feedback on this issue! I look forward hearing from you 🤓

Greetings

Christian

chriswasser avatar Sep 28 '22 15:09 chriswasser

While stressing the code on master branch, I managed to get that assert. I convinced myself that the assert was incorrect and took out the assert. The problems I was working with were much bigger and I will investigate this further before modifying stable code. In my case it was because of over zealous perturbing the objective (which is done to avoid degeneracy). Changing the order of rows would change the perturbation from the pseudo random number generator.

jjhforrest avatar Sep 28 '22 16:09 jjhforrest

Interesting, thanks for confirming the observations so quickly and already assessing potential issues in the code 👍

Looking forward to the implemented improvements

chriswasser avatar Sep 28 '22 17:09 chriswasser

The assert should definitely go. The code involved is fastDual which tries to go faster and does take a few shortcuts. It is being used here when the code is doing strong branching i.e. doing some iterations on more than one candidate for branching. Without the assert the code will return a status of 3 which means stopped on iterations. This is normal when doing strong branching and means the code will take some notice of the result, but will not totally trust it e.g. for fixing variables. So in this case it is the correct thing to take out assert. The case I was looking at was a bit more complicated - but anyway I am taking out assert in stable.

jjhforrest avatar Sep 28 '22 18:09 jjhforrest

Sry for the long delay. I can confirm that this change indeed fixes the described problem 👍 Thank you 👌

chriswasser avatar Nov 07 '22 16:11 chriswasser