
Segmentation fault on parallel CBC

Open • gabrielhomsi opened this issue 2 years ago • 0 comments

I am trying to solve some MIP models with parallel CBC on Linux, but CBC occasionally crashes with a segmentation fault. I am running experiments with 10 and 30 threads in a Linux cloud environment (Compute Canada, single node, CentOS Linux release 7.9.2009 (Core)).

I compile CBC as follows:

wget https://raw.githubusercontent.com/coin-or/coinbrew/master/coinbrew
chmod u+x coinbrew
./coinbrew build Cbc@stable/2.10 --enable-cbc-parallel

The segmentation faults are quite rare: they happen once or twice in 400 runs. Each run uses a different model, and the issue does not seem to be tied to a specific model file. The issue is also not deterministic: if I solve the same model again, CBC will most likely finish without any problems. Is this a known issue, or is there any way to avoid these errors?
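Because the crash appears in only a small fraction of runs, one way to check whether a single model reproduces it is to rerun the identical command many times and count the crashes. The following is a minimal Python sketch, not part of the original report; the function name `count_segfaults` is hypothetical, and it assumes a POSIX system, where Python's `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess


def count_segfaults(cmd, runs):
    """Run the same command `runs` times and count SIGSEGV terminations.

    Assumes a POSIX system: subprocess reports a child killed by signal N
    as returncode -N, so a segfault appears as -signal.SIGSEGV (-11 on Linux).
    """
    segfaults = 0
    for _ in range(runs):
        # Only the exit status matters here, so discard the solver output.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == -signal.SIGSEGV:
            segfaults += 1
    return segfaults
```

For example, with a hypothetical `cmd = ["./CBC/bin/cbc", "-seconds", "300", "-threads", "10", "-import", "model.lp", "-solve"]` (local paths assumed), `count_segfaults(cmd, 400)` returns the number of crashes over 400 identical runs, which shows whether one model on its own reproduces the reported rate of once or twice in 400.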

Below is the output when this error happens. The first lines come from Pyomo, which launches CBC, and are followed by the CBC log:

ERROR: Solver (cbc) returned non-zero return code (-11)
ERROR: Solver log: Welcome to the CBC MILP Solver
Version: 2.10
Build Date: May 27 2022

command line - /localscratch/homgab.29135046.0/CBC/bin/cbc -seconds 300 -threads 10 -ratioGap 0.0 -printingOptions all -import /localscratch/homgab.29135046.0/tmp3ax2v4ht.pyomo.lp -stat=1 -solve -solu /localscratch/homgab.29135046.0/tmp3ax2v4ht.pyomo.soln (default strategy 1)
seconds was changed from 1e+100 to 300
threads was changed from 0 to 10
ratioGap was changed from 0 to 0
Option for printingOptions changed from normal to all
Presolve 1104 (-492) rows, 1268 (-85) columns and 2900 (-609) elements
Statistics for presolved model
Original problem has 483 integers (467 of which binary)
Presolved problem has 432 integers (416 of which binary)
==== 400 zero objective 380 different
==== absolute objective values 380 different
==== for integers 0 zero objective 209 different
==== for integers absolute objective values 209 different
===== end objective counts

Problem has 1104 rows, 1268 columns (868 with objective) and 2900 elements
There are 52 singletons with objective
Column breakdown:
816 of type 0.0->inf, 36 of type 0.0->up, 0 of type lo->inf,
0 of type lo->up, 0 of type free, 0 of type fixed,
0 of type -inf->0.0, 0 of type -inf->up, 416 of type 0.0->1.0
Row breakdown:
72 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0,
0 of type E other, 0 of type G 0.0, 0 of type G 1.0,
0 of type G other, 832 of type L 0.0, 200 of type L 1.0,
0 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other,
0 of type Free
Continuous objective value is 1648.9 - 0.01 seconds
Cgl0004I processed model has 1104 rows, 1267 columns (432 integer (416 of which binary)) and 2899 elements
Cbc0038I Initial state - 17 integers unsatisfied sum - 0.859312
Cbc0038I Pass   1: suminf.    0.00000 (0) obj. 1681.82 iterations 32
Cbc0038I Solution found of 1681.82
Cbc0038I Relaxing continuous gives 1681.82
Cbc0038I Cleaned solution of 1681.82
Cbc0038I Before mini branch and bound, 415 integers at bound fixed and 803 continuous
Cbc0038I Full problem 1104 rows 1267 columns, reduced to 32 rows 33 columns
Cbc0038I Mini branch and bound did not improve solution (0.03 seconds)
Cbc0038I Round again with cutoff of 1678.52
Cbc0038I Reduced cost fixing fixed 4 variables on major pass 2
Cbc0038I Pass   2: suminf.    0.50267 (10) obj. 1678.52 iterations 206
Cbc0038I Pass   3: suminf.    0.44559 (2) obj. 1678.52 iterations 174
Cbc0038I Pass   4: suminf.    0.30530 (2) obj. 1678.52 iterations 69
Cbc0038I Pass   5: suminf.    1.16640 (15) obj. 1678.52 iterations 196
Cbc0038I Pass   6: suminf.    1.06017 (24) obj. 1678.52 iterations 31
Cbc0038I Pass   7: suminf.    1.00892 (13) obj. 1678.52 iterations 243
Cbc0038I Pass   8: suminf.    1.11901 (17) obj. 1678.52 iterations 60
Cbc0038I Pass   9: suminf.    0.99543 (13) obj. 1678.52 iterations 53
Cbc0038I Pass  10: suminf.    1.07353 (15) obj. 1678.52 iterations 60
Cbc0038I Pass  11: suminf.    0.97374 (15) obj. 1678.52 iterations 49
Cbc0038I Pass  12: suminf.    1.04035 (14) obj. 1678.52 iterations 44
Cbc0038I Pass  13: suminf.    1.30813 (18) obj. 1678.52 iterations 95
Cbc0038I Pass  14: suminf.    0.86364 (19) obj. 1678.52 iterations 66
Cbc0038I Pass  15: suminf.    0.77387 (12) obj. 1678.52 iterations 56
Cbc0038I Pass  16: suminf.    0.77379 (26) obj. 1678.52 iterations 20
Cbc0038I Pass  17: suminf.    0.77387 (12) obj. 1678.52 iterations 11
Cbc0038I Pass  18: suminf.    0.99800 (19) obj. 1678.52 iterations 66
Cbc0038I Pass  19: suminf.    0.84212 (31) obj. 1678.52 iterations 60
Cbc0038I Pass  20: suminf.    1.04384 (14) obj. 1678.52 iterations 31
Cbc0038I Pass  21: suminf.    1.12062 (17) obj. 1678.52 iterations 62
Cbc0038I Pass  22: suminf.    0.89025 (14) obj. 1678.52 iterations 58
Cbc0038I Pass  23: suminf.    1.00946 (16) obj. 1678.52 iterations 39
Cbc0038I Pass  24: suminf.    0.89025 (14) obj. 1678.52 iterations 51
Cbc0038I Pass  25: suminf.    0.95480 (15) obj. 1678.52 iterations 62
Cbc0038I Pass  26: suminf.    0.92722 (17) obj. 1678.52 iterations 63
Cbc0038I Pass  27: suminf.    0.96872 (14) obj. 1678.52 iterations 49
Cbc0038I
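The `-11` in the first line of the error output appears to follow the convention of Python's `subprocess` module: a negative return code -N means the process was terminated by signal N, and signal 11 is SIGSEGV on Linux. Since rerunning the same model usually succeeds, one stopgap is to retry only when the solver dies from that signal. The sketch below is a hypothetical workaround, not a fix for the underlying crash; `describe_returncode` and `run_with_retry` are illustrative names, and the sketch calls the CBC binary directly rather than going through Pyomo.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description.

    POSIX convention: a negative value -N means the child was terminated
    by signal N (for example, -11 -> SIGSEGV on Linux).
    """
    if returncode >= 0:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        # Signal number not known to this platform's signal module.
        name = f"signal {-returncode}"
    return f"killed by {name}"


def run_with_retry(cmd, max_attempts=3):
    """Re-run `cmd` only when it dies from SIGSEGV, up to max_attempts times.

    This hides an intermittent crash rather than fixing it.
    Returns (returncode, attempts_used).
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode != -signal.SIGSEGV:
            # Success, or a failure other than a segfault: do not retry.
            return returncode, attempt
    return returncode, max_attempts
```

For example, `describe_returncode(-11)` returns `"killed by SIGSEGV"`. Retrying only on SIGSEGV keeps other failures (such as a bad model file) visible instead of silently rerunning them.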

gabrielhomsi, Jun 07 '22 14:06