SYMPHONY Takes longer to solve in new version primary due to items now not finding a solution in presolve

Forwarded from our development team: While upgrading from Symphony version 5.6.14 to version 5.6.17 we are seeing some strange behavior on both our linux and windows builds. The examples below were generated using the windows build but we are seeing similar behavior on linux as well. We have a small test case with 2,278 rows and also 2,278 constraints (LPT file attached in scenario.zip).

On version 5.6.14, this case solves almost immediately. On version 5.6.17, the case takes 2 to 2.5 minutes to solve. The output logs with verbosity level = 50 are attached. The default settings are being used (do_primal_heuristic=1, prep_level=5) and we have first feasible activated.

The runs are similar until it gets to the "Now displaying final relaxed solution..." message. The old version finds the solution right away using heuristics while the new version goes into iterations. So far, only this test scenario is showing the issue. -Bill

Nov 09 '20 20:11 xnaegamvala

5.6.17.new.txt 5.6.17.new.txt scenario.zip

Nov 09 '20 20:11 xnaegamvala

Unfortunately, I can't replicate this. With the parameter file

do_primal_heuristic 1
prep_level 5
find_first_feasible 1

which I understood to be the one you're using, I got an answer back almost instantly in both cases. Solving to provable optimality take substantially longer, but the gap is nearly zero in the root node, so there doesn't seem to be any need to go further. My output looks like this.

Preprocessing finished...
 	 constraints removed: 123
Problem has 
	 1180 constraints 
	 2278 variables 
	 119277 nonzero coefficients

Total Presolve Time: 0.137313...

Solving...

granularity set at 0.000000
[New LWP 13023]
[New LWP 13024]
[New LWP 13025]
[New LWP 13026]
[New LWP 13027]
[New LWP 13028]
[New LWP 13029]
Thread 3 now processing node 0
****************************************************
* Now processing NODE 0 LEVEL 0 (from TM)
****************************************************



**** Starting iteration 1 ****

solving root lp relaxation
The LP value is: -349573521.608 [0,1296]

SH-FEAS: 0 --- -349553186.045000 
LS-FEAS: 0 --- -349553186.045000 
****** Found Better Feasible Solution !
****** After Calling Heuristics !
****** Cost: -349566986.753000

fathoming node (no more cols to check)


****************************************************
* Stopping After Finding First Feasible Solution   *
* Now displaying stats and best solution found...  *
****************************************************

My pre-solve time is substantially faster than yours, which is strange unless you are using a very old processor. I did set the prep level to 5, as you said you did, and I got the same number of constraints and variables but a different number of non-zeros. So it seems we are not solving quite the same problem. Maybe that is a clue?

You seem to be using your own driver, so I can't really be sure if there could be differences there. Could you try with the SYMPHONY executable that is automatically built and see if you have the same result? I was wondering if it might have something to do with running multi-threaded so you could also try building without OpenMP support and see if that makes a difference. If you could tell me exactly how you built SYMPHONY, that would help (what arguments to configure, etc.).

I looked over the diff between the two versions and there really isn't any change that would have affected the solve time, so there's not a lot for me to go on. I did actually find a bug related to I found a bug related to the printing of the lower bound, but it shouldn't affect anything here.

So I'm kind of at a dead end unless you can point me to some more clues or information.

Nov 13 '20 04:11 tkralphs

Hi Ted,

I am one of the developers working with xnaegamvala. Following up a bit on the above.

Does "using your own driver" just mean that we are using our own wrapper to load our own binary file ? We generate the LP file right after we load the problem from our own file so I suppose the LP file generated might be a for a slightly different problem.

Our linux build (on a centos environment) uses the coinbrew utility to build the shared objects - the wrapper executable dynamically loads these shared objects. On windows, we are just using the provided visual studio projects (VS2015) to build the various symphony libraries that are statically linked into our wrapper executable.

In my local testing of the issue on windows, I forced the number of threads to 1 (sym_int_TM_max_active_nodes=1). I could attempt to build without OpenMP support. However, I believe that by setting the processing threads to 1 parallel processing can probably be ruled out as the root cause.

From the debugging we have done so far, the bottleneck for this particular problem is happening in the ClpSimplexPrimal method. Version 5.6.17 finds infeasible solutions in there. Variables largestPrimalError_ and largestDualError_ contain very large numbers for over approximately 1,000 iterations before the method gives up and the problem is presumably solved by another method. I have observed the differing behavior in routine statusOfProblemInPrimal().

We added logging in a number of places in an attempt to spot differences in behavior between versions. The logging line 'statusOfProblemInPrimal 17 reason2" was added in CLPSimplexPrimal.cpp around line 1100. A file with the output is attached below. Hoping you can lend some insight into why this behavior might be happening. In version 5.6.14, this behavior was not observed (and reason2 never appears to be set equal to 2). If the looping is short circuited after say 10 or 15 iterations - the problem will solve almost immediately.

      if (numberThrownOut)
           reason2 = 1;
      if ((sumInfeasibility > 1.0e7 && sumInfeasibility > 100.0 * lastSumInfeasibility
                && factorization_->pivotTolerance() < 0.11) ||
                (largestPrimalError_ > 1.0e10 && largestDualError_ > 1.0e10))
           reason2 = 2;
	  printf("statusOfProblemInPrimal 17 reason2 = %i, sumInfeasibility = %f, lastSumInfeasibility = %f, factorization_->pivotTolerance() = %f, largestPrimalError_ = %f, largestDualError_ = %f\n", reason2, sumInfeasibility, lastSumInfeasibility, factorization_->pivotTolerance(), largestPrimalError_, largestDualError_);

statusOfProblemInPrimal_17.txt

Nov 16 '20 21:11 BMagan01

Does "using your own driver" just mean that we are using our own wrapper to load our own binary file ? We generate the LP file right after we load the problem from our own file so I suppose the LP file generated might be a for a slightly different problem.

I just mean that you are not using SYMPHONY's own main function to read in the instance, so there could be paramters being set and other things going on that I don't have a view of. If you were using the SYMPHONY default binary, it would print out all the parameter settings and everything would be more transparent to me.

In any case, the problem seems to be with Clp, not directly with SYMPHONY, so if we can isolate the issue, we can transfer it over to Clp and perhaps get some more eyeballs on what's happening. Given that I'm not seeing the same thing as you, I suspect that maybe you are somehow using a different version of Clp. So first thing would be to verify the precise SHA you have checked out with your 5.6.17 build. The lines you quoted above are actually around line 1000 in both stable/1.17 and master, so I'm wondering if you have a different version somehow.

Nov 23 '20 02:11 tkralphs

I can attempt to dump all the symphony settings from within our own main function.

When I referenced line "1100" in my prior comment, I failed to say that I have added a number of lines of debug code that helped trace down where the problems were starting. If I ignore my debugging code, the code referenced above (if (numberThrownOut) ) actually starts at line 1006.

I did not directly use git to do the checkout. On windows, I downloaded the code from here: https://www.coin-or.org/download/source/SYMPHONY/SYMPHONY-5.6.17.zip. On linux, I used the coinbrew utility to request/download version 5.6.17. I will pull the code directly from git - release/5.6.17 (commit f917d42) and compare that to the code in our build environment.

Nov 23 '20 23:11 BMagan01

I want to make sure it's clear that SYMPHONY utilizes a number of other COIN-OR projects as dependencies and you an mix and match versions of those within the same stable series. SYMPHONY 5.6.17 is a fixed version, but it can work with any Clp in the 1.17 series. What seems to come with the ZIP file above is 1.17.0, but I was testing with the latest release 1.17.6, which I think is what you should get by default if you just use coinbrew as

coinbrew fetch [email protected]

To see what you're really using, you would need to go into each dependent project and do git branch. Here's a quick bash script to do the check:

for i in *; do if [ -e $i/configure ]; then echo "Project: $i" ; cd $i; git branch; cd -; fi; done

You can post the output if you want me to check it. For at least the Windows build, it seems you are not using the latest Clp release, so the first thing to try is to upgrade that to 1.17.6. Make sure you are on 1.17.6 in Linux as well.

Nov 24 '20 14:11 tkralphs

Thanks for the response, In our linux build environment, running the above bash script returns this:

Project: Cgl

(no branch) /home/centos/coinbrew Project: Clp
(no branch) /home/centos/coinbrew Project: cmake-3.8.2
master /home/centos/coinbrew Project: CoinUtils
(no branch) /home/centos/coinbrew Project: Osi
(no branch) /home/centos/coinbrew Project: SYMPHONY
(no branch) /home/centos/coinbrew

When I pulled the latest code using coinbrew, the command I used was coinbrew fetch SYMPHONY:releases/5.6.17 --no-prompt. Is it correct to assume that this will do the same thing as the above command (coinbrew fetch [email protected]). I ran coinbrew fetch [email protected] - it did not fetch anything and the script returned the same result as above.

Running git show from the Clp folder returns this:

commit 756ddd3ed813eb1fa8b2d1b4fe813e6a4d7aa1eb Author: Stefan Vigerske [email protected] Date: Sat Apr 11 15:51:55 2020 +0000

creating Clp/releases/1.17.6 from Clp/stable/1.17 (rev 2700)

git show under the SYMPHONY folder returns this:

commit f917d42e6655a82ea4e9290aa7d41b0f60a91f20 Author: Ted Ralphs [email protected] Date: Sun Feb 24 15:00:47 2019 +0000

creating SYMPHONY/releases/5.6.17 from SYMPHONY/stable/5.6 (rev 2733)

I have not done a comparison between the windows and linux code in our builds. I will do that and also build using 1.17.6 on windows.

Nov 24 '20 23:11 BMagan01

OK, so you are using 1.17.6 on Linux. Hmm, that is strange, since I can't replicate what you're seeing. What distro and compiler are you using? If we can isolate the behavior down to something replicable in stand-alone Clp, it would help. There are some debugging routines that dump the current LP relaxation to file. Ideally, you would simply dump the current LP to disk once you observe this behavior you're describing and we could see whether we can replicate the behavior outside of SYMPHONY. If it's possible, could you call Clp's writeLP() function to dump the current LP upon some trigger that indicates it is exhibiting bad behavior?

Nov 25 '20 02:11 tkralphs

Our distro and compiler are as follows: CentOS Linux release 7.4.1708 (Core)

gcc.x86_64 4.8.5-39.el7 @base gcc-c++.x86_64 4.8.5-39.el7 @base gcc-gfortran.x86_64 4.8.5-39.el7 @base

I have not yet done any additional code comparisons (as mentioned above). When I do that, I will also call Clp's writeLP() at an appropriate time.

Nov 25 '20 20:11 BMagan01

The files in the attached .zip file are created in ClpSimplexPrimal.cpp inside this if block:

if ((sumInfeasibility > 1.0e7 && sumInfeasibility > 100.0 * lastSumInfeasibility
	&& factorization_->pivotTolerance() < 0.11)
	|| (largestPrimalError_ > 1.0e10 && largestDualError_ > 1.0e10)) {

Inside that block when if (largestPrimalError_ > 1.0e10 && largestDualError_ > 1.0e10), the files are generated using writeLp().

The format of the file names is largestPrimalError_YYYY-MM-DD_h_m_s_<numberIterations_>_.lp. The test was run with sym_int_TM_max_active_nodes=1. This file will generate for each of the 1,000 iterations. In the attached .zip file, files for the first 6 iterations are included. I believe they all of these files are identical.

primal2.zip

Nov 30 '20 16:11 BMagan01