Problems using the concurrent solver: neither tny nor omp works

Open richardclli opened this issue 1 year ago • 10 comments

Describe the bug

Case 1:

  • SCIPOPT Suite version: 8.1.0
  • Compile options: TPI=tny
  • Problem: causes a segmentation fault

Case 2:

  • SCIPOPT Suite version: 8.1.0
  • Compile options: TPI=omp
  • Problem: PySCIPOpt is forced to call SCIPsolve() instead of SCIPsolveConcurrent() because SCIPtpiGetNumThreads() returns 1
  • Workaround: modifying scip.pxi to skip the check lets the concurrent solve run successfully

System

  • OS: OpenSUSE Leap 15.5
  • Version 4.4.0
  • SCIP version 8.1.0
  • How did you install pyscipopt? Self compile both scipopt suite and PySCIPOpt

richardclli avatar Jan 22 '24 04:01 richardclli

Hello @richardclli! Thanks for your issue. Can you please just confirm whether SCIP is actually using parallelism to solve your problem, and isn't just using a single thread when you call solveConcurrent?

Joao-Dionisio avatar Jan 23 '24 09:01 Joao-Dionisio

Yes, concurrent solving is reported in the log, so the TPI=omp case works properly once I bypass the SCIPtpiGetNumThreads() check. I am reading the source code to find out why SCIPtpiGetNumThreads() returns 1 with TPI=omp, but I have no clues yet.

initializing seeds to 1963210296 in concurrent solver 'scip-2'
initializing seeds to 1332858414 in concurrent solver 'scip-3'
initializing seeds to 1541326760 in concurrent solver 'scip-4'
initializing seeds to 247360965 in concurrent solver 'scip-5'
initializing seeds to 387742462 in concurrent solver 'scip-6'
initializing seeds to 520723434 in concurrent solver 'scip-7'
initializing seeds to 1176648445 in concurrent solver 'scip-8'
starting solve in concurrent solver 'scip-3'
starting solve in concurrent solver 'scip-1'
starting solve in concurrent solver 'scip-2'
starting solve in concurrent solver 'scip-8'
starting solve in concurrent solver 'scip-6'
starting solve in concurrent solver 'scip-7'
starting solve in concurrent solver 'scip-5'
starting solve in concurrent solver 'scip-4'
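
One rough way to check a log like this automatically is to count the distinct solver names that report a start. The snippet below is only a sketch: the regular expression is inferred from the lines shown above, and `started_solvers` and `sample_log` are hypothetical names, not part of SCIP or PySCIPOpt. More than one name suggests the concurrent code path was taken, but it does not prove a speedup.

```python
import re

# Pattern inferred from the "starting solve" lines in the log above (an assumption).
START_RE = re.compile(r"starting solve in concurrent solver '([^']+)'")


def started_solvers(log_text):
    """Return the sorted, de-duplicated names of solvers that reported a start."""
    return sorted(set(START_RE.findall(log_text)))


# A shortened excerpt of the log, used only as sample input.
sample_log = """\
initializing seeds to 1176648445 in concurrent solver 'scip-8'
starting solve in concurrent solver 'scip-3'
starting solve in concurrent solver 'scip-1'
"""

names = started_solvers(sample_log)
print(len(names), names)
```

Lines that only initialize seeds are ignored on purpose, since they do not show that a solver actually started.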

richardclli avatar Jan 24 '24 02:01 richardclli

I face the same problem.

Describe the bug

  • SCIPOPT Suite version: 8.1.0
  • Compile options: TPI=tny
  • Problem: causes a segmentation fault, with the output "Segmentation fault (core dumped)"

System

  • OS: Ubuntu 22.04.3 LTS
  • Version 4.4.0
  • SCIP version 8.1.0
  • How did you install pyscipopt? Self compile both scipopt suite and PySCIPOpt

Alpha-Girl avatar Jan 25 '24 15:01 Alpha-Girl

Hey @mmghannam, can you take a look at this? I haven't been able to figure out what's wrong. There have been some problems with this method over time, it seems.

Joao-Dionisio avatar Jan 25 '24 16:01 Joao-Dionisio

Hey, @richardclli @Alpha-Girl! I also need to use solveConcurrent, so I guess this is the best time to look into it :D

Can you give me a step-by-step on how you compiled SCIP with the parallelism option, and how you linked pyscipopt to it?

EDIT: I was finally able to use solveConcurrent. @richardclli, are you sure that when you are running PySCIPOpt you are linking to the correct SCIP?
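
A quick way to sanity-check the linking question from Python is to print where the imported package lives and which SCIP version it reports. This is a sketch under assumptions: `describe_pyscipopt` is a hypothetical helper, and it relies on `Model.version()` reporting the version of the linked SCIP library. It returns None when PySCIPOpt is not installed at all.

```python
import importlib.util


def describe_pyscipopt():
    """Report the PySCIPOpt install location and the SCIP version it reports."""
    spec = importlib.util.find_spec("pyscipopt")
    if spec is None:
        return None  # PySCIPOpt is not installed in this environment
    info = {"package_path": spec.origin}
    try:
        from pyscipopt import Model
        # Assumption: Model.version() returns the version of the linked SCIP.
        info["scip_version"] = Model().version()
    except ImportError as exc:
        # Typically a sign that the SCIP shared library could not be loaded.
        info["import_error"] = str(exc)
    return info


print(describe_pyscipopt())
```

On Linux, running ldd on the compiled extension module inside that package directory should additionally show which libscip file is resolved at load time.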

Joao-Dionisio avatar Feb 14 '24 11:02 Joao-Dionisio

Yes, I am pretty sure about this. I managed to make it work with the following steps:

  1. I compiled everything from scratch.
  2. I compiled SCIPOPT with TPI=omp (not tny, which may not work on Linux according to some other discussions; I am not sure why).
  3. I compiled PySCIPOpt with the code modified to skip the SCIPtpiGetNumThreads() check and call solveConcurrent directly.

Now I am trying to see how well the concurrent solve scales up; I am not sure whether it works well in an HPC (supercomputing) environment.

richardclli avatar Feb 20 '24 01:02 richardclli

Interesting, I was able to compile with the tny option in Ubuntu. But please do let me know if the speedup is achieved! Cheers :)

Joao-Dionisio avatar Feb 20 '24 11:02 Joao-Dionisio

Yes, I can compile it as well, but it core dumps immediately when trying to solve.

richardclli avatar Feb 21 '24 08:02 richardclli

I debugged it within pyscipopt and scip. https://github.com/scipopt/PySCIPOpt/blob/6a17c96c69b3c9f9af769f4530510e9d5ae1fa9f/src/pyscipopt/scip.pxi#L3414

With tpi=omp, SCIPtpiGetNumThreads() calls omp_get_num_threads(), which always returns 1 here because the call is not enclosed in a parallel region. Only SCIPconcurrentSolve() uses the macro TPI_PARA, which is an omp parallel directive. https://github.com/scipopt/scip/blob/e4d2ae5dfab7d0945c0a4c0c63d21eb60c737839/src/tpi/tpi_openmp.c#L418 https://www.openmp.org/spec-html/5.0/openmpsu111.html

With tpi=tny, SCIPtpiGetNumThreads() returns _threadpool->nthreads, but _threadpool is still NULL at that point, so the dereference causes the segmentation fault. _threadpool is initialized in SCIPsolveConcurrent(), so SCIPtpiGetNumThreads() should not be called before SCIPsolveConcurrent() has executed.
https://github.com/scipopt/scip/blob/e4d2ae5dfab7d0945c0a4c0c63d21eb60c737839/src/tpi/tpi_tnycthrd.c#L577
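
The two failure modes can be mimicked with a small Python toy model. This is only an analogy for the behaviour described above, not SCIP code: `ToyOmpBackend` and `ToyThreadPoolBackend` are hypothetical classes, and the TypeError in Python stands in for the NULL pointer dereference in C.

```python
class ToyOmpBackend:
    """Toy stand-in for tpi=omp: the team size is only visible inside a parallel region."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.in_parallel_region = False

    def get_num_threads(self):
        # Outside a parallel region only the calling thread exists, so report 1.
        return self.max_threads if self.in_parallel_region else 1


class ToyThreadPoolBackend:
    """Toy stand-in for tpi=tny: the pool is created only when the concurrent solve starts."""

    def __init__(self):
        self.pool = None

    def start_concurrent_solve(self, nthreads):
        self.pool = {"nthreads": nthreads}

    def get_num_threads(self):
        # With pool still None this raises TypeError; the C code would crash instead.
        return self.pool["nthreads"]


omp = ToyOmpBackend(max_threads=8)
print(omp.get_num_threads())  # queried before any parallel region is entered

tny = ToyThreadPoolBackend()
try:
    tny.get_num_threads()
except TypeError:
    print("thread pool queried before it was initialized")

tny.start_concurrent_solve(8)
print(tny.get_num_threads())
```

In both toy backends the thread count is only meaningful once the concurrent solve is under way, which is why a check placed before it cannot give a useful answer.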

I suggest removing the code below from def solveConcurrent(self): in scip.pxi: https://github.com/scipopt/PySCIPOpt/blob/6a17c96c69b3c9f9af769f4530510e9d5ae1fa9f/src/pyscipopt/scip.pxi#L3414-L3417 and changing the test code in test_model.py https://github.com/scipopt/PySCIPOpt/blob/6a17c96c69b3c9f9af769f4530510e9d5ae1fa9f/tests/test_model.py#L83-L91 as follows:

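from pyscipopt import Model, SCIP_PARAMSETTING, SCIP_STAGE

def test_solve_concurrent():
    s = Model()
    x = s.addVar("x", vtype = 'C', obj = 1.0)
    y = s.addVar("y", vtype = 'C', obj = 2.0)
    c = s.addCons(x + y <= 10.0)
    s.setPresolve(SCIP_PARAMSETTING.OFF)
    s.setMaximize()
    s.solveConcurrent()
    # Only check the result if a solve actually took place (stage moved past PROBLEM).
    if s.getStage() != SCIP_STAGE.PROBLEM:
        assert s.getStatus() == 'optimal'
        # Maximizing x + 2y subject to x + y <= 10 with x, y >= 0 gives 20 at y = 10.
        assert s.getObjVal() == 20.0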

liangbug avatar Apr 08 '24 06:04 liangbug

Wow, nice catch.

richardclli avatar Apr 08 '24 07:04 richardclli