SCS segmentation fault
Hi all, I am solving an SDP over an n x n matrix, using CVXPY with the SCS solver on a GPU. With n = 10000, the program fails with a segmentation fault (core dumped). The stdout looked like this:
----------------------------------------------------------------------------
SCS v2.0.2 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012-2017
----------------------------------------------------------------------------
Lin-sys: sparse-indirect GPU, nnz in A = 99997194, CG tol ~ 1/iter^(2.00)
eps = 1.00e-05, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 20, rho_x = 1.00e-03
Variables n = 50005000, constraints m = 99987195
Cones: primal zero / dual free vars: 49982195
sd vars: 50005000, sd blks: 1
Setup time: 6.96e+00s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
Segmentation fault (core dumped)
The same program works for n = 5000, where it took about 3.5 hours to solve the SDP.
My guess is that your GPU is running out of memory. Sometimes it gives a nice error when this happens and sometimes it just segfaults, so it can be hard to detect. You might be able to get more of a handle on the memory usage by calling nvidia-smi. There are ways to decrease the amount of memory that SCS uses, in particular by setting the macro GPU_TRANSPOSE_MAT to false when compiling, and also by using 32-bit floats (which can affect stability).
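To get a feel for how much those two options might save, here is a rough back-of-the-envelope sketch using the sizes from the log above (nnz in A, n, m). It assumes A is held in compressed sparse column form with 32-bit indices, and that GPU_TRANSPOSE_MAT means a second, transposed copy of A is kept. Both are assumptions, and the helper names are made up for illustration. It only counts the matrix storage, so treat the result as a lower bound; work vectors and anything else the solver allocates come on top.

```python
# Sizes copied from the SCS log in the original post.
NNZ_A = 99_997_194    # "nnz in A"
N_VARS = 50_005_000   # "Variables n"
N_CONS = 99_987_195   # "constraints m"


def csc_bytes(nnz, ncols, float_bytes=8, int_bytes=4):
    """Bytes for one CSC matrix: values, row indices, and column pointers."""
    return nnz * (float_bytes + int_bytes) + (ncols + 1) * int_bytes


def matrix_storage_gib(nnz, n_vars, n_cons, float_bytes=8, int_bytes=4,
                       store_transpose=True):
    """Rough lower bound (GiB) on the memory needed for A and, optionally, A^T.

    Assumption: with the transpose enabled, a second full copy of the
    matrix is stored, whose column count equals the number of constraints.
    """
    total = csc_bytes(nnz, n_vars, float_bytes, int_bytes)
    if store_transpose:
        total += csc_bytes(nnz, n_cons, float_bytes, int_bytes)
    return total / 2**30


if __name__ == "__main__":
    for label, kwargs in [
        ("64-bit floats, with transpose", {}),
        ("64-bit floats, no transpose", {"store_transpose": False}),
        ("32-bit floats, with transpose", {"float_bytes": 4}),
        ("32-bit floats, no transpose",
         {"float_bytes": 4, "store_transpose": False}),
    ]:
        gib = matrix_storage_gib(NNZ_A, N_VARS, N_CONS, **kwargs)
        print(f"{label}: at least {gib:.2f} GiB")
```

Comparing these numbers with the free memory that nvidia-smi reports while the solver is running should show whether the matrix alone is already close to the limit.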
Hi @bodono, I was not able to figure out the exact issue, so for now I have stopped trying to scale the problem up. When I start looking at it again in the future, I'll update this thread if I find something helpful.
OK, did you try the same problem with the indirect solver on CPU? If that doesn't segfault then it's likely a GPU memory issue.