./out/demo_socp_gpu fails to solve its problem

Open kalmarek opened this issue 3 years ago • 19 comments

Specifications

  • OS: Arch Linux
  • SCS Version: master at 5be0e1684d12c4cfd4d22c5fba236a84a092ab5b
  • Compiler: gcc

Description

SCS fails to solve ./out/demo_socp_gpu 1000 0.5 0.5 1

How to reproduce

Linking against Julia's OpenBLAS:

JULIA_HOME="/opt/julias/julia-1.6"
JULIA_LD_PATH="$JULIA_HOME/lib/julia"
BLASLDFLAGS="-L$JULIA_LD_PATH -lopenblas64_"
SCSFLAGS="USE_OPENMP=1 BLAS64=1 BLASSUFFIX=_64_"
make -j4 CFLAGS="-march=native" DLONG=0 ${SCSFLAGS} BLASLDFLAGS="${BLASLDFLAGS}" gpu

then running it via

LD_LIBRARY_PATH=$JULIA_LD_PATH:$LD_LIBRARY_PATH ./out/demo_socp_gpu 1000 0.5 0.5 1

Additional information

Similarly compiled direct and indirect solvers (CPU) work just fine.

Output

seed : 1

A is 4000 by 1000, with 32 nonzeros per column.
A has 32000 nonzeros (0.800000% dense).
Nonzeros of A take 0.000238 GB of storage.
Row idxs of A take 0.000119 GB of storage.
Col ptrs of A take 0.000004 GB of storage.

ScsCone information:
Zero cone rows: 2000
LP cone rows: 2000
Number of second-order cones: 0, covering 0 rows, with sizes
[]
Number of rows covered is 4000 out of 4000.

true pri opt = 2022.070521
true dua opt = 2022.070521
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1000, constraints m: 4000
cones:    z: primal zero / dual free vars: 2000
          l: linear vars: 2000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 32000, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 6.90e+00  9.46e+01  3.33e+04 -1.66e+04  1.00e-01  1.03e-03 
   250| 1.76e+04  4.31e+01  1.23e+04 -6.15e+03  1.00e-01  1.65e-01 
   500| 2.74e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.29e-01 
   750| 1.57e+04  4.26e+01  1.23e+04 -6.16e+03  1.00e-01  4.94e-01 
  1000| 1.64e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  6.85e-01 
  1250| 4.30e+21  2.67e+22  6.54e+22 -3.27e+22  1.00e-01  8.48e-01 
  1500| 1.90e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  9.48e-01 
  1750| 2.14e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.04e+00 
  2000| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.13e+00 
  2250| 6.45e+20  2.19e+22  4.21e+22  2.11e+22  1.00e-01  1.22e+00 
  2500| 2.07e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.30e+00 
  2750| 2.53e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.39e+00 
  3000| 2.02e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.48e+00 
  3250| 5.72e+20  3.01e+22  3.73e+22  1.87e+22  1.00e-01  1.57e+00 
  3500| 2.09e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.66e+00 
  3750| 2.43e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.75e+00 
  4000| 2.31e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.84e+00 
 [ ... ]
 99500| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.65e+01 
 99750| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.67e+01 
100000| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.68e+01 
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 3.68e+01s = setup: 5.47e-02s + solve: 3.68e+01s
         lin-sys: 3.16e+01s, cones: 7.88e-01s, accel: 4.77e-01s
------------------------------------------------------------------
objective = -6159.028853 (inaccurate)
------------------------------------------------------------------
true pri opt = 2022.070521
true dua opt = 2022.070521
scs pri obj= 0.000000
scs dua obj = -12318.057707

kalmarek avatar Oct 15 '21 20:10 kalmarek

Thanks for posting. I am unable to reproduce this, when I run the command I get:

2021-10-16 14:47:37 (base) 0 bodonoghue@bodonoghue-[]-~/git/scs:
└──[ins] => out/demo_socp_gpu_indirect 1000 0.5 0.5 1
seed : 1

A is 4000 by 1000, with 32 nonzeros per column.
A has 32000 nonzeros (0.800000% dense).
Nonzeros of A take 0.000238 GB of storage.
Row idxs of A take 0.000119 GB of storage.
Col ptrs of A take 0.000004 GB of storage.

ScsCone information:
Zero cone rows: 2000
LP cone rows: 2000
Number of second-order cones: 0, covering 0 rows, with sizes
[]
Number of rows covered is 4000 out of 4000.

true pri opt = 2022.070521
true dua opt = 2022.070521
------------------------------------------------------------------
	       SCS v3.0.0 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1000, constraints m: 4000
cones: 	  z: primal zero / dual free vars: 2000
	  l: linear vars: 2000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, warm_start: 0
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
	  nnz(A): 32000, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 6.90e+00  7.44e+00  2.65e+02  3.90e+03  1.00e-01  2.11e-02
    25| 3.80e-06  3.17e-04  3.36e-03  2.02e+03  1.00e-01  1.08e-01
------------------------------------------------------------------
status:  solved
timings: total: 6.66e-01s = setup: 5.58e-01s + solve: 1.08e-01s
	 lin-sys: 8.57e-02s, cones: 2.84e-04s, accel: 6.22e-05s
------------------------------------------------------------------
objective = 2022.072100
------------------------------------------------------------------
true pri opt = 2022.070521
true dua opt = 2022.070521
scs pri obj= 2022.070419
scs dua obj = 2022.073782

It might be the case that you are missing the gpu fixes I submitted here: https://github.com/cvxgrp/scs/commit/13e675d8c1f17e8f1e184281b25b8196c4ac74da.

I did not cut a new release / tag with those fixes. Is that the issue?

By the way, you can better test the gpu using:

make purge
make test_gpu
out/run_tests_gpu_indirect

bodono avatar Oct 16 '21 13:10 bodono

I'm on master as of 5be0e1684d12c4cfd4d22c5fba236a84a092ab5b. I have CUDA_PATH=/opt/cuda in my env, pointing to cuda-11.4.2. I compiled SCS with

make purge
make test_gpu

as advised and then tested it with ./out/run_tests_gpu_indirect. Here is what I get:

cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK -DINDIRECT=1 -c src/scs.c -o src/scs_indir.o
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/util.o src/util.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/cones.o src/cones.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/aa.o src/aa.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/rw.o src/rw.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/linalg.o src/linalg.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/ctrlc.o src/ctrlc.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/scs_version.o src/scs_version.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/normalize.o src/normalize.c
cc  -c -o linsys/gpu/indirect/private.o linsys/gpu/indirect/private.c -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK -I/opt/cuda/include -Ilinsys/gpu -Wno-c++11-long-long  -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o linsys/scs_matrix.o linsys/scs_matrix.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o linsys/csparse.o linsys/csparse.c
mkdir -p out
ar rv out/libscsgpuindir.a src/scs_indir.o src/util.o src/cones.o src/aa.o src/rw.o src/linalg.o src/ctrlc.o src/scs_version.o src/normalize.o linsys/gpu/indirect/private.o linsys/scs_matrix.o linsys/csparse.o linsys/gpu/gpu.o
ar: creating out/libscsgpuindir.a
a - src/scs_indir.o
a - src/util.o
a - src/cones.o
a - src/aa.o
a - src/rw.o
a - src/linalg.o
a - src/ctrlc.o
a - src/scs_version.o
a - src/normalize.o
a - linsys/gpu/indirect/private.o
a - linsys/scs_matrix.o
a - linsys/csparse.o
a - linsys/gpu/gpu.o
ranlib out/libscsgpuindir.a
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK -o out/run_tests_gpu_indirect test/run_tests.c out/libscsgpuindir.a -lm -lrt -lblas -llapack  -L/opt/cuda/lib -L/opt/cuda/lib64 -lcudart -lcublas -lcusparse -Itest
test_fails
Testing that SCS handles bad inputs correctly:eps_abs tolerance must be positive
ERROR: Validation returned failure
Failure:could not initialize work
degenerate
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    l: linear vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 2.10e+01  2.00e+00  7.90e+00 -3.95e+00  1.00e-01  1.47e-04 
   250| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  2.53e-02 
   500| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  5.54e-02 
   750| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  7.65e-02 
  1000| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  9.70e-02 
  1250| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.18e-01 
  1500| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.39e-01 
  1750| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.60e-01 
  2000| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.81e-01 
  2250| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  2.02e-01
[...]
 99750| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  7.39e+00 
100000| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  7.41e+00 
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 7.45e+00s = setup: 4.52e-02s + solve: 7.41e+00s
         lin-sys: 7.25e+00s, cones: 2.01e-02s, accel: 8.37e-02s
------------------------------------------------------------------
objective = 0.000000 (inaccurate)
------------------------------------------------------------------
INVALID STATUS
Tests run: 2

no fancy options, no julia-shipped blas ;)

~/local/src/scs   master  ldd ./out/run_tests_gpu_indirect 
        linux-vdso.so.1 (0x00007ffcff3ba000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f12b0400000)
        librt.so.1 => /usr/lib/librt.so.1 (0x00007f12b03f5000)
        libopenblas.so.3 => /usr/lib/libopenblas.so.3 (0x00007f12aefd5000)
        liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f12ae90b000)
        libcudart.so.11.0 => /opt/cuda/lib64/libcudart.so.11.0 (0x00007f12ae669000)
        libcublas.so.11 => /opt/cuda/lib64/libcublas.so.11 (0x00007f12a52b5000)
        libcusparse.so.11 => /opt/cuda/lib64/libcusparse.so.11 (0x00007f1296ec8000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f1296cfc000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f12b0597000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f1296cdb000)
        libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1296c97000)
        libgfortran.so.5 => /usr/lib/libgfortran.so.5 (0x00007f12969db000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f12969c0000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f12969b7000)
        libcublasLt.so.11 => /opt/cuda/lib64/libcublasLt.so.11 (0x00007f1282fbb000)
        libquadmath.so.0 => /usr/lib/../lib/libquadmath.so.0 (0x00007f1282f70000)

kalmarek avatar Oct 26 '21 20:10 kalmarek

That's strange, I cannot reproduce this on the only gpu machine I have access to. Can you try disabling the AA? You can do it by changing ACCELERATION_LOOKBACK to 0 in include/glbopts.h which will disable it for the tests that do not specify it manually and it should be clear if that's the issue.

Here's what my ldd looks like, I don't see any major differences to yours:

└──[ins] => ldd out/run_tests_gpu_indirect
	linux-vdso.so.1 (0x00007ffc11d05000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7c3fcf9000)
	libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f7c3fc97000)
	liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f7c3f5fa000)
	libcudart.so.11.0 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f7c3f375000)
	libcublas.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcublas.so.11 (0x00007f7c37e9a000)
	libcusparse.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusparse.so.11 (0x00007f7c29e1c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7c29c55000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7c3fe94000)
	libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f7c2781e000)
	libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f7c27574000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7c2756e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7c2754d000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7c27542000)
	libcublasLt.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007f7c19776000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7c1956a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7c19550000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f7c19507000)

Can you try running

valgrind --leak-check=full out/run_tests_gpu_indirect

it likely won't help (and is very noisy for gpus) but just in case.

bodono avatar Oct 27 '21 16:10 bodono

I disabled AA, but that changed only the numerical values in the log, not the behaviour. Here's the valgrind log: https://gist.github.com/kalmarek/adb225c93de2bb8d9a7032caec42eea9

I think the problem is somewhere in problem generation (before scs), since the header looks like this:

test_fails
Testing that SCS handles bad inputs correctly:eps_abs tolerance must be positive
ERROR: Validation returned failure
Failure:could not initialize work
degenerate
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    l: linear vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2

i.e. first non positive eps_abs and then a problem with 2 variables and 4 constraints?

kalmarek avatar Oct 27 '21 21:10 kalmarek

That's just the output of the first test, which is testing data validation and is working correctly. You will see the same if you run the non-GPU tests with out/run_tests_direct. The first real problem is a tiny LP with 2 vars and 4 constraints.

bodono avatar Oct 27 '21 22:10 bodono

I have the same problem as @kalmarek.

duyipai avatar Oct 28 '21 08:10 duyipai

That's just the output of the first test, which is testing data validation and is working correctly. You will see the same if you run the non-GPU tests with out/run_tests_direct. The first real problem is a tiny LP with 2 vars and 4 constraints.

yeah, maybe I should try to compare with run_tests_direct first ;)

kalmarek avatar Oct 28 '21 09:10 kalmarek

@bodono: so I set VERBOSITY=2 and it seems that CG is never run successfully. Those CUDA errors

linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument

seem to go away if I replace the macro-expanded CUBLAS(name) with the appropriate function name, but the end result is the same. I literally have no idea what I am doing ;), but if you could suggest how to diagnose it next I'd be glad!

**********************************************************
Running test: test_validation
Testing that SCS handles bad inputs correctly:
eps_abs tolerance must be positive
ERROR: Validation returned failure
size of scs_int = 4, size of scs_float = 8
Failure:could not initialize work
**********************************************************
**********************************************************
Running test: degenerate
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    l: linear vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 50, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2
getting pre-conditioner
finished getting pre-conditioner
size of scs_int = 4, size of scs_float = 8
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 2.10e+01  2.00e+00  7.90e+00 -3.95e+00  1.00e-01  3.27e-04 
Norm u = 2.306122, Norm u_t = 1.492570, Norm v = 1.939709, Norm x = 0.000000, Norm y = 4.450789, Norm s = 22.360680, Norm |Ax + s| = 2.24e+01, tau = 1.000000, kappa = 0.000000, |u - u_t| = 1.11e+00, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 7.90e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     1| 3.68e+01  2.00e+00  0.00e+00  0.00e+00  1.00e-01  6.66e-04 
Norm u = 17.210439, Norm u_t = 18.766100, Norm v = 29.666025, Norm x = 0.000000, Norm y = 0.000000, Norm s = 877.991704, Norm |Ax + s| = 8.78e+02, tau = 17.210439, kappa = 0.000000, |u - u_t| = 1.81e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     2| 9.46e+01  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.37e-03 
Norm u = 10.600861, Norm u_t = 22.294830, Norm v = 35.509350, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1226.504583, Norm |Ax + s| = 1.23e+03, tau = 10.600861, kappa = 0.000000, |u - u_t| = 2.20e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     3| 2.28e+02  2.00e+00  0.00e+00  0.00e+00  1.00e-01  2.07e-03 
Norm u = 5.455154, Norm u_t = 25.405974, Norm v = 40.611483, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1472.679019, Norm |Ax + s| = 1.47e+03, tau = 5.455154, kappa = 0.000000, |u - u_t| = 2.53e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     4| 5.39e+02  2.00e+00  0.00e+00  0.00e+00  1.00e-01  2.34e-03 
Norm u = 2.454521, Norm u_t = 26.247918, Norm v = 41.989207, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1544.434977, Norm |Ax + s| = 1.54e+03, tau = 2.454521, kappa = 0.000000, |u - u_t| = 2.62e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     5| 1.26e+03  2.00e+00  0.00e+00  0.00e+00  1.00e-01  2.62e-03 
[...]
    48| 1.05e+18  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.60e-02 
Norm u = 0.000000, Norm u_t = 26.457513, Norm v = 42.332021, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1569.004030, Norm |Ax + s| = 1.57e+03, tau = 0.000000, kappa = 0.000000, |u - u_t| = 2.65e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
    49| 5.29e+17  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.63e-02 
Norm u = 0.000000, Norm u_t = 26.457513, Norm v = 42.332021, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1569.004030, Norm |Ax + s| = 1.57e+03, tau = 0.000000, kappa = 0.000000, |u - u_t| = 2.65e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
    50| 5.29e+17  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.63e-02 
Norm u = 0.000000, Norm u_t = 26.457513, Norm v = 42.332021, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1569.004030, Norm |Ax + s| = 1.57e+03, tau = 0.000000, kappa = 0.000000, |u - u_t| = 2.65e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 5.82e-02s = setup: 4.19e-02s + solve: 1.63e-02s
         lin-sys: 1.51e-02s, cones: 1.97e-05s, accel: 3.52e-06s
------------------------------------------------------------------
objective = 0.000000 (inaccurate)
------------------------------------------------------------------
**********************************************************
INVALID STATUS
Tests run: 2

kalmarek avatar Jan 07 '22 19:01 kalmarek

Ok, can you try with VERBOSITY=4? That should print out some info on whether pcg is running correctly. The fact that you're seeing cg_its 0 is worrying.

The macro itself has an error check when VERBOSITY>0 (see here), which is why the error goes away when you replace it (although it does suggest that only that line is broken, which is strange).
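To illustrate why replacing a checked macro with a direct call can make the message disappear without fixing anything, here is a minimal sketch in plain C with no CUDA dependency. The names CHECKED, fake_sumsq, and error_reports are hypothetical stand-ins, not taken from the SCS sources, and the real macro may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifndef VERBOSITY
#define VERBOSITY 1 /* assumption: checks are compiled in by default here */
#endif

int error_reports = 0; /* number of failures the wrapper has reported */

/* Hypothetical stand-in for a GPU library call (NOT a real cuBLAS function).
 * Returns 0 on success and 1 on "invalid argument". */
int fake_sumsq(int n, const double *x, double *result) {
  double sum = 0.0;
  int i;
  if (n <= 0 || x == NULL || result == NULL) {
    return 1;
  }
  for (i = 0; i < n; ++i) {
    sum += x[i] * x[i];
  }
  *result = sum;
  return 0;
}

#if VERBOSITY > 0
/* Evaluate the call once and report a nonzero status with its location. */
#define CHECKED(call)                                             \
  do {                                                            \
    int status_ = (call);                                         \
    if (status_ != 0) {                                           \
      ++error_reports;                                            \
      fprintf(stderr, "%s:%d:%s\n ERROR (status %d)\n", __FILE__, \
              __LINE__, __func__, status_);                       \
    }                                                             \
  } while (0)
#else
/* With checks compiled out, the status is silently discarded. */
#define CHECKED(call) ((void)(call))
#endif

/* The failing call goes through the wrapper, so the failure is reported. */
int reports_with_wrapper(void) {
  double out = 0.0;
  error_reports = 0;
  CHECKED(fake_sumsq(0, NULL, &out)); /* bad arguments on purpose */
  return error_reports;
}

/* The same failing call made directly: it still fails, but nothing is printed. */
int reports_without_wrapper(void) {
  double out = 0.0;
  error_reports = 0;
  (void)fake_sumsq(0, NULL, &out);
  return error_reports;
}

/* Usage: with VERBOSITY left at 1, only the wrapped call is reported. */
void check_macro_sketch(void) {
  assert(reports_with_wrapper() == 1);
  assert(reports_without_wrapper() == 0);
}
```

In this sketch the underlying call fails in both cases; only the reporting differs. That would be consistent with the error message going away while the end result stays the same.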

bodono avatar Jan 07 '22 22:01 bodono

I just pushed c10b3fe228b42140279add05659afe5883eeccf6. Pull that down and see if it fixes it.

Sorry, false alarm.

bodono avatar Jan 07 '22 23:01 bodono

Even with VERBOSITY=4 I don't see any other output, since cg_gpu_norm(cublas_handle, r, n) < tol is satisfied at https://github.com/cvxgrp/scs/blob/77c86c89bc8d75dce0e8475c364f805fdb62cef0/linsys/gpu/indirect/private.c#L399. If I put the printf statement above it, I get the old error:

linsys/gpu/indirect/private.c:16:cg_gpu_norm
 ERROR_CUDA (#): invalid argument

I'm not sure how to test that my CUDA/cublas is installed properly?
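The cg_its 0 symptom described above can be reproduced on the CPU. The following is a minimal sketch, assuming a textbook conjugate gradient loop with a pluggable norm; it is not the SCS GPU implementation, and cg, inf_norm, broken_norm, and solve_demo are hypothetical names. If the norm routine returns 0 (for example because the underlying library call failed and left its output untouched), the early-exit test passes immediately and the iterate is never updated.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*norm_fn)(const double *v, size_t n);

/* Infinity norm, written without libm. */
double inf_norm(const double *v, size_t n) {
  double best = 0.0;
  size_t i;
  for (i = 0; i < n; ++i) {
    double a = v[i] < 0.0 ? -v[i] : v[i];
    if (a > best) best = a;
  }
  return best;
}

/* Models a norm whose backing call failed and left the result at 0. */
double broken_norm(const double *v, size_t n) {
  (void)v;
  (void)n;
  return 0.0;
}

static double dot2(const double *a, const double *b) {
  return a[0] * b[0] + a[1] * b[1];
}

/* y = A x for a 2x2 matrix stored row-major in A[4]. */
static void matvec2(const double *A, const double *x, double *y) {
  y[0] = A[0] * x[0] + A[1] * x[1];
  y[1] = A[2] * x[0] + A[3] * x[1];
}

/* Conjugate gradient on a 2x2 SPD system; returns the iteration count. */
int cg(const double *A, const double *b, double *x, double tol, int max_its,
       norm_fn norm) {
  double r[2], p[2], Ap[2], rs_old;
  int it, i;
  matvec2(A, x, Ap);
  for (i = 0; i < 2; ++i) {
    r[i] = b[i] - Ap[i];
    p[i] = r[i];
  }
  if (norm(r, 2) < tol) {
    return 0; /* early exit: the residual is reported as already small */
  }
  rs_old = dot2(r, r);
  for (it = 1; it <= max_its; ++it) {
    double alpha, rs_new;
    matvec2(A, p, Ap);
    alpha = rs_old / dot2(p, Ap);
    for (i = 0; i < 2; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    if (norm(r, 2) < tol) {
      return it;
    }
    rs_new = dot2(r, r);
    for (i = 0; i < 2; ++i) {
      p[i] = r[i] + (rs_new / rs_old) * p[i];
    }
    rs_old = rs_new;
  }
  return max_its;
}

/* Solves [4 1; 1 3] x = [1; 2] from x = 0 with the given norm. */
int solve_demo(norm_fn norm, double *x) {
  const double A[4] = {4.0, 1.0, 1.0, 3.0};
  const double b[2] = {1.0, 2.0};
  x[0] = 0.0;
  x[1] = 0.0;
  return cg(A, b, x, 1e-9, 50, norm);
}
```

Calling solve_demo(inf_norm, x) iterates and moves x towards the solution (1/11, 7/11), while solve_demo(broken_norm, x) returns 0 iterations and leaves x at zero. The solver then keeps receiving an unchanged linear-system "solution" at every outer iteration.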

kalmarek avatar Jan 08 '22 21:01 kalmarek

Can you try setting USE_L2_NORM to 1?

bodono avatar Jan 09 '22 20:01 bodono

I set it to 1 but I get a similar behavior (though no errors). I also checked that nrm is always 0 in cg_gpu_norm, though &r[1] prints as 1.000000...

kalmarek avatar Jan 09 '22 21:01 kalmarek

This is so strange, I don't understand what's happening here at all and I can't reproduce this behavior on my gpu machine. If you really want to get to the bottom of this then I'm happy to get on a call and we can debug together manually on your machine.

bodono avatar Jan 14 '22 15:01 bodono

Thanks! I asked for access to an NVIDIA GPU at my institution; if I can reproduce it there I'll get back to you!

kalmarek avatar Jan 16 '22 09:01 kalmarek

Dear @bodono, I managed to get access to a GPU-enabled node and ran some tests there:

  • a simple make test_gpu which results in
~/local/scs$ ldd ./out/run_tests_gpu_indirect 
        linux-vdso.so.1 (0x00007fff935d2000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbb17291000)
        liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007fbb16bed000)
        libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007fbb16b80000)
        libcudart.so.10.1 => /usr/lib/x86_64-linux-gnu/libcudart.so.10.1 (0x00007fbb16904000)
        libcublas.so.10 => /usr/lib/x86_64-linux-gnu/libcublas.so.10 (0x00007fbb12b69000)
        libcusparse.so.10 => /usr/lib/x86_64-linux-gnu/libcusparse.so.10 (0x00007fbb0b8e0000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbb0b6ee000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fbb17459000)
        libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007fbb0b426000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fbb0b40b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbb0b405000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbb0b3e2000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbb0b3d6000)
        libcublasLt.so.10 => /usr/lib/x86_64-linux-gnu/libcublasLt.so.10 (0x00007fbb09532000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fbb09350000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fbb09306000)

runs just fine (11 out of 11 tests passed).

  • This works just fine even when I replace the system's CUDA with the one shipped with Julia:
~/local/scs$ LD_LIBRARY_PATH="${CUDA_PATH}/lib" ldd out/run_tests_gpu_indirect
        linux-vdso.so.1 (0x00007ffd8ec76000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f472fbad000)
        liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f472f509000)
        libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f472f49c000)
        libcudart.so.10.1 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcudart.so.10.1 (0x00007f472f220000)
        libcublas.so.10 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcublas.so.10 (0x00007f472b47e000)
        libcusparse.so.10 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcusparse.so.10 (0x00007f47241f5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4724003000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f472fd75000)
        libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f4723d3b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4723d20000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4723d1a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4723cf7000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4723ceb000)
        libcublasLt.so.10 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcublasLt.so.10 (0x00007f4721e47000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4721c65000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f4721c1b000)
  • However, if I try to link against the Julia-provided OpenBLAS with
BLASLDFLAGS="-L${JULIA_BLAS_PATH} -lopenblas64_"

make purge
make -j4 $SCSFLAGS BLASSUFFIX="_64_" BLAS64=1 DLONG=0 BLASLDFLAGS="${BLASLDFLAGS}" test_gpu

which results in

LD_LIBRARY_PATH="${JULIA_BLAS_PATH}" ldd out/run_tests_gpu_indirect
        linux-vdso.so.1 (0x00007ffd2f1bb000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0dd6654000)
        libopenblas64_.so => /local/data/zz1594/julia-1.7.2/lib/julia/libopenblas64_.so (0x00007f0dd48fc000)
        libcudart.so.10.1 => /usr/lib/x86_64-linux-gnu/libcudart.so.10.1 (0x00007f0dd4680000)
        libcublas.so.10 => /usr/lib/x86_64-linux-gnu/libcublas.so.10 (0x00007f0dd08e5000)
        libcusparse.so.10 => /usr/lib/x86_64-linux-gnu/libcusparse.so.10 (0x00007f0dc965e000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0dc946a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0dd681c000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0dc9447000)
        libgfortran.so.5 => /local/data/zz1594/julia-1.7.2/lib/julia/libgfortran.so.5 (0x00007f0dc918c000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0dc9186000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0dc917c000)
        libcublasLt.so.10 => /usr/lib/x86_64-linux-gnu/libcublasLt.so.10 (0x00007f0dc72d8000)
        libstdc++.so.6 => /local/data/zz1594/julia-1.7.2/lib/julia/libstdc++.so.6 (0x00007f0dc70c2000)
        libgcc_s.so.1 => /local/data/zz1594/julia-1.7.2/lib/julia/libgcc_s.so.1 (0x00007f0dc70a7000)
        libquadmath.so.0 => /local/data/zz1594/julia-1.7.2/lib/julia/libquadmath.so.0 (0x00007f0dc705e000)

I get a failure:

*********************************************************
Running test: hs21_tiny_qp
------------------------------------------------------------------
               SCS v3.2.1 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    b: box cone vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 9.61e-01  1.17e-01  1.96e-01  9.80e-02  1.00e-01  4.95e-04 
    25| 4.08e-04  4.78e-02  1.14e-01  6.94e-18  1.00e-01  4.21e-03 
------------------------------------------------------------------
status:  infeasible
timings: total: 4.22e-03s = setup: 4.24e-04s + solve: 3.79e-03s
         lin-sys: 3.70e-03s, cones: 3.82e-06s, accel: 1.08e-06s
------------------------------------------------------------------
objective = inf
------------------------------------------------------------------
primal obj error  inf
dual obj error  inf
hs21_tiny_qp: SCS failed to produce outputflag SCS_SOLVED
Tests run: 6
  • similarly built run_tests_[in]direct pass all tests just fine

kalmarek avatar Apr 19 '22 09:04 kalmarek

Hmmm, if the BLAS you're using is 64-bit, it might be tricky to get everything to work with a GPU, which (usually) expects 32-bit integers.
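To make the integer-width concern concrete, here is a minimal C sketch of what goes wrong in general when 64-bit indices are handed to code that expects 32-bit ones: the values have to be narrowed explicitly, and anything out of range must be rejected. The function name `narrow_indices` is hypothetical and is not part of SCS; this illustrates the class of problem only and is not a diagnosis of this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from SCS): copy 64-bit indices into a 32-bit
 * array, refusing values that do not fit.
 * Returns 0 on success, -1 on the first out-of-range index. */
static int narrow_indices(const int64_t *src, int32_t *dst, size_t n) {
  assert(src != NULL && dst != NULL);
  for (size_t i = 0; i < n; ++i) {
    if (src[i] < INT32_MIN || src[i] > INT32_MAX) {
      return -1; /* a plain cast would silently change this index */
    }
    dst[i] = (int32_t)src[i];
  }
  return 0;
}
```

Reinterpreting a buffer of 64-bit integers as 32-bit ones without such a conversion would also misread every element, which is why mixing the two widths across a library boundary tends to be fragile.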

bodono avatar Apr 20 '22 09:04 bodono

Hmm, precisely the same problem happens if I compile with

BLASLDFLAGS="-L${JULIA_BLAS_PATH} -lopenblas"
SCSFLAGS="USE_OPENMP=0 BLAS32=1 DLONG=0"

make purge
CUDA_PATH="${CUDA_PATH}" make -j4 $SCSFLAGS BLASLDFLAGS="${BLASLDFLAGS}" test_gpu

Here is a gist with the build, test, and ldd output: https://gist.github.com/kalmarek/0bb320b84871351bff1bb796e516c4a7

OpenBLAS is the LP64 version (integers are 32-bit ints).

kalmarek avatar Apr 22 '22 13:04 kalmarek

Looks like the tests are passing except for hs21, which is probably just because the numerics are slightly different on the GPU and it's producing a bad flag.

bodono avatar Apr 25 '22 11:04 bodono

@bodono could you have a look at this problem: https://cloud.impan.pl/s/MX5oBX0lHb5LJl2

It's the same problem that you obtain through this code:

let T = SCS.GpuIndirectSolver
    A = [
        1.0 1.0 0.0 0.0 0.0
        0.0 1.0 0.0 0.0 1.0
        0.0 0.0 1.0 1.0 1.0
        -1.0 0.0 0.0 0.0 0.0
        0.0 -1.0 0.0 0.0 0.0
        0.0 0.0 -1.0 0.0 0.0
        0.0 0.0 0.0 -1.0 0.0
        0.0 0.0 0.0 0.0 -1.0
    ]
    m, n = Int32.(size(A))
    args = (
        m = m,
        n = n,
        A = A,
        P = zeros(n, n),
        b = [5.0, 3.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        c = -[3.0, 4.0, 4.0, 9.0, 5.0],
        z = 0,
        l = 8,
        bu = Float64[],
        bl = Float64[],
        q = Int32[],
        s = Int32[],
        ep = 0,
        ed = 0,
        p = Float64[],
    )
    solution = SCS.scs_solve(T, args..., max_iters=200, write_data_filename="simple_problem.scs")
    @test isapprox(solution.x' * args.c, -99.0; rtol = 1e-4)
end

This is easily solved by the direct and indirect solvers but fails with our Julia bindings to the GPU solver. Maybe by inspecting the data file by hand (it's a binary format that I have no idea how to digest) we can learn what goes wrong?

This is what I get:

writing data to simple_problem.scs
------------------------------------------------------------------
               SCS v3.2.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 5, constraints m: 8
cones:    l: linear vars: 8
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 200, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 12, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.26e+02  3.95e+00  1.22e+03 -6.94e+02  1.00e-01  7.87e-04 
Warning: tol = -1.000000 <= 0, likely compiled without setting INDIRECT flag.
[...]
Warning: tol = -1.000000 <= 0, likely compiled without setting INDIRECT flag.
   200|      nan       nan      -nan      -nan  1.00e-01  8.29e-01 
------------------------------------------------------------------
status:  unbounded (inaccurate - reached max_iters)
timings: total: 8.81e-01s = setup: 5.27e-02s + solve: 8.29e-01s
         lin-sys: 8.26e-01s, cones: 2.52e-05s, accel: 6.92e-04s
------------------------------------------------------------------
objective = -inf (inaccurate)
------------------------------------------------------------------

kalmarek avatar Nov 03 '22 10:11 kalmarek

Did you compile with the INDIRECT flag?

bodono avatar Nov 03 '22 10:11 bodono

This is the script I use to compile SCS:

script = raw"""
cd $WORKSPACE/srcdir/scs*
flags="DLONG=0 BLAS32=1 USE_OPENMP=0 INDIRECT=1"
blasldflags="-L${libdir} -lopenblas"

CUDA_PATH=$prefix/cuda make BLASLDFLAGS="${blasldflags}" ${flags} out/libscsgpuindir.${dlext}

mkdir -p ${libdir}
cp out/libscs*.${dlext} ${libdir}
"""

kalmarek avatar Nov 03 '22 13:11 kalmarek

DINDIRECT=1 results in the same log.

kalmarek avatar Nov 03 '22 13:11 kalmarek

The warning "Warning: tol = -1.000000 <= 0, likely compiled without setting INDIRECT flag." should only appear if the INDIRECT flag is not set during compilation.

When the INDIRECT flag is set, SCS does the additional computation to generate a good warm-start and a sensible tolerance for the indirect system:

https://github.com/cvxgrp/scs/blob/f2da64d314d86a97ebb8e957f215f27f9e2a7b79/src/scs.c#L366

Otherwise the tolerance is set to -1.0, which is an invalid tolerance: https://github.com/cvxgrp/scs/blob/f2da64d314d86a97ebb8e957f215f27f9e2a7b79/src/scs.c#L361

And that trips a warning from the indirect system solvers (should probably error out): https://github.com/cvxgrp/scs/blob/8ca03771f0cc7c25697b3e21d28788a2f8ce0fc6/linsys/gpu/indirect/private.c#L474

When that flag is not set, SCS skips that computation for speed.
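The mechanism described above can be sketched in C roughly as follows. This is a simplified illustration, not the SCS source: `linsys_tol` and `tol_is_valid` are hypothetical names, and the formula in the INDIRECT branch is a placeholder rather than the one SCS uses. Only the -1.0 sentinel and the warning text are taken from this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the tolerance handed to the linear-system solver.
 * The INDIRECT branch uses a made-up formula for illustration only. */
static double linsys_tol(double residual_norm) {
  assert(residual_norm >= 0.0);
#ifdef INDIRECT
  return 0.1 * residual_norm; /* placeholder, not the formula SCS uses */
#else
  (void)residual_norm;
  return -1.0; /* sentinel: the tolerance computation was compiled out */
#endif
}

/* Returns 1 if tol is usable by an iterative solver, 0 otherwise.
 * Mirrors the check that prints the warning quoted above. */
static int tol_is_valid(double tol) {
  if (tol <= 0.0) {
    fprintf(stderr,
            "Warning: tol = %f <= 0, likely compiled without setting "
            "INDIRECT flag.\n",
            tol);
    return 0;
  }
  return 1;
}
```

Because the branch is selected by the preprocessor, an iterative solver built without the flag always receives the sentinel, which would produce the warning on every linear solve as in the log above.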

bodono avatar Nov 03 '22 15:11 bodono

Hmmm, actually this is likely something to do with the GPU solver specifically. There is some issue in there, which I have run into before, that only trips on some GPUs. It probably has something to do with type sizes, but I have not been able to figure it out. I would recommend shelving the GPU solver for now; the MKL one is typically faster anyway.

bodono avatar Nov 03 '22 15:11 bodono

Try the following patch. I got all the tests to pass with this fix.

--- a/linsys/gpu/gpu.c
+++ b/linsys/gpu/gpu.c
@@ -19,13 +19,13 @@ void SCS(accum_by_atrans_gpu)(const ScsGpuMatrix *Ag,
     if (*buffer != SCS_NULL) {
       cudaFree(*buffer);
     }
-    cudaMalloc(buffer, *buffer_size);
+    cudaMalloc(buffer, new_buffer_size);
     *buffer_size = new_buffer_size;
   }

   CUSPARSE_GEN(SpMV)
   (cusparse_handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &onef, Ag->descr, x,
-   &onef, y, SCS_CUDA_FLOAT, SCS_CSRMV_ALG, buffer);
+   &onef, y, SCS_CUDA_FLOAT, SCS_CSRMV_ALG, *buffer);
 }

 /* this is slow, use trans routine if possible */
@@ -48,13 +48,13 @@ void SCS(accum_by_a_gpu)(const ScsGpuMatrix *Ag, const cusparseDnVecDescr_t x,
     if (*buffer != SCS_NULL) {
       cudaFree(*buffer);
     }
-    cudaMalloc(buffer, *buffer_size);
+    cudaMalloc(buffer, new_buffer_size);
     *buffer_size = new_buffer_size;
   }

   CUSPARSE_GEN(SpMV)
   (cusparse_handle, CUSPARSE_OPERATION_TRANSPOSE, &onef, Ag->descr, x, &onef, y,
-   SCS_CUDA_FLOAT, SCS_CSRMV_ALG, buffer);
+   SCS_CUDA_FLOAT, SCS_CSRMV_ALG, *buffer);
 }

 /* This assumes that P has been made full (ie not triangular) and uses the
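Reading the diff, the patch fixes two things in each function: the allocation used the stale `*buffer_size` instead of `new_buffer_size`, and the SpMV call received `buffer` (the address of the pointer) instead of `*buffer` (the workspace itself). The sketch below reproduces the corrected pattern on the host, with `malloc`/`free` standing in for `cudaMalloc`/`cudaFree` so it runs without a GPU. The names `use_workspace` and `ensure_and_use` are hypothetical, and the grow-only condition is an assumption, since the diff does not show that line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical consumer playing the role of the SpMV call. It expects the
 * workspace itself (void *), not the address of the pointer to it. */
static void use_workspace(void *workspace, size_t size) {
  unsigned char *bytes = (unsigned char *)workspace;
  for (size_t i = 0; i < size; ++i) {
    bytes[i] = 0xAB; /* touches every byte of the requested workspace */
  }
}

/* Grow *buffer to at least new_size bytes if needed, then hand it to the
 * consumer. Returns 0 on success, -1 if allocation fails. */
static int ensure_and_use(void **buffer, size_t *buffer_size, size_t new_size) {
  assert(buffer != NULL && buffer_size != NULL);
  if (new_size > *buffer_size) { /* assumed condition, not shown in the diff */
    free(*buffer);               /* free(NULL) is a no-op */
    *buffer = malloc(new_size);  /* fix 1: allocate new_size, not *buffer_size */
    if (*buffer == NULL) {
      *buffer_size = 0;
      return -1;
    }
    *buffer_size = new_size;
  }
  use_workspace(*buffer, new_size); /* fix 2: pass *buffer, not buffer */
  return 0;
}
```

With the original code, the first call would have allocated a buffer of the old size (presumably zero at the start) and then passed a pointer-to-pointer as the workspace, so the consumer would write to memory it does not own.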

syockit avatar Apr 03 '23 07:04 syockit

@syockit Thanks for this! I applied the patch and it worked! Do you want to turn this into a PR?

The only problem I had was an erroneous 'infeasible' certificate on the hs21_tiny_qp and hs21_tiny_qp_rw tests. Do you get that too? I was able to get them to pass by tightening the eps_infeas tolerance in those files, so if you have that problem too we can just do that.

bodono avatar Apr 03 '23 13:04 bodono

@bodono It's a hassle for me to set up a fork right now, so please apply the commit on your side.

You're right, I got the same infeasible certificate on the tests you mentioned. I missed that yesterday. Tightening eps_infeas did make them come out feasible.

syockit avatar Apr 03 '23 23:04 syockit

Sure, no problem @syockit, thanks for sending in the patch!

bodono avatar Apr 04 '23 07:04 bodono

  • The issue mentioned in https://github.com/cvxgrp/scs/issues/180#issuecomment-1301895062 seems to be solved by #251.
  • I cannot reproduce the original issue anymore (probably solved by #246).

I presume this issue can be closed after #251 is merged.

kalmarek avatar Apr 12 '23 22:04 kalmarek