GPU tests failing
cc @kalmarek
Some examples return NaN.
This is the last build that doesn't produce NaNs: https://buildkite.com/julialang/scs-dot-jl/builds/281#018ccf8a-baeb-409b-9fe5-fbde2f42e4bc
and this is the first one that does: https://buildkite.com/julialang/scs-dot-jl/builds/283#018ea03d-8b93-40f3-8e1b-6e6a37dec3c8
But this CI run came right after a change to the README (and before enabling OpenMP). Smells like something in the CUDA toolchain?!
There are quite a few version changes, so I'm not sure what the culprit is.
The successful one uses
Installed CUDA_Driver_jll ── v0.7.0+1
Installed CUDA_Runtime_jll ─ v0.11.1+0
Installed SCS_GPU_jll ────── v3.2.4+0
The failing one uses
Installed SCS_GPU_jll ────── v3.2.4+0
Installed CUDA_Driver_jll ── v0.8.0+0
Installed CUDA_Runtime_jll ─ v0.12.0+1
So this seems to be a problem with CUDA 12? @maleadt (sorry if you get too many pings)
Upgrading CUDA_Runtime_jll only updates the underlying CUDA toolkit. Maybe your package is incompatible with the CUDA toolkit v12.4 as introduced by Runtime_jll 0.12, or needs a rebuild.
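One way to check this hypothesis is to pin the runtime back to the version from the last passing build and rerun the GPU tests. The snippet below is only a sketch: it assumes the resolver can still find a consistent set of versions (for example a matching CUDA_Driver_jll), and the use of a temporary environment is an illustrative choice, not something taken from the CI setup.

```julia
using Pkg

# Assumption: a throwaway environment, so the main project is left untouched.
Pkg.activate(; temp=true)

# Request the versions reported by the last passing build.
# The resolver picks compatible versions of the remaining dependencies.
Pkg.add([
    PackageSpec(name="SCS"),
    PackageSpec(name="SCS_GPU_jll", version="3.2.4"),
    PackageSpec(name="CUDA_Runtime_jll", version="0.11.1"),
])

# Keep later `Pkg.update()` calls from moving the runtime forward again.
Pkg.pin("CUDA_Runtime_jll")

# Confirm which versions were actually resolved before rerunning the tests.
Pkg.status()
```

If the NaNs disappear with the runtime pinned and come back once it is freed with `Pkg.free("CUDA_Runtime_jll")` and updated, that points at the toolkit upgrade rather than at SCS.jl itself.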
@maleadt It seems that the newest SCS was already built against CUDA toolkit 12.4/5: https://buildkite.com/julialang/yggdrasil/builds/11739#01908495-78c0-45ae-8bf6-28205badd6b6
@bodono Did you test SCS with CUDA 12? Some examples here run just fine (so I think we're interacting with the library correctly), but some end with a bunch of NaNs.
Unfortunately, if CUDA 12 is newish then it's likely that I have never tested with it, since I no longer have access to a GPU machine. The GitHub Action I have for GPUs only compiles it.