SATNet
Failed on running Sudoku example with CUDA
Thanks for this project, it is awesome! I installed satnet through pip, but when I run the visual-sudoku example, this error occurs:
It seems that the CUDA module was not compiled. What version of PyTorch are you using? SATNet currently only supports pytorch==1.1.0. The C++ API changed in a later version, and I haven't updated SATNet for it yet.
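A quick way to answer the version question is sketched below. It reads the installed package metadata instead of importing torch, and strips local build suffixes such as `+cu100` before comparing. The helper names `base_version` and `installed_torch_version` are hypothetical and not part of SATNet or PyTorch, and the sketch assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata


def base_version(version: str) -> str:
    """Strip a local build suffix such as '+cu100' from a version string."""
    return version.split("+", 1)[0]


def installed_torch_version():
    """Return the installed torch version (suffix stripped), or None if absent.

    Hypothetical helper: it only reads package metadata, so it works even
    when importing torch itself would fail.
    """
    try:
        return base_version(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    print("installed torch version:", found)
    print("matches 1.1.0:", found == "1.1.0")
```

Some installation methods may not record package metadata, in which case printing `torch.__version__` from a Python session is the fallback.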
This error occurs for me too. I'm running PyTorch 1.1.0.
EDIT: Is a specific version of CUDA perhaps needed?
The following piece of code is the reason the CUDA extension is not compiled.
In my case, CUDA_HOME is indeed None, so this piece is skipped:
if torch.cuda.is_available() and CUDA_HOME is not None:
    extension = CUDAExtension(
        name = 'satnet._cuda',
        include_dirs = ['./src'],
        sources = [
            'src/satnet.cpp',
            'src/satnet_cuda.cu',
        ],
        extra_compile_args = {
            'cxx': ['-DMIX_USE_GPU', '-g'],
            'nvcc': ['-g', '-restrict', '-maxrregcount', '32', '-lineinfo', '-Xptxas=-v']
        }
    )
    ext_modules.append(extension)
I think this happens because the conda / pip version of cudatoolkit is not the entire toolkit; it contains only the parts needed for standard use of PyTorch/TF. The extra compile arguments for nvcc, for example, will cause an error too, because nvcc is not included in the Conda version of cudatoolkit.
@xflash96 , could you confirm you are not using a conda or pip (or similar) install of cudatoolkit?
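The diagnosis above can be checked before building with a small script like the following sketch. It looks for a toolkit root in the `CUDA_HOME` or `CUDA_PATH` environment variables and for an `nvcc` executable on `PATH`. This is only an approximation written for illustration; it is not PyTorch's actual lookup logic, and `find_cuda_toolchain` is a hypothetical helper name.

```python
import os
import shutil


def find_cuda_toolchain(env=None):
    """Report the CUDA toolkit root and nvcc path visible in `env`.

    `env` defaults to the current process environment. Either value in the
    returned dict is None when nothing was found.
    """
    env = os.environ if env is None else env

    # Toolkit root, if the user or installer exported one.
    cuda_home = env.get("CUDA_HOME") or env.get("CUDA_PATH")

    # nvcc on PATH; a runtime-only cudatoolkit package will not provide it.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))

    # Fall back to <cuda_home>/bin/nvcc when PATH does not contain it.
    if nvcc is None and cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            nvcc = candidate

    return {"cuda_home": cuda_home, "nvcc": nvcc}


if __name__ == "__main__":
    info = find_cuda_toolchain()
    print("CUDA_HOME / CUDA_PATH:", info["cuda_home"])
    print("nvcc:", info["nvcc"])
    if info["nvcc"] is None:
        print("nvcc not found; the CUDA extension would likely be skipped.")
```

If both values come back as None, the environment most likely has only the runtime libraries, which matches the behaviour described above.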
Thanks for the info. I am using PyTorch 1.1.0. I found one interesting thing: running on Colab (with one GPU) is slower than running on my server (64 cores). Is this normal?
Could you describe what you did to get your installation working?
Regarding the speed; I think Colab only gives you 2 CPU cores, so that could slow things down quite a bit.
I didn't get it to work on my server with CUDA. Since my instances are not that large, I use the CPU for training and testing.
I'm using the cudatoolkit that comes with PyTorch's official Docker image: pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel. The "maxrregcount" argument is needed for older GPUs because NVCC may overspill registers if it is not set properly... I'll take a look at the newer version of the toolkit to see if the argument can be removed.
Confirmed. NVCC is required for custom CUDA extensions, and the "maxrregcount" flag is also needed for it to work on Colab. NVCC can be installed via
$ conda install -c conda-forge cudatoolkit-dev
I've added the instructions to the README.md. (BTW, I've also updated the APIs to match pytorch:1.7.0.)
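After reinstalling, one way to confirm that the CUDA extension was actually built is to check whether the module named in the setup snippet, `satnet._cuda`, can be located. The sketch below uses only the standard library; `module_available` is a hypothetical helper, and note that looking up a submodule imports its parent package as a side effect.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


if __name__ == "__main__":
    for name in ("satnet", "satnet._cuda"):
        print(name, "->", "found" if module_available(name) else "missing")
```

If `satnet` is found but `satnet._cuda` is missing, the package was most likely installed without the CUDA extension and needs to be rebuilt once nvcc is available.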