Failed on running Sudoku example with CUDA

Open bywbilly opened this issue 4 years ago • 8 comments

Thanks for this project, it is awesome! I installed satnet through pip, but when I run the visual-sudoku examples, this error occurs: [image]

bywbilly avatar Oct 30 '20 17:10 bywbilly

It seems that the CUDA module is not compiled. What version of PyTorch are you using? SATNet currently only supports pytorch==1.1.0. The CPP API changed in a later version, and I haven't fixed it yet.

xflash96 avatar Oct 31 '20 05:10 xflash96

This error occurs for me too. I'm running PyTorch 1.1.0.

EDIT: Is there any specific version of CUDA perhaps that is needed?

JellePiepenbrock avatar Nov 05 '20 12:11 JellePiepenbrock

The following piece of code is causing the CUDA extension to not be compiled:

In my case, CUDA_HOME is indeed None, so this piece is skipped:

if torch.cuda.is_available() and CUDA_HOME is not None:
    extension = CUDAExtension(
        name = 'satnet._cuda',
        include_dirs = ['./src'],
        sources = [
            'src/satnet.cpp',
            'src/satnet_cuda.cu',
        ],
        extra_compile_args = {
            'cxx': ['-DMIX_USE_GPU', '-g'],
            'nvcc': ['-g', '-restrict', '-maxrregcount', '32', '-lineinfo', '-Xptxas=-v']
        }
    )
    ext_modules.append(extension)

Now, I think this is because the conda / pip version of cudatoolkit is not the entire toolkit, only the parts needed for standard use of PyTorch/TF. The extra compile arguments for nvcc for example will cause an error too, because nvcc is not in the Conda version of cudatoolkit.

@xflash96 , could you confirm you are not using a conda or pip (or similar) install of cudatoolkit?
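A quick way to check the hypothesis above is to print the values that the setup condition depends on. This is only a diagnostic sketch, not part of SATNet: the function name `cuda_build_report` is made up for illustration, and it assumes `nvcc` would normally be found on `PATH` when a full toolkit is installed.

```python
import os
import shutil


def cuda_build_report():
    """Collect the facts that decide whether a CUDA extension can be compiled."""
    report = {
        "nvcc_on_path": shutil.which("nvcc"),          # None if nvcc is not on PATH
        "CUDA_HOME_env": os.environ.get("CUDA_HOME"),  # None if the variable is unset
        "torch_CUDA_HOME": None,
        "cuda_available": None,
    }
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        # PyTorch is not importable here; the torch fields stay None.
        return report
    report["torch_CUDA_HOME"] = CUDA_HOME
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_build_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is True but `torch_CUDA_HOME` is None, the `if` branch in the setup snippet is skipped and no CUDA extension gets built.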

JellePiepenbrock avatar Nov 06 '20 11:11 JellePiepenbrock

Thanks for the info. I am using PyTorch 1.1.0. I found one interesting thing: running on Colab (with one GPU) is slower than running on my server (64 cores). Is this normal?

bywbilly avatar Nov 06 '20 18:11 bywbilly

Could you describe what you did to get your installation working?

Regarding the speed; I think Colab only gives you 2 CPU cores, so that could slow things down quite a bit.
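To confirm how many cores a session actually provides, something like the following could be run on both machines. It is a small sketch using only the standard library; the helper name `usable_cpu_count` is hypothetical.

```python
import os


def usable_cpu_count():
    """Return the number of CPU cores this process is allowed to use."""
    try:
        # Linux only: respects CPU affinity and container limits.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total core count.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print("usable CPU cores:", usable_cpu_count())
```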

JellePiepenbrock avatar Nov 06 '20 18:11 JellePiepenbrock

I didn't get it to work on my server with CUDA. Since my instances are not that large, I use the CPU for training and testing.

bywbilly avatar Nov 06 '20 18:11 bywbilly

I'm using the cudatoolkit that comes from PyTorch's official docker file: pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel. The "maxrregcount" argument is needed for older GPUs because NVCC may overspill the register if not set properly... I'll take a look at the newer version of the toolkit to see if the argument can be removed.

xflash96 avatar Nov 06 '20 19:11 xflash96

Confirmed. NVCC is required for custom CUDA extensions, and the "maxrregcount" flag is also needed to work on Colab. NVCC can be installed via

    $ conda install -c conda-forge cudatoolkit-dev

I've added the instruction to the README.md. (BTW, I've also updated the APIs to match pytorch:1.7.0.)
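After reinstalling, one way to check whether the CUDA extension was actually built is to look for the module named in the setup snippet earlier in this thread, `satnet._cuda`. The helper below is an illustrative sketch, not part of SATNet, and `has_compiled_module` is a hypothetical name.

```python
import importlib.util


def has_compiled_module(name="satnet._cuda"):
    """Return True if the named module can be located, False otherwise."""
    try:
        # find_spec imports the parent package, so a missing or broken
        # parent raises ImportError instead of returning None.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    print("satnet._cuda found:", has_compiled_module())
```

A result of False after installation would suggest the CUDA branch of the setup script was skipped again.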

xflash96 avatar Dec 12 '20 06:12 xflash96