CUDA 9 Support

Open • calclavia opened this issue 6 years ago • 12 comments

Does the existing codebase work with CUDA 9 as well? On the Titan V GPU we could take advantage of FP16 to further speed up training.

calclavia, Jun 01 '18

Hi, the CUDA implementation assumes float32 inputs and outputs. It would need to be rewritten using templates to support other float types.

taolei87, Jun 02 '18
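What "rewrite the implementation as templates" could look like is sketched below under some assumptions. The kernel is a toy element-wise scale, not SRU's recurrence, and `scale_impl`, `scale_f32`, `scale_f16`, `KERNEL_NAMES` and `kernel_name` are hypothetical names. The CUDA source is kept in a Python string because the thread's code compiles its kernels at runtime through pynvrtc. One templated device function holds the logic, and `extern "C"` wrappers give each instantiation an unmangled name that can be looked up by string. Arithmetic is done in float and only storage uses `T`, which is a common choice for FP16.

```python
# Hypothetical example: one templated CUDA implementation, instantiated for
# float32 and float16 through extern "C" wrappers (unmangled kernel names).
SCALE_SOURCE = r'''
#include <cuda_fp16.h>

// Shared implementation: T is the storage type, arithmetic is done in float.
template <typename T>
__device__ __forceinline__ void scale_impl(const T* x, T* y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = static_cast<T>(a * static_cast<float>(x[i]));
    }
}

extern "C" __global__ void scale_f32(const float* x, float* y, float a, int n) {
    scale_impl<float>(x, y, a, n);
}

extern "C" __global__ void scale_f16(const __half* x, __half* y, float a, int n) {
    scale_impl<__half>(x, y, a, n);
}
'''

# Map the element dtype of the input tensor to the kernel compiled for it.
KERNEL_NAMES = {'float32': 'scale_f32', 'float16': 'scale_f16'}


def kernel_name(dtype_name):
    """Return the kernel to launch for a dtype name such as 'float16'."""
    try:
        return KERNEL_NAMES[dtype_name]
    except KeyError:
        raise TypeError(f'no kernel instantiated for dtype {dtype_name!r}') from None
```

The source would be compiled once, and each wrapper would then be fetched by name. NVRTC may need the CUDA include directory on its include path (for example through an `-I` option) to find `cuda_fp16.h`.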

@taolei87 Would it be possible to easily add support for CUDA 9 (without FP16 support)? I'm trying to install this with pip on a system that has CUDA 9 installed, and the installer complains about the CUDA environment because of the cupy dependency.

calclavia, Jun 06 '18

@calclavia

I've been using CUDA 9 (with FP32) and it works fine. Didn't the default pip install work (pip install cupy-cuda90)?

cupy supports CUDA 9 already: https://cupy.chainer.org/

taolei87, Jun 06 '18
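The CuPy wheel name encodes the CUDA release, so choosing it can be scripted. Below is a minimal sketch that rests on two assumptions. First, it assumes `nvcc` is on the PATH, which is only true when the full toolkit is installed. Second, it assumes the `cupy-cudaXY` naming that CuPy used for its wheels at the time (cupy-cuda90, cupy-cuda91, cupy-cuda100, ...). Both helper names are hypothetical.

```python
import re
import shutil
import subprocess


def cuda_release_from_nvcc(output):
    """Extract 'X.Y' from `nvcc --version` text such as '... release 9.1, V9.1.85'."""
    match = re.search(r'release (\d+)\.(\d+)', output)
    if match is None:
        raise ValueError('no "release X.Y" found in nvcc output')
    return f'{match.group(1)}.{match.group(2)}'


def cupy_wheel_for(cuda_release):
    """Map a CUDA release to a wheel name: '9.0' -> 'cupy-cuda90', '10.0' -> 'cupy-cuda100'."""
    major, minor = cuda_release.split('.')[:2]
    return f'cupy-cuda{int(major)}{int(minor)}'


# Only try to call nvcc when it is actually installed.
if __name__ == '__main__' and shutil.which('nvcc'):
    out = subprocess.run(['nvcc', '--version'], capture_output=True,
                         text=True, check=True).stdout
    print('pip install', cupy_wheel_for(cuda_release_from_nvcc(out)))
```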

@taolei87 I was using the command given in the README, python setup.py install. It doesn't seem to install the CUDA 9 dependencies. Do I need to install them manually for CUDA 9?

calclavia, Jun 06 '18

Ah, yes. requirements.txt doesn't include the CUDA dependency.

Could you try installing CUDA 9 and cupy-cuda90 manually?

taolei87, Jun 06 '18

@taolei87 I performed the following steps:

# 1. Clone the repo
# 2. Change requirements.txt
printf "cupy-cuda91\npynvrtc\n" > requirements.txt
# 3. Install module
python3 setup.py install

Then I opened an interactive Python session and typed import sru, which gives this error:

(Note that I am able to train regular PyTorch models on CUDA in this environment, so my CUDA installation should be correct.)

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/cupy/__init__.py", line 11, in <module>
    from cupy import core  # NOQA
  File "/usr/local/lib/python3.5/dist-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
ImportError: libcublas.so.9.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/sru-0.0.1-py3.5.egg/sru/__init__.py", line 2, in <module>
    from .cuda_functional import *
  File "/usr/local/lib/python3.5/dist-packages/sru-0.0.1-py3.5.egg/sru/cuda_functional.py", line 11, in <module>
    from cupy.cuda import function
  File "/usr/local/lib/python3.5/dist-packages/cupy/__init__.py", line 32, in <module>
    six.reraise(ImportError, ImportError(msg), exc_info[2])
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/cupy/__init__.py", line 11, in <module>
    from cupy import core  # NOQA
  File "/usr/local/lib/python3.5/dist-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
ImportError: CuPy is not correctly installed.

If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
Also, confirm that only one CuPy package is installed:
  $ pip freeze

If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
  $ pip install cupy --no-cache-dir -vvvv

Check the Installation Guide for details:
  https://docs-cupy.chainer.org/en/latest/install.html

original error: libcublas.so.9.1: cannot open shared object file: No such file or directory

Also, in case it helps: I'm running this in NVIDIA's CUDA 9.1 Docker container (image nvidia/cuda:9.1-base-ubuntu16.04), so it should include all the CUDA dependencies.

calclavia, Jun 06 '18
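The error message asks you to confirm that only one CuPy package is installed. As an alternative to reading `pip freeze`, here is a minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8+). It lists the installed CuPy distributions and the versions of the related packages. The helper names are hypothetical, and `cupy_packages` treats `cupy` and any `cupy-cuda*` name as a CuPy package, which is an assumption.

```python
from importlib import metadata


def cupy_packages(dist_names):
    """Return the CuPy distributions among `dist_names` (cupy, cupy-cuda90, ...)."""
    found = set()
    for name in dist_names:
        if not name:
            continue
        lowered = name.lower()
        if lowered == 'cupy' or lowered.startswith('cupy-cuda'):
            found.add(lowered)
    return sorted(found)


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == '__main__':
    names = [dist.metadata['Name'] for dist in metadata.distributions()]
    cupys = cupy_packages(names)
    if len(cupys) > 1:
        print('More than one CuPy package is installed:', cupys)
    print(installed_versions(['torch', 'pynvrtc', *cupys]))
```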

This seems related to the FAQ (https://docs-cupy.chainer.org/en/stable/install.html)

Does CuPy find the CUDA path correctly?

libcublas.so.9.1 should be in a directory such as /usr/local/cuda/lib64. Usually /usr/local/cuda is a symlink pointing to a versioned directory such as /usr/local/cuda-9.0.

taolei87, Jun 06 '18
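To check the CUDA path from outside CuPy, here is a minimal sketch. It reports which candidate directories actually contain the library named in the error, and where /usr/local/cuda points. The candidate list is an assumption: LD_LIBRARY_PATH entries, then `$CUDA_PATH/lib64` with /usr/local/cuda/lib64 as the default. The real dynamic loader also consults the ldconfig cache, so an empty result is a hint, not proof. The function names are hypothetical.

```python
import os
from pathlib import Path


def candidate_lib_dirs(extra_dirs=()):
    """Directories to check, in order: LD_LIBRARY_PATH entries, extra_dirs, then
    $CUDA_PATH/lib64 (default /usr/local/cuda/lib64). An approximation only: the
    real dynamic loader also consults the ldconfig cache and system directories."""
    dirs = [d for d in os.environ.get('LD_LIBRARY_PATH', '').split(os.pathsep) if d]
    dirs.extend(str(d) for d in extra_dirs)
    dirs.append(os.path.join(os.environ.get('CUDA_PATH', '/usr/local/cuda'), 'lib64'))
    return dirs


def dirs_containing(filename, extra_dirs=()):
    """Return the candidate directories (deduplicated, in order) that contain `filename`."""
    hits = []
    for d in candidate_lib_dirs(extra_dirs):
        if (Path(d) / filename).is_file() and d not in hits:
            hits.append(d)
    return hits


if __name__ == '__main__':
    # Resolving the /usr/local/cuda symlink shows which toolkit version is active.
    print('/usr/local/cuda ->', os.path.realpath('/usr/local/cuda'))
    print('libcublas.so.9.1 found in:',
          dirs_containing('libcublas.so.9.1') or 'no candidate directory')
```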

@taolei87 I managed to get it working. I needed to use the runtime image instead of the base image.

Now I'm running into a different error from pynvrtc: "OSError: libnvrtc.so: cannot open shared object file: No such file or directory" (see https://github.com/NVIDIA/pynvrtc/issues/3).

It seems pynvrtc does not work with CUDA 9. How did you get it working?

calclavia, Jun 06 '18

I'm using CUDA 9.0, not 9.1. The versions I have:

torch.__version__: 0.3.1
cupy.__version__: 4.1.0
pynvrtc.__version__: 8.0

taolei87, Jun 07 '18

Using CUDA 9.1 with cupy-cuda91 v4.1.0 still gives the same error: AttributeError: /usr/local/cuda/lib64/libnvrtc.so: undefined symbol: nvrtcAddNameExpression

yudai-patronai, Jun 20 '18
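This AttributeError is the message ctypes raises when the shared library it loaded does not export the requested function. In other words, the libnvrtc.so that was found lacks `nvrtcAddNameExpression`. Here is a minimal sketch for checking which file that path resolves to and which NVRTC functions it exports. `missing_symbols` is a hypothetical helper. The symbol list is an assumption made for illustration, taken from the NVRTC API; it is not the exact set pynvrtc requires.

```python
import ctypes
import os


def missing_symbols(lib_path, symbols):
    """Load a shared library with ctypes and return the names it does not export.

    ctypes raises AttributeError ("undefined symbol") for a missing function, so
    hasattr() is False exactly for the names the library lacks. lib_path=None
    inspects symbols already loaded into this process (Linux)."""
    lib = ctypes.CDLL(lib_path)
    return [name for name in symbols if not hasattr(lib, name)]


if __name__ == '__main__':
    path = '/usr/local/cuda/lib64/libnvrtc.so'  # the path from the error message
    if os.path.exists(path):
        # realpath shows which versioned file the symlink actually points to.
        print(os.path.realpath(path), 'is missing:',
              missing_symbols(path, ['nvrtcVersion', 'nvrtcAddNameExpression',
                                     'nvrtcGetLoweredName']))
```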

@taolei87 It's still an issue for CUDA 9.1. It throws an ImportError: No module named 'cuda_functional'

RSKothari, Nov 13 '18

I got a similar error, so I hard-coded the NVRTC library path in the source code as follows and installed from source. It works for me; I hope this isn't misleading.

_SRU_PROG = Program(SRU_CODE, 'sru_prog.cu', lib_name='/opt/cuda/10.0/lib64/libnvrtc.so')  # for pynvrtc >= 9.0

python setup.py install

ii-research-yu, Sep 17 '19
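Instead of hard-coding /opt/cuda/10.0/lib64/libnvrtc.so, the `lib_name` passed to `Program` could be derived from an environment variable. Here is a minimal sketch that relies on several assumptions. It assumes `CUDA_HOME` or `CUDA_PATH` points at the toolkit root and that the libraries live in its `lib64` subdirectory. `nvrtc_lib_path` is a hypothetical helper. The fallback to a versioned `libnvrtc.so.*` file is a guess, aimed at installs that ship only the versioned name.

```python
import glob
import os


def nvrtc_lib_path(env=None):
    """Guess where libnvrtc lives from CUDA_HOME / CUDA_PATH (assumed layout: <root>/lib64).

    Prefers the unversioned libnvrtc.so; otherwise falls back to a versioned
    libnvrtc.so.X.Y file if one exists; otherwise returns the unversioned path."""
    env = os.environ if env is None else env
    cuda_root = env.get('CUDA_HOME') or env.get('CUDA_PATH') or '/usr/local/cuda'
    lib_dir = os.path.join(cuda_root, 'lib64')
    unversioned = os.path.join(lib_dir, 'libnvrtc.so')
    if os.path.exists(unversioned):
        return unversioned
    versioned = sorted(glob.glob(os.path.join(lib_dir, 'libnvrtc.so.*')))
    return versioned[0] if versioned else unversioned


# Hypothetical use in sru/cuda_functional.py instead of the hard-coded path:
#   from pynvrtc.compiler import Program
#   _SRU_PROG = Program(SRU_CODE, 'sru_prog.cu', lib_name=nvrtc_lib_path())

if __name__ == '__main__':
    print(nvrtc_lib_path())
```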