Error String: invalid device function

Open DuinoDu opened this issue 4 years ago • 7 comments

Running python3.7 -m jittor.test.test_cutt_transpose_op throws this error. I ran into it while using ssd-jittor.

DuinoDu avatar Mar 25 '20 00:03 DuinoDu

Thanks for your feedback. Could you please provide more detailed information, such as the error log and the nvidia-smi output?

Jittor avatar Mar 25 '20 03:03 Jittor

error logging:

[i 0326 09:05:17.686328 60 v1 __init__.py:102] Run cmd: git branch
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
[i 0326 09:05:18.203590 84 __init__.py:183] Found gdb(7.6.1) at /bin/gdb.
[i 0326 09:05:18.217129 84 __init__.py:183] Found addr2line(5.1) at /bin/addr2line.
[i 0326 09:05:18.681350 84 compiler.py:802] pybind_include: -I/home/users/min.du/venvs/jittor/include/python3.7m -I/home/users/min.du/venvs/jittor/include
[i 0326 09:05:18.722916 84 compiler.py:804] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0326 09:05:23.588546 84 jit_compiler.cc:20] Load cc_path: /home/users/min.du/opt/gcc-5.4.0/bin/g++
[i 0326 09:05:23.588579 84 jit_compiler.cc:23] Load nvcc_path: /usr/local/cuda-10.0/bin/nvcc
[i 0326 09:05:23.588873 84 cuda_flags.cc:19] CUDA disabled.
[i 0326 09:05:27.107895 84 compile_extern.py:14] found /usr/local/cuda-10.0/include/cublas.h
[i 0326 09:05:27.108071 84 compile_extern.py:14] found /usr/local/cuda-10.0/lib64/libcublas.so
[i 0326 09:05:27.546553 84 compile_extern.py:14] found /usr/local/cuda-10.0/include/cudnn.h
[i 0326 09:05:27.546660 84 compile_extern.py:14] found /usr/local/cuda-10.0/lib64/libcudnn.so
[i 0326 09:05:28.681775 84 compile_extern.py:14] found /usr/local/cuda-10.0/include/curand.h
[i 0326 09:05:28.681857 84 compile_extern.py:14] found /usr/local/cuda-10.0/lib64/libcurand.so
[i 0326 09:05:29.012195 84 cuda_flags.cc:17] CUDA enabled.
cudaFuncSetSharedMemConfig(transposePacked<float, 1>, cudaSharedMemBankSizeFourByte ) in file src/calls.h, function cuttKernelSetSharedMemConfig
Error String: invalid device function

nvidia-smi result:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.93       Driver Version: 410.93       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   25C    P8    14W / 250W |   3841MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:03:00.0 Off |                  N/A |
| 22%   28C    P8    16W / 250W |   3593MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 00000000:82:00.0 Off |                  N/A |
| 22%   27C    P8    15W / 250W |     11MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 00000000:83:00.0 Off |                  N/A |
| 22%   29C    P8    15W / 250W |     11MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3124      C   python3.7                                    104MiB |
|    0      6519      C   ...ransfer_gpu007/ENV/anaconda3/bin/python   104MiB |
|    0      7883      C   python3                                     3615MiB |
|    1      7883      C   python3                                     3575MiB |
|    1     12130      C   python3                                        2MiB |
+-----------------------------------------------------------------------------+

BTW, the OS is CentOS.

DuinoDu avatar Mar 26 '20 01:03 DuinoDu

Hi DuinoDu, thank you for reporting the error. CentOS is not supported yet, but we are working on it.

diamond0910 avatar Mar 26 '20 03:03 diamond0910

Here is what happened. I removed ${HOME}/.cache/jittor and re-ran python3.7 -m jittor.test.test_cutt_transpose_op. After waiting about as long as a cup of coffee takes, the test passed!

(base) ➜  jittor git:(master) ✗ python3.7 -m jittor.test.test_cutt_transpose_op
[i 0326 11:04:37.614970 56 v1 __init__.py:102] Run cmd: git branch
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
[i 0326 11:04:37.688707 56 __init__.py:183] Found gdb(7.6.1) at /bin/gdb.
[i 0326 11:04:37.701683 56 __init__.py:183] Found addr2line(5.1) at /bin/addr2line.
[i 0326 11:04:37.837388 56 compiler.py:802] pybind_include: -I/home/users/min.du/venvs/jittor/include/python3.7m -I/home/users/min.du/venvs/jittor/include
[i 0326 11:04:37.870565 56 compiler.py:804] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0326 11:04:38.427794 56 jit_compiler.cc:20] Load cc_path: /home/users/min.du/opt/gcc-5.4.0/bin/g++
[i 0326 11:04:38.427818 56 jit_compiler.cc:23] Load nvcc_path: /usr/local/cuda-10.0/bin/nvcc
[i 0326 11:04:38.428010 56 cuda_flags.cc:19] CUDA disabled.
[i 0326 11:04:39.378795 56 compile_extern.py:14] found /usr/local/cuda-10.0/include/cublas.h
[i 0326 11:04:39.378978 56 compile_extern.py:14] found /usr/local/cuda-10.0/lib64/libcublas.so
[i 0326 11:04:39.804576 56 compile_extern.py:14] found /usr/local/cuda-10.0/include/cudnn.h
[i 0326 11:04:39.804678 56 compile_extern.py:14] found /usr/local/cuda-10.0/lib64/libcudnn.so
[i 0326 11:04:40.824909 56 compile_extern.py:14] found /usr/local/cuda-10.0/include/curand.h
[i 0326 11:04:40.824986 56 compile_extern.py:14] found /usr/local/cuda-10.0/lib64/libcurand.so
[i 0326 11:04:41.127469 56 cuda_flags.cc:17] CUDA enabled.
[i 0326 11:04:41.442841 56 cuda_flags.cc:19] CUDA disabled.
.[i 0326 11:04:41.443118 56 cuda_flags.cc:17] CUDA enabled.
[i 0326 11:04:41.497078 56 cuda_flags.cc:19] CUDA disabled.
.[i 0326 11:04:41.497277 56 cuda_flags.cc:17] CUDA enabled.
[i 0326 11:04:41.620295 56 cuda_flags.cc:19] CUDA disabled.
.
----------------------------------------------------------------------
Ran 3 tests in 0.493s

OK

One more interesting thing: "CUDA enabled" is repeated three times in the log.

So removing ${HOME}/.cache/jittor may be a general way to fix this kind of problem in jittor.
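
For anyone who wants to script the workaround, here is a minimal sketch of the cache removal described above. The helper name clear_jittor_cache is hypothetical and not part of Jittor; the only thing taken from this thread is the cache location ${HOME}/.cache/jittor. It simply deletes that directory if it exists, so Jittor has to recompile everything on the next run.

```python
import shutil
from pathlib import Path

# Cache location as mentioned in this thread: ${HOME}/.cache/jittor
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "jittor"


def clear_jittor_cache(cache_dir=DEFAULT_CACHE_DIR):
    """Delete the given cache directory (hypothetical helper, not a Jittor API).

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing cached at this path, so there is nothing to do.
        return False
    # Remove the directory and everything below it.
    shutil.rmtree(cache_dir)
    return True
```

Calling clear_jittor_cache() and then re-running python3.7 -m jittor.test.test_cutt_transpose_op mirrors the steps above. Expect the first run afterwards to be slow, since everything is compiled again.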

DuinoDu avatar Mar 26 '20 03:03 DuinoDu

Thank you for trying Jittor on CentOS. We were surprised that it produced correct results. We are working on supporting CentOS, and some features may not be stable on it yet.

Jittor avatar Mar 26 '20 03:03 Jittor

I very much appreciate your work on CentOS. We will support CentOS very soon!

Jittor avatar Mar 26 '20 03:03 Jittor

Hello! CentOS is now supported in Jittor.

Jittor avatar May 14 '21 07:05 Jittor