RuntimeError when calling jt.unique in cuda

Open Taited opened this issue 2 years ago • 0 comments

Description

When running Jittor in a Docker container on an A100 GPU server, calling jt.unique with CUDA enabled raises a RuntimeError.

Full Log

[i 0628 12:51:52.026463 96 compiler.py:951] Jittor(1.3.4.4) src: /opt/miniconda/lib/python3.7/site-packages/jittor
[i 0628 12:51:52.030702 96 compiler.py:952] g++ at /usr/bin/g++(7.5.0)
[i 0628 12:51:52.030859 96 compiler.py:953] cache_path: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default
[i 0628 12:51:52.036395 96 __init__.py:411] Found /usr/local/cuda/bin/nvcc(10.2.89) at /usr/local/cuda/bin/nvcc.
[i 0628 12:51:52.041496 96 __init__.py:411] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0628 12:51:52.467900 96 compiler.py:1006] cuda key:cu10.2.89_sm_80
[i 0628 12:51:52.814636 96 __init__.py:227] Total mem: 1007.70GB, using 16 procs for compiling.
[i 0628 12:51:54.003720 96 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0628 12:51:56.031813 96 init.cc:62] Found cuda archs: [80,]
[w 0628 12:51:56.228352 96 compiler.py:1356] CUDA arch(80)>75 will be backward-compatible
[i 0628 12:51:57.074075 96 __init__.py:411] Found mpicc(3.1.2) at /usr/local/bin/mpicc.
[i 0628 12:51:57.418543 96 compile_extern.py:30] found /usr/local/cuda/include/cublas.h
[i 0628 12:51:57.441918 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcublas.so
[i 0628 12:51:57.442159 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcublasLt.so.10
[i 0628 12:53:47.091725 96 compile_extern.py:30] found /usr/include/cudnn.h
[i 0628 12:53:47.121692 96 compile_extern.py:30] found /usr/lib/x86_64-linux-gnu/libcudnn.so
[i 0628 12:55:31.574456 96 compile_extern.py:30] found /usr/local/cuda/include/curand.h
[i 0628 12:55:31.641044 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcurand.so
[i 0628 12:55:31.727433 96 compile_extern.py:30] found /usr/local/cuda/include/cufft.h
[i 0628 12:55:31.784981 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcufft.so
[i 0628 12:55:31.889927 96 cuda_flags.cc:32] CUDA enabled.

When calling jt.unique(), it raises:

Exception has occurred: RuntimeError
[f 0628 12:36:06.324679 20 executor.cc:661] 
Execute fused operator(672/734) failed. 
[JIT Source]: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cutt_transpose_T_1__JIT_1__JIT_cuda_1__index_t_int32__hash_6fb2cc42cc1e932f_op.cc 
[OP TYPE]: cutt_transpose 
[Input]: float32[2,29,256,512,], 
[Output]: float32[2,256,512,29,], 
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3 
[Reason]: cudaFuncSetSharedMemConfig(transposePacked<float, 1>, cudaSharedMemBankSizeFourByte ) in file /root/.cache/jittor/cutt/cutt-1.2/src/calls.h:2, function cuttKernelSetSharedMemConfig
Error message: invalid device function

Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:
>>> export JT_SYNC=1
>>> export trace_py_var=3
  File "/root/codes/trainer.py", line 121, in run
    all_classes = jt.unique(target_map)

When I set:

>>> export JT_SYNC=1
>>> export trace_py_var=3

The log changed to:

Exception has occurred: RuntimeError
[f 0628 13:03:34.941187 24 executor.cc:661] 
Execute fused operator(13/14) failed. 
[JIT Source]: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cub_where_Ti_bool__To_int32__NDIM_1__JIT_1__JIT_cuda_1__index_t_int32__hash_4ac929b461bb89b6_op.cc 
[OP TYPE]: cub_where 
[Input]: bool[393215,], 
[Output]: int32[-393215,], 
[Async Backtrace]: --- 
     /opt/miniconda/lib/python3.7/runpy.py:193 <_run_module_as_main> 
     /opt/miniconda/lib/python3.7/runpy.py:85 <_run_code> 
     /root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/__main__.py:45 <<module>> 
     /root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py:444 <main> 
     /root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py:285 <run_file> 
     /opt/miniconda/lib/python3.7/runpy.py:263 <run_path> 
     /opt/miniconda/lib/python3.7/runpy.py:96 <_run_module_code> 
     /opt/miniconda/lib/python3.7/runpy.py:85 <_run_code> 
     main.py:14 <<module>> 
     /opt/miniconda/lib/python3.7/site-packages/jittor/misc.py:539 <unique> 
     /opt/miniconda/lib/python3.7/site-packages/jittor/contrib.py:183 <getitem> 
[Reason]: [f 0628 13:03:34.821809 24 helper_cuda.h:128] CUDA error at /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cub_where_Ti_bool__To_int32__NDIM_1__JIT_1__JIT_cuda_1__index_t_int32__hash_4ac929b461bb89b6_op.cc:68  code=98( cudaErrorInvalidDeviceFunction ) cub::DeviceSelect::Flagged(nullptr, temp_storage_bytes, counting_itr, itr, out_temp, (To*)num_nonzeros, N)
  File "/root/codes/main.py", line 14, in <module>
    jt.unique(x)
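
The two variables can also be applied to a single run from Python instead of being exported in the shell. This is a minimal sketch, assuming Jittor reads `JT_SYNC` and `trace_py_var` from the process environment at startup, as the error message implies. The helper names are hypothetical and not part of Jittor.

```python
import os
import subprocess
import sys


def jittor_debug_env(base_env=None):
    """Return a copy of the environment with the two debug variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["JT_SYNC"] = "1"       # variable name taken from the error message
    env["trace_py_var"] = "3"  # variable name taken from the error message
    return env


def run_with_debug_env(script_path):
    """Run a script in a child process with the debug variables applied."""
    return subprocess.run([sys.executable, script_path], env=jittor_debug_env())


# Usage: run_with_debug_env("main.py")
```

Running the script in a child process keeps the variables out of the parent shell, so later runs are not affected by the synchronous mode.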

Environment

Docker Image: leoxiao/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04:pt1.8.1-lts  

Jittor: jittor==1.3.4.4  

Device: A100 GPUs
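
One thing worth checking in this environment is the pairing of toolkit and device: the log reports `cuda key:cu10.2.89_sm_80`, and to my knowledge nvcc can only target sm_80 (A100) from CUDA 11.0 onward. If that is right, it would be consistent with the "invalid device function" errors, though it is not confirmed here. The sketch below parses the `cuda key` line from the log and compares it against a small table. The table and the function names are my own assumptions, not a Jittor API.

```python
import re

# Assumption: earliest CUDA toolkit release able to target each compute
# capability, based on NVIDIA release notes. This is not a Jittor API.
MIN_TOOLKIT_FOR_ARCH = {
    70: (9, 0),   # Volta
    75: (10, 0),  # Turing
    80: (11, 0),  # Ampere, e.g. A100
}


def parse_cuda_key(log_line):
    """Extract ((major, minor), arch) from a 'cuda key:cuX.Y.Z_sm_NN' log line."""
    match = re.search(r"cuda key:cu(\d+)\.(\d+)(?:\.\d+)?_sm_(\d+)", log_line)
    if match is None:
        raise ValueError("no cuda key found in line")
    major, minor, arch = (int(g) for g in match.groups())
    return (major, minor), arch


def toolkit_supports_arch(toolkit, arch):
    """Return True or False when the arch is in the table, otherwise None."""
    required = MIN_TOOLKIT_FOR_ARCH.get(arch)
    if required is None:
        return None
    return toolkit >= required


line = "[i 0628 12:51:52.467900 96 compiler.py:1006] cuda key:cu10.2.89_sm_80"
toolkit, arch = parse_cuda_key(line)
print(toolkit, arch, toolkit_supports_arch(toolkit, arch))  # (10, 2) 80 False
```

For the log above the check returns False, which suggests trying an image built on CUDA 11 or newer.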

Minimal Reproduction

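import jittor as jt

jt.flags.use_cuda = 1  # run on the GPU

# integer tensor with values drawn from [0, 10)
x = jt.randint(0, 10, (2, 3, 256, 256))

jt.unique(x)  # raises the RuntimeError shown above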

Expected behavior

jt.unique should run without error when CUDA is enabled. I am looking for a suitable Docker image for running Jittor with CUDA, because the official Docker image runs very slowly on A100 GPUs.

Taited, Jun 28 '22 12:06