jittor
RuntimeError when calling jt.unique in cuda
Description
When using Jittor in a Docker environment on an A100 GPU server, calling jt.unique raises a RuntimeError.
Full Log
[i 0628 12:51:52.026463 96 compiler.py:951] Jittor(1.3.4.4) src: /opt/miniconda/lib/python3.7/site-packages/jittor
[i 0628 12:51:52.030702 96 compiler.py:952] g++ at /usr/bin/g++(7.5.0)
[i 0628 12:51:52.030859 96 compiler.py:953] cache_path: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default
[i 0628 12:51:52.036395 96 __init__.py:411] Found /usr/local/cuda/bin/nvcc(10.2.89) at /usr/local/cuda/bin/nvcc.
[i 0628 12:51:52.041496 96 __init__.py:411] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0628 12:51:52.467900 96 compiler.py:1006] cuda key:cu10.2.89_sm_80
[i 0628 12:51:52.814636 96 __init__.py:227] Total mem: 1007.70GB, using 16 procs for compiling.
[i 0628 12:51:54.003720 96 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0628 12:51:56.031813 96 init.cc:62] Found cuda archs: [80,]
[w 0628 12:51:56.228352 96 compiler.py:1356] CUDA arch(80)>75 will be backward-compatible
[i 0628 12:51:57.074075 96 __init__.py:411] Found mpicc(3.1.2) at /usr/local/bin/mpicc.
[i 0628 12:51:57.418543 96 compile_extern.py:30] found /usr/local/cuda/include/cublas.h
[i 0628 12:51:57.441918 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcublas.so
[i 0628 12:51:57.442159 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcublasLt.so.10
[i 0628 12:53:47.091725 96 compile_extern.py:30] found /usr/include/cudnn.h
[i 0628 12:53:47.121692 96 compile_extern.py:30] found /usr/lib/x86_64-linux-gnu/libcudnn.so
[i 0628 12:55:31.574456 96 compile_extern.py:30] found /usr/local/cuda/include/curand.h
[i 0628 12:55:31.641044 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcurand.so
[i 0628 12:55:31.727433 96 compile_extern.py:30] found /usr/local/cuda/include/cufft.h
[i 0628 12:55:31.784981 96 compile_extern.py:30] found /usr/local/cuda/lib64/libcufft.so
[i 0628 12:55:31.889927 96 cuda_flags.cc:32] CUDA enabled.
Calling jt.unique() raises:
Exception has occurred: RuntimeError
[f 0628 12:36:06.324679 20 executor.cc:661]
Execute fused operator(672/734) failed.
[JIT Source]: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cutt_transpose_T_1__JIT_1__JIT_cuda_1__index_t_int32__hash_6fb2cc42cc1e932f_op.cc
[OP TYPE]: cutt_transpose
[Input]: float32[2,29,256,512,],
[Output]: float32[2,256,512,29,],
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3
[Reason]: cudaFuncSetSharedMemConfig(transposePacked<float, 1>, cudaSharedMemBankSizeFourByte ) in file /root/.cache/jittor/cutt/cutt-1.2/src/calls.h:2, function cuttKernelSetSharedMemConfig
Error message: invalid device function
Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:
>>> export JT_SYNC=1
>>> export trace_py_var=3
File "/root/codes/trainer.py", line 121, in run
all_classes = jt.unique(target_map)
When I set:
>>> export JT_SYNC=1
>>> export trace_py_var=3
The log changed to:
Exception has occurred: RuntimeError
[f 0628 13:03:34.941187 24 executor.cc:661]
Execute fused operator(13/14) failed.
[JIT Source]: /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cub_where_Ti_bool__To_int32__NDIM_1__JIT_1__JIT_cuda_1__index_t_int32__hash_4ac929b461bb89b6_op.cc
[OP TYPE]: cub_where
[Input]: bool[393215,],
[Output]: int32[-393215,],
[Async Backtrace]: ---
/opt/miniconda/lib/python3.7/runpy.py:193 <_run_module_as_main>
/opt/miniconda/lib/python3.7/runpy.py:85 <_run_code>
/root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/__main__.py:45 <<module>>
/root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py:444 <main>
/root/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py:285 <run_file>
/opt/miniconda/lib/python3.7/runpy.py:263 <run_path>
/opt/miniconda/lib/python3.7/runpy.py:96 <_run_module_code>
/opt/miniconda/lib/python3.7/runpy.py:85 <_run_code>
main.py:14 <<module>>
/opt/miniconda/lib/python3.7/site-packages/jittor/misc.py:539 <unique>
/opt/miniconda/lib/python3.7/site-packages/jittor/contrib.py:183 <getitem>
[Reason]: [f 0628 13:03:34.821809 24 helper_cuda.h:128] CUDA error at /root/.cache/jittor/jt1.3.4/g++7.5.0/py3.7.9/Linux-5.4.0-81x73/AMDEPYC774264-xaf/default/cu10.2.89_sm_80/jit/cub_where_Ti_bool__To_int32__NDIM_1__JIT_1__JIT_cuda_1__index_t_int32__hash_4ac929b461bb89b6_op.cc:68 code=98( cudaErrorInvalidDeviceFunction ) cub::DeviceSelect::Flagged(nullptr, temp_storage_bytes, counting_itr, itr, out_temp, (To*)num_nonzeros, N)
File "/root/codes/main.py", line 14, in <module>
jt.unique(x)
Environment
Docker Image: leoxiao/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04:pt1.8.1-lts
Jittor: jittor==1.3.4.4
Device: A100 GPUs
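One possibly relevant detail, offered as an unconfirmed guess rather than a diagnosis: the startup log reports nvcc 10.2.89 together with CUDA arch 80, and as far as I know compute capability 8.0 (A100) is only supported from CUDA 11.0 onward. That mismatch could explain the "invalid device function" errors. A minimal Python sketch of the check follows. The table and the helper names are hypothetical and are not part of Jittor, and only a few well-known architectures are listed.

```python
# Minimum CUDA toolkit release able to compile for each compute capability.
# Assumption: only a few well-known entries are listed here; extend as needed.
MIN_CUDA_FOR_ARCH = {
    70: (9, 0),   # Volta (V100)
    75: (10, 0),  # Turing
    80: (11, 0),  # Ampere (A100)
    86: (11, 1),  # Ampere (GA10x)
}


def parse_version(text):
    """Turn a version string such as '10.2.89' into (major, minor)."""
    parts = text.strip().split(".")
    return int(parts[0]), int(parts[1])


def toolkit_supports_arch(nvcc_version, arch):
    """Return True/False, or None when `arch` is not in the table."""
    required = MIN_CUDA_FOR_ARCH.get(arch)
    if required is None:
        return None
    return parse_version(nvcc_version) >= required


# Values taken from the log in this report: nvcc 10.2.89, CUDA arch 80.
print(toolkit_supports_arch("10.2.89", 80))  # prints False
```

If this guess is right, an image based on CUDA 11.0 or newer would be the thing to try, though I have not verified that it fixes jt.unique.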
Minimal Reproduction
import jittor as jt
jt.flags.use_cuda = 1
x = jt.randint(0, 10, (2, 3, 256, 256))
jt.unique(x)
Expected behavior
jt.unique should run on CUDA without raising an error. I am looking for a suitable Docker image for running Jittor with CUDA, because the official Docker image runs very slowly on A100 GPUs.