DiffRL

Error when running python test_env.py --env AntEnv

Open HzfFrank opened this issue 2 years ago • 7 comments

I ran into the following error when running the command python test_env.py --env AntEnv in the examples folder, as described in the guide. My PyTorch version is 1.11.0 and my CUDA version is 12.1. Is there something wrong with my system? I would really appreciate any help with this problem.

Rebuilding kernels
Detected CUDA files, patching ldflags
Emitting ninja build file /home/frank/DiffRL/dflex/dflex/kernels/build.ninja...
Building extension module kernels...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-12.1/bin/nvcc  -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -
DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -
I/home/frank/DiffRL/dflex/dflex -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/TH -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem 
/home/frank/anaconda3/envs/shac/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -
D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-
relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-
fPIC' -gencode=arch=compute_35,code=compute_35 -std=c++14 -c /home/frank/DiffRL/dflex/dflex/kernels/cuda.cu -o cuda.cuda.o
FAILED: cuda.cuda.o
/usr/local/cuda-12.1/bin/nvcc  -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -
DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -
I/home/frank/DiffRL/dflex/dflex -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/TH -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem 
/home/frank/anaconda3/envs/shac/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -
D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-
relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-
fPIC' -gencode=arch=compute_35,code=compute_35 -std=c++14 -c /home/frank/DiffRL/dflex/dflex/kernels/cuda.cu -o cuda.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
[2/3] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -
DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -
I/home/frank/DiffRL/dflex/dflex -isystem /home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/TH -isystem 
/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem 
/home/frank/anaconda3/envs/shac/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -Z -O2 -DNDEBUG -c 
/home/frank/DiffRL/dflex/dflex/kernels/main.cpp -o main.o
/home/frank/DiffRL/dflex/dflex/kernels/main.cpp: In function ‘df::float3 box_sdf_grad_cpu_func(df::float3, df::float3)’:
/home/frank/DiffRL/dflex/dflex/kernels/main.cpp:1051:47: warning: control reaches end of non-void function [-Wreturn-type]
 1051 |     var_58 = df::select(var_56, var_53, var_57);
          |
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_env.py", line 17, in <module>                                                                                                                                                       
    import envs
  File "/home/frank/DiffRL/envs/__init__.py", line 8, in <module>                                                                                                                                        
    from envs.dflex_env import DFlexEnv                                                                                                                                                        
  File "/home/frank/DiffRL/envs/dflex_env.py", line 15, in <module>                                                                                                                              
    import dflex as df                                                                                                                                                                         
  File "/home/frank/DiffRL/dflex/dflex/__init__.py", line 15, in <module>                                                                                                                            
    kernel_init()                                                                                                                                                                              
  File "/home/frank/DiffRL/dflex/dflex/sim.py", line 67, in kernel_init                                                                                                                          
    kernels = df.compile()                                                                                                                                                                     
  File "/home/frank/DiffRL/dflex/dflex/adjoint.py", line 1865, in compile                                                                                                                        
    module = torch.utils.cpp_extension.load_inline('kernels',                                                                                                                                  
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1293, in load_inline                                                                     
    return _jit_compile(                                                                                                                                                                       
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile                                                                    
    _write_ninja_file_and_build_library(                                                                                                                                                       
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
    _run_ninja_build(                                                                                                                                                                          
  File "/home/frank/anaconda3/envs/shac/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build                                                                
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'kernels'

HzfFrank, Mar 29 '23 15:03

I solved it by switching to CUDA 11.7. Maybe this project doesn't support the latest version of CUDA. If anyone manages to run it on the latest CUDA version, I would really appreciate it if you could share how.

HzfFrank, Mar 30 '23 11:03
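
For anyone hitting the same nvcc error: the failing command in the log requests -gencode=arch=compute_35, and CUDA 12 toolkits no longer accept the Kepler compute_35 target, which matches the "Unsupported gpu architecture" message. Below is a minimal Python sketch for checking which requested architectures the local nvcc rejects. It assumes the installed nvcc accepts the --list-gpu-arch option (present in recent toolkits). The helper names and the shortened sample command are hypothetical and are not part of DiffRL or dflex.

```python
import re
import shutil
import subprocess

# Matches the virtual architecture in flags such as
# "-gencode=arch=compute_35,code=compute_35".
GENCODE_RE = re.compile(r"-gencode[= ]arch=(compute_\d+[a-z]?)")


def requested_archs(nvcc_command):
    """Return the set of compute_XX architectures requested via -gencode."""
    return set(GENCODE_RE.findall(nvcc_command))


def supported_archs():
    """Ask the local nvcc which virtual architectures it supports.

    Returns an empty set if nvcc is not on PATH or prints nothing
    (assumption: the toolkit understands --list-gpu-arch).
    """
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return set()
    result = subprocess.run(
        [nvcc, "--list-gpu-arch"], capture_output=True, text=True, check=False
    )
    return set(result.stdout.split())


def unsupported_archs(nvcc_command, supported):
    """Return the requested architectures that are missing from `supported`."""
    return requested_archs(nvcc_command) - set(supported)


if __name__ == "__main__":
    # Shortened version of the failing command from the log above.
    sample = (
        "nvcc -gencode=arch=compute_86,code=compute_86 "
        "-gencode=arch=compute_86,code=sm_86 "
        "-gencode=arch=compute_35,code=compute_35 -c cuda.cu -o cuda.cuda.o"
    )
    supported = supported_archs()
    if not supported:
        print("Could not query nvcc; requested:", sorted(requested_archs(sample)))
    else:
        print("Rejected by this nvcc:", sorted(unsupported_archs(sample, supported)))
```

If compute_35 shows up as rejected, the options are to use a toolkit that still accepts it (the comment above reports success with CUDA 11.7) or to find where the build adds that flag. Where the flag is set in dflex is not shown in this thread, so that part is left open here.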

I have the same problem :( My GPU is an RTX 4090, with CUDA 12.1. I could not solve it :(

wangrun20, May 07 '23 11:05

Similar error when running python -c "import dflex" after installation. RTX 4090 with CUDA 11.6. By the way, I also failed to build dflex on an A100.

UltronAI, Jun 16 '23 16:06

After changing my CUDA version to 11.7, the problem still exists. RTX 3060 with CUDA 11.7, PyTorch 1.11.0.

Leon-LXA, Oct 12 '23 17:10

Same issue here: Windows 11, CUDA 12.2, Python 3.8, torch 2.2.0.

YuehChuan, Nov 15 '23 03:11