torch_efficient_distloss

RuntimeError: Error building extension 'segment_cumsum_cuda'

Open IaroslavS opened this issue 1 year ago • 2 comments

Hi! I'm trying to run Block-NeRF and ran into this error:

Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu116/segment_cumsum_cuda/build.ninja...
Building extension module segment_cumsum_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ segment_cumsum.o segment_cumsum_kernel.cuda.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/opt/conda/lib64 -lcudart -o segment_cumsum_cuda.so
FAILED: segment_cumsum_cuda.so 
c++ segment_cumsum.o segment_cumsum_kernel.cuda.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/opt/conda/lib64 -lcudart -o segment_cumsum_cuda.so
/usr/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
  0%|                                                                                                                                                                                                                                                    | 0/100000 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/docker_block_nerf/Block_NeRF/run_FourierGrid.py", line 115, in <module>
    run_train(args, cfg, data_dict, export_cam=True, export_geometry=True)
  File "/home/docker_block_nerf/Block_NeRF/FourierGrid/run_train.py", line 382, in run_train
    psnr = scene_rep_reconstruction(
  File "/home/docker_block_nerf/Block_NeRF/FourierGrid/run_train.py", line 274, in scene_rep_reconstruction
    loss_distortion = flatten_eff_distloss(w, s, 1/n_max, ray_id)
  File "/opt/conda/lib/python3.10/site-packages/torch_efficient_distloss/eff_distloss.py", line 93, in forward
    segment_cumsum_cuda = load(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'segment_cumsum_cuda'
(base) root@user:/home/docker_block_nerf/Block_NeRF# 

So the final error is RuntimeError: Error building extension 'segment_cumsum_cuda', and it is caused by the linker failure above (/usr/bin/ld: cannot find -lcudart). How can I resolve it?
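The decisive line in the log is `/usr/bin/ld: cannot find -lcudart`: the link step was given `-L/opt/conda/lib64`, and no linkable CUDA runtime library was found there or in the other searched directories. A minimal sketch for checking this by hand follows. The candidate directories other than `/opt/conda/lib64` are assumptions about common install locations, not paths taken from the log, and the helper names are made up for illustration.

```python
import glob
import os


def find_libcudart(search_dirs):
    """Map each directory to the libcudart files it contains (possibly none)."""
    found = {}
    for directory in search_dirs:
        pattern = os.path.join(directory, "libcudart*")
        found[directory] = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return found


def linkable(names):
    """-lcudart needs an unversioned libcudart.so (or a static libcudart.a)."""
    return "libcudart.so" in names or "libcudart.a" in names


if __name__ == "__main__":
    candidates = [
        "/opt/conda/lib64",       # the -L path from the failing link command
        "/opt/conda/lib",         # assumption: usual conda library directory
        "/usr/local/cuda/lib64",  # assumption: default system-wide toolkit path
    ]
    for directory, names in find_libcudart(candidates).items():
        status = "ok" if linkable(names) else "not linkable"
        print(f"{directory}: {names or 'nothing found'} ({status})")
```

A directory that only holds a versioned file such as `libcudart.so.11.0` is reported as not linkable, because the linker resolves `-lcudart` to the unversioned name.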

IaroslavS avatar Jul 01 '23 10:07 IaroslavS

Hi, I also had a similar issue; here is how I fixed it:

  1. Start with a clean Conda env (a plain Python virtual env would probably work too; it wouldn't hurt to try).
  2. Install the CUDA runtime components you will need before anything else. Nvidia provides the download links; if you use a Python env, you could try the pip version. The important point is to pick the CUDA version that your Torch build uses. For me it was CUDA 11.7, so I used: conda install cuda -c nvidia/label/cuda-11.7.0
  3. Install a compatible Torch, for me: pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
  4. Install torch_efficient_distloss: pip install torch_efficient_distloss
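
To make the version matching in step 2 concrete, here is a small sketch that reads the CUDA tag from a wheel version string such as `2.0.0+cu117` and builds the corresponding conda command. Both function names are hypothetical, and the trailing `.0` patch level in the channel label is an assumption copied from the 11.7 example above; other releases may use a different label. With Torch already installed, `torch.version.cuda` reports the same information directly.

```python
import re


def cuda_version_from_torch(torch_version):
    """Return e.g. '11.7' for '2.0.0+cu117', or None if there is no CUDA tag."""
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, everything before it is the major.
    return f"{digits[:-1]}.{digits[-1]}"


def conda_cuda_command(cuda_version):
    """Build the conda command from step 2 (the '.0' patch level is assumed)."""
    return f"conda install cuda -c nvidia/label/cuda-{cuda_version}.0"


if __name__ == "__main__":
    version = cuda_version_from_torch("2.0.0+cu117")
    print(version)
    print(conda_cuda_command(version))
```

The original report used `py310_cu116`, so the same approach there would point at CUDA 11.6 rather than 11.7.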

Hope this helps.

Cheers,

mertkaraoglu avatar Sep 06 '23 19:09 mertkaraoglu

I still have this problem after following these instructions. Did you add the CUDA 11.7 libraries installed via conda to the library path, or is there something else I can try?
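
One thing that may be worth trying for the library path question: GCC reads the `LIBRARY_PATH` environment variable at link time, and the dynamic loader reads `LD_LIBRARY_PATH` at run time, so adding the directory that contains `libcudart.so` to both before the extension is built could let `-lcudart` resolve. The sketch below is untested against this package; the helper name and the `/opt/conda/lib` directory are assumptions, so substitute wherever `libcudart.so` actually lives in your environment.

```python
import os


def with_library_path(lib_dir, env=None):
    """Return a copy of env with lib_dir prepended to the linker/loader paths."""
    env = dict(os.environ if env is None else env)
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        parts = [p for p in env.get(var, "").split(os.pathsep) if p]
        if lib_dir not in parts:
            parts.insert(0, lib_dir)  # searched before existing entries
        env[var] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Assumption: conda placed libcudart.so here; adjust to your setup.
    updated = with_library_path("/opt/conda/lib")
    print(updated["LIBRARY_PATH"])
    # To apply it in a training script, update os.environ before the first
    # call that triggers the JIT build:
    #     os.environ.update(updated)
```

Clearing the cached build directory shown in the log (`/root/.cache/torch_extensions/py310_cu116/segment_cumsum_cuda`) before retrying forces a fresh build with the new settings.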

daipengwa avatar Jan 05 '24 19:01 daipengwa