BundleSDF

run_custom.py fails with "No CUDA GPUs are available"

Open RyanbowZ opened this issue 1 year ago • 1 comments

I am following the steps as described. When I run everything in order and execute run_custom.py, it first warns that version `GLIBCXX_3.4.30' is not found, which I resolved as described in #15 using the following command:

export LD_LIBRARY_PATH=/opt/conda/lib:${LD_LIBRARY_PATH}

This time the warning above is gone, but a new error appears:

RuntimeError: No CUDA GPUs are available

I also tried listing the available GPUs inside the Docker container, and it shows that 0 GPUs are available.
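For anyone reproducing this, here is a minimal sketch of one way to list what PyTorch can see from inside the container. The helper name `cuda_report` is hypothetical and not part of BundleSDF; the sketch only uses standard `torch.cuda` calls and falls back gracefully if `torch` cannot be imported.

```python
import os


def cuda_report():
    """Collect what PyTorch can see; also works if torch is not installed."""
    # An empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA applications.
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["is_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `device_count` is 0 while `torch_cuda_build` is set, PyTorch was built with CUDA but the process cannot reach any device, which points at the container or driver setup rather than at the Python environment.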

I am currently on an RTX 4090, and I am wondering whether this is because the default CUDA 11.3 version does not support it.
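One way to reason about the architecture question is sketched below. The RTX 4090 reports compute capability 8.9, and native GPU binaries built for an `sm_XY` target run on devices with the same major version and an equal or newer minor version. The function `is_arch_supported` is a hypothetical helper, and the architecture list in the fallback branch is an assumed example, not the actual list of the PyTorch build in the BundleSDF image. Note that an architecture mismatch usually shows up as a different error (for example "no kernel image is available"), so this check is only a side investigation.

```python
import re


def is_arch_supported(capability, arch_list):
    """Return True if a GPU with `capability` (major, minor) can run code
    built for any entry of `arch_list`, e.g. ["sm_80", "sm_86"]."""
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)", arch)
        if not match:
            continue  # skip entries this sketch does not understand
        kind, number = match.group(1), int(match.group(2))
        arch_major, arch_minor = divmod(number, 10)
        if kind == "sm":
            # Native binaries: same major version, equal or older minor.
            if arch_major == major and arch_minor <= minor:
                return True
        elif (arch_major, arch_minor) <= (major, minor):
            # PTX can be JIT-compiled for an equal or newer architecture.
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, is_arch_supported(cap, torch.cuda.get_arch_list()))
    else:
        # Assumed example list; the real one comes from torch.cuda.get_arch_list().
        print(is_arch_supported((8, 9), ["sm_80", "sm_86"]))
```

The `torch` branch only runs once a GPU is visible, so in the situation described here the fallback branch is what executes.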

RyanbowZ avatar May 02 '24 01:05 RyanbowZ

Did you install nvidia-docker (the NVIDIA Container Toolkit)?
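A quick way to check this from inside the running container is sketched below. When a container is started with GPU access (for example with `docker run --gpus all`), the NVIDIA runtime typically exposes `/dev/nvidia*` device nodes and the `nvidia-smi` binary inside it. The helper name `container_gpu_report` is hypothetical, and whether the BundleSDF launch script passes the GPU flag is not confirmed in this thread.

```python
import glob
import os
import shutil


def container_gpu_report():
    """Report signs that the NVIDIA runtime exposed a GPU to this container."""
    return {
        # Device nodes such as /dev/nvidia0 and /dev/nvidiactl
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        # Path to nvidia-smi if it is available inside the container
        "nvidia_smi": shutil.which("nvidia-smi"),
        # Variable the NVIDIA runtime reads to decide which GPUs to expose
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    for key, value in container_gpu_report().items():
        print(f"{key}: {value}")
```

An empty `device_nodes` list together with a missing `nvidia_smi` suggests the container was started without GPU access, regardless of which CUDA version it ships.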

wenbowen123 avatar May 08 '24 18:05 wenbowen123

@wenbowen123 Hi, I have the NVIDIA Container Toolkit installed, yet I see the same problem on an RTX 3070 Ti:

(py38) root@host:/BundleSDF# python run_custom.py --mode run_video --video_dir ./data/2022-11-18-15-10-24_milk --out_folder ./data/2022-11-18-15-10-24_milk_out --use_segmenter 1 --use_gui 1 --debug_level 2
[2024-09-17 13:03:41.688] [warning] [Bundler.cpp:49] Connected to nerf_port 9999
[2024-09-17 13:03:41.689] [warning] [FeatureManager.cpp:2084] Connected to port 5555
default_cfg {'backbone_type': 'ResNetFPN', 'resolution': (8, 2), 'fine_window_size': 5, 'fine_concat_coarse_feat': True, 'resnetfpn': {'initial_dim': 128, 'block_dims': [128, 196, 256]}, 'coarse': {'d_model': 256, 'd_ffn': 256, 'nhead': 8, 'layer_names': ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross'], 'attention': 'linear', 'temp_bug_fix': False}, 'match_coarse': {'thr': 0.2, 'border_rm': 2, 'match_type': 'dual_softmax', 'dsmax_temperature': 0.1, 'skh_iters': 3, 'skh_init_bin_score': 1.0, 'skh_prefilter': True, 'train_coarse_percent': 0.4, 'train_pad_num_gt_min': 200}, 'fine': {'d_model': 128, 'd_ffn': 128, 'nhead': 8, 'layer_names': ['self', 'cross'], 'attention': 'linear'}}
Traceback (most recent call last):
  File "run_custom.py", line 223, in <module>
    run_one_video(video_dir=args.video_dir, out_folder=args.out_folder, use_segmenter=args.use_segmenter, use_gui=args.use_gui)
  File "run_custom.py", line 68, in run_one_video
    tracker = BundleSdf(cfg_track_dir=cfg_track_dir, cfg_nerf_dir=cfg_nerf_dir, start_nerf_keyframes=5, use_gui=use_gui)
  File "/BundleSDF/bundlesdf.py", line 318, in __init__
    self.loftr = LoftrRunner()
  File "/BundleSDF/loftr_wrapper.py", line 25, in __init__
    self.matcher = self.matcher.eval().cuda()
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 688, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 601, in _apply
    param_applied = fn(param)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 688, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

My guess is that the GPU driver / CUDA version is incompatible with the CUDA 11.3 setup in the container:

Sat Sep 14 23:25:55 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070 Ti    Off | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P3              22W / 115W |    328MiB /  8192MiB |     21%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
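As a sanity check on the version question, the sketch below parses the header line of the `nvidia-smi` output above. The "CUDA Version" field is the highest CUDA runtime version the installed driver supports, and drivers are generally backward compatible with older runtimes. The function `parse_smi_header` is a hypothetical helper, and the container version `(11, 3)` is taken from the thread rather than read from the image.

```python
import re


def parse_smi_header(text):
    """Extract the driver version and the maximum supported CUDA version."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (
        driver.group(1) if driver else None,
        (int(cuda.group(1)), int(cuda.group(2))) if cuda else None,
    )


header = "| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |"
driver, max_cuda = parse_smi_header(header)
container_cuda = (11, 3)  # CUDA version of the container, as stated in the thread
print(driver, max_cuda, max_cuda >= container_cuda)
# prints: 535.183.01 (12, 2) True
```

Since 12.2 is newer than 11.3, the driver version alone may not explain the error, so it is worth also confirming that the container was started with GPU access.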

I will try rebuilding the container from a more recent nvidia image.

kirilllzaitsev avatar Sep 17 '24 20:09 kirilllzaitsev