run_global_nerf not finishing due to ValueError: Invalid device ID (0)
Hi,
The tracking part of the pipeline successfully produced all the expected files, but the NeRF post-processing in run_global_nerf fails with the following:
(py38) root@node:/BundleSDF# python run_custom.py --mode run_video --video_dir /data/custom_sim_ikea/env_10 --out_folder ./data/env_10_out --use_segmenter 0 --use_gui 0 --debug_level 2 --do_only_global_nerf
[2025-01-05 05:01:20.334] [warning] [Bundler.cpp:49] Connected to nerf_port 9999
[2025-01-05 05:01:20.334] [warning] [FeatureManager.cpp:2084] Connected to port 5555
default_cfg {'backbone_type': 'ResNetFPN', 'resolution': (8, 2), 'fine_window_size': 5, 'fine_concat_coarse_feat': True, 'resnetfpn': {'initial_dim': 128, 'block_dims': [128, 196, 256]}, 'coarse': {'d_model': 256, 'd_ffn': 256, 'nhead': 8, 'layer_names': ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross'], 'attention': 'linear', 'temp_bug_fix': False}, 'match_coarse': {'thr': 0.2, 'border_rm': 2, 'match_type': 'dual_softmax', 'dsmax_temperature': 0.1, 'skh_iters': 3, 'skh_init_bin_score': 1.0, 'skh_prefilter': True, 'train_coarse_percent': 0.4, 'train_pad_num_gt_min': 200}, 'fine': {'d_model': 128, 'd_ffn': 128, 'nhead': 8, 'layer_names': ['self', 'cross'], 'attention': 'linear'}}
[bundlesdf.py] last_stamp 000150
[bundlesdf.py] keyframes#: 23
[tool.py] compute_scene_bounds_worker start
[tool.py] compute_scene_bounds_worker done
[tool.py] merge pcd
[tool.py] compute_translation_scales done
translation_cvcam=[ 0.00084036 -0.0012077 -0.01543765], sc_factor=2.4707552927162815
[nerf_runner.py] Octree voxel dilate_radius:1
level 0, resolution: 16
level 1, resolution: 20
level 2, resolution: 24
level 3, resolution: 28
level 4, resolution: 34
level 5, resolution: 41
level 6, resolution: 49
level 7, resolution: 59
level 8, resolution: 71
level 9, resolution: 85
level 10, resolution: 102
level 11, resolution: 123
level 12, resolution: 148
level 13, resolution: 177
level 14, resolution: 213
level 15, resolution: 256
GridEncoder: input_dim=3 n_levels=16 level_dim=2 resolution=16 -> 256 per_level_scale=1.2030 params=(20411696, 2) gridtype=hash align_corners=False
sc_factor 2.4707552927162815
translation [ 0.00084036 -0.0012077 -0.01543765]
[nerf_runner.py] denoise cloud
[nerf_runner.py] Denoising rays based on octree cloud
[nerf_runner.py] bad_mask#=1
rays torch.Size([635102, 12])
Start training
[nerf_runner.py] train progress 0/2001
[nerf_runner.py] Iter: 0, valid_samples: 655360/655360, valid_rays: 2048/2048, loss: 19.1780186, rgb_loss: 18.7252693, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.0590795, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.2663427, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.1273278,
[nerf_runner.py] train progress 200/2001
[nerf_runner.py] train progress 400/2001
[nerf_runner.py] train progress 600/2001
[nerf_runner.py] train progress 800/2001
[nerf_runner.py] train progress 1000/2001
[nerf_runner.py] train progress 1200/2001
[nerf_runner.py] train progress 1400/2001
[nerf_runner.py] train progress 1600/2001
[nerf_runner.py] train progress 1800/2001
[nerf_runner.py] train progress 2000/2001
cp: cannot stat './data/env_10_out/nerf_with_bundletrack_online/image_step_*.png': No such file or directory
[nerf_runner.py] query_pts:torch.Size([66430125, 3]), valid:4329017
[nerf_runner.py] Running Marching Cubes
[nerf_runner.py] done V:(210427, 3), F:(420902, 3)
[acceleratesupport.py] OpenGL_accelerate module loaded
[arraydatatype.py] Using accelerated ArrayDatatype
Traceback (most recent call last):
  File "run_custom.py", line 256, in <module>
    run_one_video_global_nerf(out_folder=args.out_folder)
  File "run_custom.py", line 178, in run_one_video_global_nerf
    tracker.run_global_nerf(reader=reader, get_texture=True, tex_res=512)
  File "/BundleSDF/bundlesdf.py", line 765, in run_global_nerf
    mesh = nerf.mesh_texture_from_train_images(mesh, rgbs_raw=rgbs_raw, train_texture=False, tex_res=tex_res)
  File "/BundleSDF/nerf_runner.py", line 1491, in mesh_texture_from_train_images
    renderer = ModelRendererOffscreen([], cam_K=self.K, H=self.H, W=self.W, zfar=self.cfg['far']*self.cfg['sc_factor'])
  File "/BundleSDF/offscreen_renderer.py", line 59, in __init__
    self.r = pyrender.OffscreenRenderer(self.W, self.H) #!NOTE version>0.1.32 not work https://github.com/mmatl/pyrender/issues/85
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pyrender/offscreen.py", line 31, in __init__
    self._create()
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pyrender/offscreen.py", line 137, in _create
    egl_device = egl.get_device_by_index(device_id)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pyrender/platforms/egl.py", line 83, in get_device_by_index
    raise ValueError('Invalid device ID ({})'.format(device_id, len(devices)))
ValueError: Invalid device ID (0)
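Judging from the last frame of the traceback, `get_device_by_index(0)` rejects index 0 after looking at `len(devices)`, which suggests that pyrender's EGL device query returns an empty list in this environment. A minimal sketch to check that directly, assuming the installed pyrender exposes `pyrender.platforms.egl.query_devices()`; the helper name `count_egl_devices` is my own and not part of BundleSDF or pyrender:

```python
import os

# PyOpenGL chooses its backend at import time, so this must be set
# before pyrender (and therefore OpenGL) is imported.
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")


def count_egl_devices():
    """Return (device_count, error); exactly one of the two is None."""
    try:
        from pyrender.platforms import egl
        return len(egl.query_devices()), None
    except Exception as exc:  # missing package, missing libEGL, driver issues, ...
        return None, exc


if __name__ == "__main__":
    count, error = count_egl_devices()
    if error is not None:
        print("Could not query EGL devices:", repr(error))
    else:
        print("EGL devices visible to pyrender:", count)
        print("EGL_DEVICE_ID:", os.environ.get("EGL_DEVICE_ID", "0 (default)"))
```

If this reports 0 devices or an error, the failure would be in the EGL/driver setup of the environment rather than in the NeRF training itself, which completed all 2001 iterations in the log above.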
A few people have reported the same problem before:
- https://github.com/NVlabs/BundleSDF/issues/73
- https://github.com/NVlabs/BundleSDF/issues/15
However, the solutions suggested:
pip uninstall scipy && pip install scipy==1.9
conda install -c conda-forge gcc=12.1.0
cp /opt/conda/envs/py38/lib/libstdc++.so.6.0.33 /usr/lib/x86_64-linux-gnu/
rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.33 /usr/lib/x86_64-linux-gnu/libstdc++.so.6
are not working for me on Ubuntu 22.04 with a single CUDA device. Could you provide some guidance on this issue?
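To rule out that the symlink step silently failed, a small sketch that reports which file `libstdc++.so.6` actually resolves to. The default path is the one from the commands above; `resolve_library` is a hypothetical helper name:

```python
import os


def resolve_library(path="/usr/lib/x86_64-linux-gnu/libstdc++.so.6"):
    """Follow symlinks and return (resolved_path, exists)."""
    resolved = os.path.realpath(path)
    return resolved, os.path.exists(resolved)


if __name__ == "__main__":
    resolved, exists = resolve_library()
    print("libstdc++.so.6 ->", resolved, "(found)" if exists else "(missing)")
```

After the commands above, I would expect the resolved path to end in `libstdc++.so.6.0.33`.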
I also ran into this problem. Did you find a way around it?
Hello @kirilllzaitsev @bowieshi, how can I modify the setup to make it work on Ubuntu 22.04 with NVIDIA driver 535.183.01 (CUDA 12.2, as reported by nvidia-smi)? I assume I have to modify the Dockerfile, but that is giving me problems. Any advice?
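One thing that may be worth ruling out before editing the Dockerfile (this is an assumption on my side, not a confirmed cause of this error): with the NVIDIA Container Toolkit, the OpenGL/EGL driver libraries are only made available inside the container when `NVIDIA_DRIVER_CAPABILITIES` includes `graphics` or is set to `all`. A small sketch to check this from inside the container; `has_graphics_capability` is a hypothetical helper name:

```python
import os


def has_graphics_capability(env=None):
    """True if NVIDIA_DRIVER_CAPABILITIES requests 'graphics' or 'all'."""
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    caps = {c.strip().lower() for c in raw.split(",") if c.strip()}
    return "all" in caps or "graphics" in caps


if __name__ == "__main__":
    print("NVIDIA_DRIVER_CAPABILITIES =",
          os.environ.get("NVIDIA_DRIVER_CAPABILITIES"))
    print("graphics capability requested:", has_graphics_capability())
```

If this prints `False`, starting the container with `-e NVIDIA_DRIVER_CAPABILITIES=all` added to the `docker run` command might be enough to try before changing the image itself.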