
CUDA out of memory.

Open Xiaxia1997 opened this issue 2 years ago • 5 comments

I am trying to run scannet/scene0059, but I get a CUDA out-of-memory error. Here is the error message:

home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Vox-Fusion/src/tracking.py", line 97, in spin
    self.do_tracking(share_data, current_frame, kf_buffer)
  File "/Vox-Fusion/src/tracking.py", line 128, in do_tracking
    frame_pose, hit_mask = track_frame(
  File "/Vox-Fusion/src/variations/render_helpers.py", line 450, in track_frame
    final_outputs = render_rays(
  File "/Vox-Fusion/src/variations/render_helpers.py", line 223, in render_rays
    samples = ray_sample(intersections, step_size=step_size)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Vox-Fusion/src/variations/voxel_helpers.py", line 575, in ray_sample
    sampled_idx, sampled_depth, sampled_dists = inverse_cdf_sampling(
  File "/Vox-Fusion/src/variations/voxel_helpers.py", line 292, in forward
    noise = min_depth.new_zeros(*min_depth.size()[:-1], max_steps)
RuntimeError: CUDA out of memory. Tried to allocate 745.06 GiB (GPU 0; 23.70 GiB total capacity; 146.05 MiB already allocated; 11.93 GiB free; 176.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
^CTraceback (most recent call last):
  File "demo/run.py", line 23, in <module>
    slam.wait_child_processes()
  File "/Vox-Fusion/src/voxslam.py", line 62, in wait_child_processes
    p.join()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Vox-Fusion/src/mapping.py", line 89, in spin
    if not kf_buffer.empty():
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/queues.py", line 123, in empty
    return not self._poll()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 925, in wait
    selector.register(obj, selectors.EVENT_READ)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/selectors.py", line 352, in register
    key = super().register(fileobj, events, data)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/selectors.py", line 235, in register
    if (not events) or (events & ~(EVENT_READ | EVENT_WRITE)):
KeyboardInterrupt
/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Xiaxia1997, Feb 15 '23 10:02

I have the same problem! Hoping for a reply.

JunyuanDeng, Feb 18 '23 05:02

Diagnosing this problem probably needs more intermediate results. What do the predicted color and depth maps look like? (They are generated with the render_freq option.)

xingruiyang, Feb 18 '23 06:02

I don't have the predicted color and depth maps right now; hopefully @Xiaxia1997 can provide more information.

Here is what I found: I printed *min_depth.size()[:-1], max_steps and saw that max_steps is huge, around 8*1e8. After checking the source code, I think the problem may come from max_distance and min_distance here.
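
As a rough sanity check on those numbers, the sketch below estimates the size of a dense buffer shaped like the one in the failing new_zeros call. It assumes float32 elements (4 bytes each). The ray count of 250 is purely illustrative, picked because it happens to reproduce the figure in the error message; the real batch size may differ. alloc_gib is a hypothetical helper, not part of Vox-Fusion.

```python
def alloc_gib(num_rays: int, max_steps: int, bytes_per_elem: int = 4) -> float:
    """Size in GiB of a dense (num_rays, max_steps) buffer.

    bytes_per_elem=4 assumes float32; this is an assumption, not
    something confirmed from the Vox-Fusion source.
    """
    return num_rays * max_steps * bytes_per_elem / 2**30


# Hypothetical ray count (250) with max_steps around 8e8 as printed above.
print(f"{alloc_gib(250, 800_000_000):.2f} GiB")  # prints "745.06 GiB"

# The same ray count with a more ordinary step count is tiny by comparison.
print(f"{alloc_gib(250, 1_000):.6f} GiB")
```

So a max_steps value of this magnitude is enough on its own to explain a request of several hundred GiB, regardless of how much GPU memory is free.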

JunyuanDeng, Feb 18 '23 06:02

I encounter this error every time there is a loop during tracking. It seems the rays intersect very distant voxels, which makes the max distance very large.

JunyuanDeng, Feb 20 '23 14:02

I wonder how this problem can be solved. I found max_depth in the config; maybe voxels that lie beyond the max_depth value should be ignored?
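
To make that idea concrete, here is a minimal pure-Python sketch, not the actual Vox-Fusion code. It assumes the step count is derived from the longest ray span divided by the step size, which I have not confirmed against voxel_helpers.py. max_steps_for, the depth values, and the step size are all made up for illustration; the real code works on tensors.

```python
import math


def max_steps_for(hits, step_size, max_depth=None):
    """Number of samples needed to cover the longest span of ray/voxel hits.

    hits: list of (near, far) depth pairs, one per intersection.
    max_depth: if given, hits starting at or beyond it are dropped and the
    remaining far depths are clipped to it (hypothetical fix).
    """
    if max_depth is not None:
        hits = [(near, min(far, max_depth)) for near, far in hits if near < max_depth]
    if not hits:
        return 0
    span = max(far for _, far in hits) - min(near for near, _ in hits)
    return math.ceil(span / step_size)


# Two nearby hits plus one far-away outlier (illustrative values only).
hits = [(0.5, 1.0), (1.0, 1.5), (4096.0, 4096.5)]

print(max_steps_for(hits, step_size=0.25))                 # 16384
print(max_steps_for(hits, step_size=0.25, max_depth=8.0))  # 4
```

In this toy case a single distant intersection inflates the step count by several orders of magnitude, and clipping at max_depth brings it back down. Whether dropping those voxels is safe for tracking accuracy is a separate question.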

jarvishou829, Oct 31 '23 01:10