
Reduce VRAM usage

Open · YerldSHO opened this issue 8 months ago · 2 comments

```
✨ Pixi task (nrgbd_wr in default): python -m neural_graph_mapping.run_mapping --config nrgbd_dataset.yaml neural_graph_map.yaml coslam_eval.yaml --dataset_config.root_dir $NGM_DATA_DIR/nrgbd/ --dataset_config.scene whiteroom $NGM_EXTRA_ARGS --rerun_vis True
/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 32 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/alex/projects/neural_graph_mapping/src/neural_graph_mapping/run_mapping.py", line 2428, in <module>
    main()
  File "/home/alex/projects/neural_graph_mapping/src/neural_graph_mapping/run_mapping.py", line 2421, in main
    neural_graph_map.fit()
  File "/home/alex/projects/neural_graph_mapping/src/neural_graph_mapping/run_mapping.py", line 1032, in fit
    self._init_mv_training_data()
  File "/home/alex/projects/neural_graph_mapping/src/neural_graph_mapping/utils.py", line 83, in wrapper
    result = f(*args, **kwargs)
  File "/home/alex/projects/neural_graph_mapping/src/neural_graph_mapping/run_mapping.py", line 1692, in _init_mv_training_data
    self._nc_rgbd_tensor = torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.58 GiB. GPU 0 has a total capacity of 5.78 GiB of which 66.44 MiB is free. Process 11099 has 5.30 GiB memory in use. Including non-PyTorch memory, this process has 116.00 MiB memory in use. Of the allocated memory 9.94 MiB is allocated by PyTorch, and 12.06 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Exception in thread Thread-1 (_pin_memory_loop):
Traceback (most recent call last):
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 53, in _pin_memory_loop
    do_one_step()
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 30, in do_one_step
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 495, in rebuild_storage_fd
    fd = df.detach()
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)  # reject large message
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/alex/projects/neural_graph_mapping/.pixi/envs/default/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
```

Good afternoon, I was running your code and came across this error. What could be causing it, and how can I solve it?

YerldSHO · Jun 10 '24, 01:06
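
For context, the log itself already points at two things: the OOM message recommends `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`, and it also shows that another process (PID 11099) is holding 5.30 GiB of the 5.78 GiB card, so only ~66 MiB is free when `_init_mv_training_data` tries to allocate a ~4.58 GiB tensor in one piece. Below is a minimal sketch for checking this before a run; it uses plain PyTorch calls and is not part of the neural_graph_mapping API, and the 4.58 GiB figure is simply taken from the error message above:

```python
import os
import torch

# Allocator hint from the OOM message above. It is read when the CUDA
# allocator first initializes, so it must be set before any CUDA work;
# exporting it in the shell before launching the pixi task is the safest route.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# How much VRAM is actually available right now on GPU 0?
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free_bytes / 2**30:.2f} GiB free of {total_bytes / 2**30:.2f} GiB")

# The failing call tries to allocate roughly 4.58 GiB at once, so at least
# that much VRAM must be free before the mapping run starts.
needed_gib = 4.58
if free_bytes / 2**30 < needed_gib:
    print(f"Less than {needed_gib} GiB free: stop other processes using GPU 0 "
          "(the log shows PID 11099 holding 5.30 GiB) before rerunning.")
```

Whether the expandable-segments setting helps here is uncertain, since the card simply does not have 4.58 GiB free while the other process is running; freeing GPU 0 first is the more direct check. The separate DataLoader warning about 32 workers on a 12-core machine is unrelated to VRAM but can be addressed by lowering the worker count in the configuration (the exact config key is not shown in the log).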