Memory Issue on train_contrastive_feature.py
Hi,
I'm encountering the following memory issue when trying to run train_contrastive_feature.py.
(saga) C:\Users\caspe\SegAnyGAussians>python train_contrastive_feature.py -m output/50224008-1 --iterations 10000 --num_sampled_rays 1000
Looking for config file in output/50224008-1\cfg_args
Config file found: output/50224008-1\cfg_args
Optimizing output/50224008-1
RFN weight: 1.0 [04/06 19:27:43]
Smooth K: 16 [04/06 19:27:43]
Scale aware dim: -1 [04/06 19:27:43]
Loading trained model at iteration 30000, None [04/06 19:27:43]
Allow Camera Principle Point Shift: False [04/06 19:27:43]
Reading camera 22/22 [04/06 19:27:46]
Loading Training Cameras [04/06 19:27:46]
Loading Test Cameras [04/06 19:27:48]
Training progress: 0%| | 0/10000 [00:00<?, ?it/s]Preparing Quantile Transform... [04/06 19:27:49]
Using adaptive scale gate. [04/06 19:27:49]
Traceback (most recent call last):
File "C:\Users\caspe\SegAnyGAussians\train_contrastive_feature.py", line 369, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.iteration, args.save_iterations, args.checkpoint_iterations, args.debug_from)
File "C:\Users\caspe\SegAnyGAussians\train_contrastive_feature.py", line 247, in training
feature_with_scale = rendered_features.unsqueeze(0).repeat([sampled_scales.shape[0],1,1,1])
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.81 GiB (GPU 0; 10.00 GiB total capacity; 15.50 GiB already allocated; 0 bytes free; 15.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Training progress: 0%| | 0/10000 [00:17<?, ?it/s]
I've tried lowering --num_sampled_rays and reducing max_split_size_mb. I'm on an RTX 3080 with 10 GB of VRAM.
Hi, I think 10GB might not be sufficient. You could try using a lower resolution for training and sampling fewer scales. That said, I still doubt whether 10GB will be enough even with these adjustments.
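In case it helps anyone trying the allocator hint from the error message: PYTORCH_CUDA_ALLOC_CONF is an environment variable that has to be set before launching the script. On Windows (cmd) that would look roughly like the lines below; the ray count is just an example value, and any resolution or scale-count flags would need to be checked against the script's own argument parser, so treat this as a sketch rather than a known-good configuration.
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python train_contrastive_feature.py -m output/50224008-1 --iterations 10000 --num_sampled_rays 256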
I encountered the same issue. I ran the code on an RTX A5000 with 24 GB of VRAM, which should be equivalent to the setup reported in the paper and enough to run all the experiments.
python train_contrastive_feature.py -m output/mipnerf360/bicycle --iterations 10000 --num_sampled_rays 100
Looking for config file in output/mipnerf360/bicycle/cfg_args
Config file found: output/mipnerf360/bicycle/cfg_args
Optimizing output/mipnerf360/bicycle
warnings.warn(
RFN weight: 1.0 [23/06 00:55:39]
Smooth K: 16 [23/06 00:55:39]
Scale aware dim: -1 [23/06 00:55:39]
Loading trained model at iteration 30000, None [23/06 00:55:39]
Allow Camera Principle Point Shift: False [23/06 00:55:39]
Reading camera 194/194 [23/06 00:55:44]
Loading Training Cameras [23/06 00:55:44]
Loading Test Cameras [23/06 00:56:48]
Training progress: 0%| | 0/10000 [00:00<?, ?it/s]Preparing Quantile Transform... [23/06 00:56:57]
Using adaptive scale gate. [23/06 00:56:57]
Traceback (most recent call last):
File "/data_fast/nhatth/code/projects/SegAnyGAussians/train_contrastive_feature.py", line 369, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.iteration, args.save_iterations, args.checkpoint_iterations, args.debug_from)
File "/data_fast/nhatth/code/projects/SegAnyGAussians/train_contrastive_feature.py", line 301, in training
loss.backward()
File "/data_fast/nhatth/code/libs/python/lib/python3.12/site-packages/torch/_tensor.py", line 648, in backward
torch.autograd.backward(
File "/data_fast/nhatth/code/libs/python/lib/python3.12/site-packages/torch/autograd/__init__.py", line 353, in backward
_engine_run_backward(
File "/data_fast/nhatth/code/libs/python/lib/python3.12/site-packages/torch/autograd/graph.py", line 824, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.21 GiB. GPU 0 has a total capacity of 23.67 GiB of which 814.06 MiB is free. Including non-PyTorch memory, this process has 22.86 GiB memory in use. Of the allocated memory 21.67 GiB is allocated by PyTorch, and 934.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Training progress: 0%| | 0/10000 [01:38<?, ?it/s]
I found a similar problem https://github.com/Jumpat/SegAnyGAussians/issues/119. Even if I use --downsample=8 when extracting SAM masks, I still get an out-of-memory error after a few training iterations.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.42 GiB. GPU 0 has a total capacity of 23.67 GiB of which 928.06 MiB is free. Including non-PyTorch memory, this process has 22.75 GiB memory in use. Of the allocated memory 12.59 GiB is allocated by PyTorch, and 9.87 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Training progress: 2%| | 160/10000 [02:39<2:43:13, 1.00it/s, RFN=0.842, Pos cos=0.396, Neg cos=0.129, Loss=-0.136]
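Given that 9.87 GiB is reserved but unallocated here, fragmentation seems plausible, so setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True (as the error message itself suggests) may be worth trying. To check whether memory is actually growing across iterations rather than spiking once, a small helper like the one below can be called periodically inside the training loop; this is plain PyTorch, nothing SAGA-specific, and the name log_cuda_mem is just something I made up.
import torch

def log_cuda_mem(tag=""):
    # Report how much GPU memory PyTorch has handed out vs. reserved in its cache.
    alloc = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"{tag} allocated {alloc:.2f} GiB, reserved {reserved:.2f} GiB")

# e.g. call log_cuda_mem(f"iter {iteration}:") every 50 iterations inside training()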
Hi, I have the same problem on an RTX 4090, which, based on the paper, should be more than enough. I have tried downsample 8, lowering num_sampled_rays, and reducing max_split_size_mb, but none of them helped.
Has anyone found a solution?
Same issue here, on an RTX 4090 with downsample 8 and num_sampled_rays reduced to 32. I am using the garden scene from the 360_v2 dataset.
Hello, did you manage to solve it?
I encountered the same problem, and I found out it was because the trained Gaussian splatting model was too heavy and dense. So I changed the densification, pruning, and opacity thresholds to reduce the number of generated splats:
gaussians.densify_and_prune(opt.densify_grad_threshold, 0.1, scene.cameras_extent, size_threshold)
I also kept max_sh_degree set to 0 to keep only the RGB values, which of course results in lower quality.
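For what it's worth, the 0.1 in the densify_and_prune call above looks like the minimum-opacity pruning threshold, raised from the vanilla 3DGS default of 0.005, which is why it removes more splats. If retraining the whole scene is too slow, an alternative I have not verified end-to-end is to prune an already-trained point cloud by opacity before running train_contrastive_feature.py. The sketch below assumes the standard 3DGS PLY layout, where the opacity property stores the pre-sigmoid logit; prune_by_opacity and its threshold are my own names and choices, not part of SAGA.
import numpy as np
from plyfile import PlyData, PlyElement

def prune_by_opacity(src_ply, dst_ply, min_opacity=0.1):
    # Keep only Gaussians whose activated opacity is above the threshold.
    ply = PlyData.read(src_ply)
    vertices = ply["vertex"].data                           # one row per Gaussian
    opacity = 1.0 / (1.0 + np.exp(-vertices["opacity"]))    # sigmoid of the stored logit
    keep = opacity >= min_opacity
    print(f"keeping {keep.sum()} of {len(keep)} Gaussians")
    PlyData([PlyElement.describe(vertices[keep], "vertex")]).write(dst_ply)

# prune_by_opacity("output/.../point_cloud/iteration_30000/point_cloud.ply",
#                  "output/.../point_cloud/iteration_30000/point_cloud_pruned.ply")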