
Memory Issue in train_contrastive_feature.py

Open · CabybaraCoder opened this issue 6 months ago · 7 comments

Hi,

I'm encountering the following memory issue when trying to run train_contrastive_feature.py.

(saga) C:\Users\caspe\SegAnyGAussians>python train_contrastive_feature.py -m output/50224008-1 --iterations 10000 --num_sampled_rays 1000
Looking for config file in output/50224008-1\cfg_args
Config file found: output/50224008-1\cfg_args
Optimizing output/50224008-1
RFN weight: 1.0 [04/06 19:27:43]
Smooth K: 16 [04/06 19:27:43]
Scale aware dim: -1 [04/06 19:27:43]
Loading trained model at iteration 30000, None [04/06 19:27:43]
Allow Camera Principle Point Shift: False [04/06 19:27:43]
Reading camera 22/22 [04/06 19:27:46]
Loading Training Cameras [04/06 19:27:46]
Loading Test Cameras [04/06 19:27:48]
Training progress:   0%|                                                                     | 0/10000 [00:00<?, ?it/s]Preparing Quantile Transform... [04/06 19:27:49]
Using adaptive scale gate. [04/06 19:27:49]
Traceback (most recent call last):
  File "C:\Users\caspe\SegAnyGAussians\train_contrastive_feature.py", line 369, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.iteration, args.save_iterations, args.checkpoint_iterations, args.debug_from)
  File "C:\Users\caspe\SegAnyGAussians\train_contrastive_feature.py", line 247, in training
    feature_with_scale = rendered_features.unsqueeze(0).repeat([sampled_scales.shape[0],1,1,1])
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.81 GiB (GPU 0; 10.00 GiB total capacity; 15.50 GiB already allocated; 0 bytes free; 15.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Training progress:   0%|                                                                     | 0/10000 [00:17<?, ?it/s]

I've tried lowering num_sampled_rays and reducing max_split_size_mb. I'm on an RTX 3080 with 10 GB of VRAM.
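
For reference, the failing line in the traceback (rendered_features.unsqueeze(0).repeat([sampled_scales.shape[0],1,1,1])) materializes one full copy of the rendered feature map per sampled scale, so the allocation grows linearly with the number of scales. A rough back-of-the-envelope sketch; the channel count, resolution, and scale count below are assumptions for illustration, not values read from the repo:

# Rough, illustrative numbers only: a C x H x W float32 feature map rendered
# at roughly full image resolution, repeated across S sampled scales. The
# actual channel count, resolution, and scale count in the repo may differ.
C, H, W = 32, 1080, 1920
S = 30
bytes_per_elem = 4  # float32

one_map = C * H * W * bytes_per_elem
after_repeat = S * one_map  # what .unsqueeze(0).repeat([S, 1, 1, 1]) allocates
print(f"one feature map : {one_map / 2**30:.2f} GiB")
print(f"after repeat    : {after_repeat / 2**30:.2f} GiB")

With these assumed numbers the repeated tensor alone is about 7.4 GiB, the same ballpark as the 8.81 GiB allocation in the traceback, so even modest reductions in resolution or in the number of sampled scales make a big difference.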

CabybaraCoder (Jun 04 '25 17:06)

Hi, I think 10GB might not be sufficient. You could try using a lower resolution for training and sampling fewer scales. That said, I still doubt whether 10GB will be enough even with these adjustments.
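
You can also try the allocator setting that the PYTORCH_CUDA_ALLOC_CONF hint in the traceback refers to. It only mitigates fragmentation rather than a genuine capacity shortfall, but it is a standard PyTorch option and costs nothing to try. A minimal sketch (expandable_segments needs a reasonably recent PyTorch):

import os

# Standard PyTorch CUDA-allocator options; set before torch initializes CUDA.
# Easiest is to export this in the shell before launching
# train_contrastive_feature.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
# Older PyTorch versions suggest limiting split sizes instead, e.g.:
# os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
print(torch.cuda.get_device_properties(0).total_memory / 2**30, "GiB total")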

Jumpat (Jun 12 '25 07:06)

I encountered the same issue. I ran the code on an RTX A5000 (24 GB), which should be equivalent to the GPU used in the paper and enough to run all the experiments.

python train_contrastive_feature.py -m output/mipnerf360/bicycle --iterations 10000 --num_sampled_rays 100
Looking for config file in output/mipnerf360/bicycle/cfg_args
Config file found: output/mipnerf360/bicycle/cfg_args
Optimizing output/mipnerf360/bicycle

  warnings.warn(
RFN weight: 1.0 [23/06 00:55:39]
Smooth K: 16 [23/06 00:55:39]
Scale aware dim: -1 [23/06 00:55:39]
Loading trained model at iteration 30000, None [23/06 00:55:39]
Allow Camera Principle Point Shift: False [23/06 00:55:39]
Reading camera 194/194 [23/06 00:55:44]
Loading Training Cameras [23/06 00:55:44]
Loading Test Cameras [23/06 00:56:48]
Training progress:   0%|                                                                      | 0/10000 [00:00<?, ?it/s]Preparing Quantile Transform... [23/06 00:56:57]
Using adaptive scale gate. [23/06 00:56:57]
Traceback (most recent call last):
  File "/data_fast/nhatth/code/projects/SegAnyGAussians/train_contrastive_feature.py", line 369, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.iteration, args.save_iterations, args.checkpoint_iterations, args.debug_from)
  File "/data_fast/nhatth/code/projects/SegAnyGAussians/train_contrastive_feature.py", line 301, in training
    loss.backward()
  File "/data_fast/nhatth/code/libs/python/lib/python3.12/site-packages/torch/_tensor.py", line 648, in backward
    torch.autograd.backward(
  File "/data_fast/nhatth/code/libs/python/lib/python3.12/site-packages/torch/autograd/__init__.py", line 353, in backward
    _engine_run_backward(
  File "/data_fast/nhatth/code/libs/python/lib/python3.12/site-packages/torch/autograd/graph.py", line 824, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.21 GiB. GPU 0 has a total capacity of 23.67 GiB of which 814.06 MiB is free. Including non-PyTorch memory, this process has 22.86 GiB memory in use. Of the allocated memory 21.67 GiB is allocated by PyTorch, and 934.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Training progress:   0%|                                                                      | 0/10000 [01:38<?, ?it/s]

inspiros (Jun 22 '25 23:06)

I found a similar problem https://github.com/Jumpat/SegAnyGAussians/issues/119. Even if I use --downsample=8 when extracting SAM masks, I still get an out-of-memory error after a few training iterations.

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.42 GiB. GPU 0 has a total capacity of 23.67 GiB of which 928.06 MiB is free. Including non-PyTorch memory, this process has 22.75 GiB memory in use. Of the allocated memory 12.59 GiB is allocated by PyTorch, and 9.87 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Training progress:   2%|    | 160/10000 [02:39<2:43:13,  1.00it/s, RFN=0.842, Pos cos=0.396, Neg cos=0.129, Loss=-0.136]

inspiros (Jun 23 '25 12:06)

Hi, I have the same problem on an RTX 4090, which, based on the paper, should not be a problem. I have tried downsample 8, lowering num_sampled_rays, and reducing max_split_size_mb, but none of them helped.

[image attachment]

Has anyone found a solution?

amiretefaghi (Jun 30 '25 17:06)

Same issue here, on an RTX 4090, with downsample 8 and num_sampled_rays reduced to 32. I am using the 360_v2 garden dataset.

tedlin0913 (Jul 13 '25 02:07)

Hello, were you able to solve it?

jianzhui (Sep 05 '25 11:09)

I encountered the same problem, and I found out it was because the Gaussian splatting model was too heavy and dense. So I changed the densification, pruning, and opacity thresholds to reduce the number of generated splats, e.g.: gaussians.densify_and_prune(opt.densify_grad_threshold, 0.1, scene.cameras_extent, size_threshold)

I also kept max_sh_degree set to 0 to keep only the RGB values, which of course results in lower quality.
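
If you want to check how heavy the pre-trained scene is before retraining it with more aggressive pruning, the 3DGS point cloud is a plain PLY file, so counting its vertices gives the number of Gaussians. A quick sketch; the path is an example based on the standard output layout and the model directory from the first post, adjust it to yours:

from plyfile import PlyData

# Example path, assuming the standard 3DGS output layout and the model
# directory used in the first post; adjust to your own -m directory/iteration.
ply_path = "output/50224008-1/point_cloud/iteration_30000/point_cloud.ply"
num_gaussians = PlyData.read(ply_path)["vertex"].count
print(f"{num_gaussians:,} Gaussians in the trained model")

Bringing that count down is exactly what the more aggressive pruning above is aimed at.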

Bourennaneyounes (Oct 07 '25 09:10)