
RuntimeError: CUDA out of memory. Tried to allocate 11.04 GiB (GPU 0; 47.53 GiB total capacity; 30.83 GiB already allocated; 8.16 GiB free; 34.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Open shawnFuu opened this issue 1 year ago • 7 comments

Hello, first of all, this is really impressive work! When I run train_contrastive_feature.py for training, I get the error: RuntimeError: CUDA out of memory. Tried to allocate 11.04 GiB (GPU 0; 47.53 GiB total capacity; 30.83 GiB already allocated; 8.16 GiB free; 34.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. I am training on an A6000 with 47 GB of VRAM, yet it still runs out of memory. I am only a beginner; could you give me some advice? Thank you very much!

shawnFuu avatar Jun 08 '24 10:06 shawnFuu

Hey, I have faced the same issue; it is a drawback of this approach. For every pixel in the image, additional features are generated that are then used for segmentation.

For me, there were two main ways to overcome this issue:

  1. If your scene has a lot of images, it may be enough to use only every 10th image from the input dataset for your segmentation tasks.
  2. If you have high-resolution images, you may want to set the --downsample parameter when extracting SAM masks. This requires an image_<downsample> version of your dataset. If you use the convert.py script to preprocess your dataset, you can set the --resize flag, which automatically downsamples images to one half, one quarter, and one eighth of the original resolution.

Unless you have 4K images as input, I would combine both options, dropping some images and reducing the resolution, so that neither step has to be aggressive enough to lose too much information.
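The error message itself also points at the allocator: when reserved memory is much larger than allocated memory, fragmentation may be part of the problem. As a minimal sketch, assuming that hint applies to your run (the value 128 is only an example, not something from this repo), you can set PyTorch's allocator option before anything touches the GPU:

# Minimal sketch: apply the max_split_size_mb hint from the error message.
# The env var must be set before the first CUDA allocation; 128 is an example value.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch  # imported after setting the variable so the allocator picks it up

Note that this only mitigates fragmentation; if the per-pixel features themselves do not fit, you still need the two options above.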

Luca-Wiehe avatar Jun 08 '24 12:06 Luca-Wiehe

Thank you very much, I will give it a try! @Luca-Wiehe

shawnFuu avatar Jun 08 '24 12:06 shawnFuu


Hi, I'm not sure what you mean by "it may be possible to only use every 10th image from the input dataset to run your segmentation tasks". I'm running out of CUDA memory while training.

Do you mean the batch size? I tried setting num_sampled_scales = 4. It is training now, and I am not sure yet whether that helps; either way, I will report back on this issue.

Could you explain this more clearly so I can modify the code or the command? I'd really appreciate it!

shawnFuu avatar Jun 08 '24 13:06 shawnFuu

Before running python train_contrastive_feature.py, you extract image data using python extract_segment_everything_masks.py and python get_scale.py, specifying the image locations with --image_root <path to images>. If this image root contains a lot of images with only minor differences (let's say 5000), keep only every 10th image (so that you have only 500 left).
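As a minimal sketch of that thinning step (the paths are placeholders, point them at your own dataset):

# Minimal sketch: copy every 10th image into a thinned image root.
# "data/scene/images" and the destination path are placeholders.
import os
import shutil

src = "data/scene/images"
dst = "data/scene/images_thinned"
os.makedirs(dst, exist_ok=True)
for i, name in enumerate(sorted(os.listdir(src))):
    if i % 10 == 0:
        shutil.copy(os.path.join(src, name), os.path.join(dst, name))

Then pass the thinned folder as --image_root to extract_segment_everything_masks.py and get_scale.py.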

Luca-Wiehe avatar Jun 08 '24 16:06 Luca-Wiehe


That's really helpful. Thank you for your explanation!

shawnFuu avatar Jun 09 '24 03:06 shawnFuu

Thanks @Luca-Wiehe for the help!

Jumpat avatar Jun 10 '24 05:06 Jumpat

Try loading the features and masks onto the CPU instead of the GPU:

diff --git a/scene/dataset_readers.py b/scene/dataset_readers.py
--- a/scene/dataset_readers.py	(revision a32a2e9b7c5d6ddd14244c66fd7ed5bda52f3ec1)
+++ b/scene/dataset_readers.py	(date 1718096060241)
@@ -109,9 +109,9 @@
         image_name = os.path.basename(image_path).split(".")[0]
         image = Image.open(image_path)
 
-        features = torch.load(os.path.join(features_folder, image_name.split('.')[0] + ".pt")) if features_folder is not None else None
-        masks = torch.load(os.path.join(masks_folder, image_name.split('.')[0] + ".pt")) if masks_folder is not None else None
-        mask_scales = torch.load(os.path.join(mask_scale_folder, image_name.split('.')[0] + ".pt")) if mask_scale_folder is not None else None
+        features = torch.load(os.path.join(features_folder, image_name.split('.')[0] + ".pt"), map_location="cpu") if features_folder is not None else None
+        masks = torch.load(os.path.join(masks_folder, image_name.split('.')[0] + ".pt"), map_location="cpu") if masks_folder is not None else None
+        mask_scales = torch.load(os.path.join(mask_scale_folder, image_name.split('.')[0] + ".pt"), map_location="cpu") if mask_scale_folder is not None else None
 
         cam_info = CameraInfo(uid=uid, R=R, T=T, FovY=FovY, FovX=FovX, image=image, features=features, masks=masks, mask_scales = mask_scales,
                               image_path=image_path, image_name=image_name, width=width, height=height, cx=intr.params[2] if len(intr.params) > 3 and allow_principle_point_shift else None, cy=intr.params[3] if len(intr.params) >3 and allow_principle_point_shift else None)
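The reason this helps is that torch.load restores tensors to the device they were saved on, so features extracted on the GPU are otherwise all reloaded into GPU memory at once. With map_location="cpu" they stay in host RAM, and only the tensors for the current view need to be moved to the GPU during training. A sketch with illustrative names (not code from this repo):

# Sketch: keep per-view tensors on the CPU and move only the current
# view's tensors onto the GPU when they are needed.
def view_tensors_to_gpu(cam_info):
    # CameraInfo fields match the diff above; the None checks mirror the loader
    features = cam_info.features.cuda(non_blocking=True) if cam_info.features is not None else None
    masks = cam_info.masks.cuda() if cam_info.masks is not None else None
    mask_scales = cam_info.mask_scales.cuda() if cam_info.mask_scales is not None else None
    return features, masks, mask_scales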

yzslab avatar Jun 11 '24 10:06 yzslab