
Runtime Error: CUDA out of memory

Open · KansaiUser opened this issue 6 months ago • 5 comments

I am trying to run train.py on my dataset of 1362 pictures of 1600x1300 pixels. When I do, I get:

 python /root/gaussian-splatting/train.py  -s /Work  --iterations 90000
Optimizing
Output folder: ./output/bb11fedf-2 [26/02 11:48:25]
Tensorboard not available: not logging progress [26/02 11:48:25]
Reading camera 1362/1362 [26/02 11:48:26]
Loading Training Cameras [26/02 11:48:26]
Traceback (most recent call last):
  File "/root/gaussian-splatting/train.py", line 219, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/root/gaussian-splatting/train.py", line 35, in training
    scene = Scene(dataset, gaussians)
  File "/root/gaussian-splatting/scene/__init__.py", line 73, in __init__
    self.train_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.train_cameras, resolution_scale, args)
  File "/root/gaussian-splatting/utils/camera_utils.py", line 58, in cameraList_from_camInfos
    camera_list.append(loadCam(args, id, c, resolution_scale))
  File "/root/gaussian-splatting/utils/camera_utils.py", line 52, in loadCam
    image_name=cam_info.image_name, uid=id, data_device=args.data_device)
  File "/root/gaussian-splatting/scene/cameras.py", line 46, in __init__
    self.original_image *= torch.ones((1, self.image_height, self.image_width), device=self.data_device)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.68 GiB total capacity; 21.61 GiB already allocated; 7.56 MiB free; 22.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So apparently the problem is in line 46 of cameras.py:

 self.original_image *= torch.ones((1, self.image_height, self.image_width), device=self.data_device)

What could be a solution? I am not sure whether it is the number of images (1362) or their size (1600x1300) that is causing the problem.

KansaiUser avatar Feb 26 '24 13:02 KansaiUser
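
A quick back-of-the-envelope estimate suggests it is the combination of both: the failing line in cameras.py is placing every training image on the GPU, and if each one is held as a float32 RGB tensor (an assumption about the storage format, but consistent with the traceback), 1362 images of 1600x1300 pixels do not fit in 24 GiB on their own:

    # Rough VRAM estimate for holding every source image on the GPU.
    # Assumes float32 RGB storage, as the failing line in cameras.py suggests.
    num_images = 1362
    width, height, channels = 1600, 1300, 3
    bytes_per_value = 4  # float32

    total_gib = num_images * width * height * channels * bytes_per_value / 1024**3
    print(f"~{total_gib:.1f} GiB just for the images")  # ~31.7 GiB

That already exceeds the 23.68 GiB card before the model, optimizer state, and rasterizer buffers are counted, which matches the allocation failing during camera loading rather than during optimization.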

Drop the number of images to around 500 and you will be fine. You don't need that much data on a 'home' GPU to get a good result.

jaco001 avatar Feb 26 '24 15:02 jaco001

The code puts all the images on the GPU, so there isn't enough memory.

YHJCTR avatar Apr 19 '24 08:04 YHJCTR
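
That matches the traceback: the allocation fails in Camera.__init__ while moving original_image onto data_device, which defaults to the GPU. The training script exposes a --data_device argument for exactly this case; passing cpu keeps the source images in host memory at the cost of a per-iteration copy. A minimal sketch of the idea (not the repository's actual code):

    import torch

    # Sketch only: keep the photo tensors in host memory and copy just the image
    # needed for the current step onto the GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    images_cpu = [torch.rand(3, 1300, 1600) for _ in range(8)]  # stand-ins for loaded photos

    for step in range(100):
        gt_image = images_cpu[step % len(images_cpu)].to(device, non_blocking=True)
        # ... render the corresponding camera, compute the loss against gt_image, backprop ...
        del gt_image  # only one ground-truth image lives on the GPU at a time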

Drop the number of images to around 500 and you will be fine. You don't need that much data on a 'home' GPU to get a good result.

Hi, can you explain in detail how to do that? Thank you very much.

smart4654154 avatar Apr 25 '24 12:04 smart4654154

I think we can reduce the resolution of the photos:
1. Perhaps downscale the photos first and then use COLMAP for matching.
2. Or keep the COLMAP match and train on low-resolution photos by setting the command-line parameters.

smart4654154 avatar Apr 25 '24 12:04 smart4654154
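
For the first option, a small pre-processing script is enough; the folder names and the 1/2 factor below are placeholders, not values from this thread. (For the second option, the training script's -r / --resolution argument loads downscaled copies of the images without touching the COLMAP reconstruction.)

    # Sketch: downscale every photo to half resolution before running COLMAP.
    # "input" / "input_half" and the *.jpg pattern are illustrative only.
    from pathlib import Path
    from PIL import Image

    src, dst = Path("input"), Path("input_half")
    dst.mkdir(exist_ok=True)

    for p in sorted(src.glob("*.jpg")):
        img = Image.open(p)
        img = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
        img.save(dst / p.name, quality=95)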

Adding everything isn't a good solution (I assume this case is frames from an entire video, e.g. from a drone).
GS can produce a decent result even with 40-50 pictures if you know what you are doing. If you can't fit all 1362 photos (because you don't have a $2000 GPU with enough VRAM), just select every third one (out of laziness ^^). Around 500 photos at 1600x1300 are much more bearable on a 'home' GPU. Ideally, pick the sharpest photos, since they add information about the scene instead of 'blurring' it.

jaco001 avatar Apr 25 '24 14:04 jaco001
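
Selecting every third photo (which brings 1362 down to roughly 450-500 images) can be automated with a few lines; the folder names below are placeholders, and the i % 3 rule is just the 'lazy' selection described above — picking the sharpest frames would be a better criterion.

    # Sketch: copy every third photo into a smaller dataset folder before COLMAP.
    # "images_full" / "images_subset" are illustrative paths.
    import shutil
    from pathlib import Path

    src, dst = Path("images_full"), Path("images_subset")
    dst.mkdir(exist_ok=True)

    for i, p in enumerate(sorted(src.glob("*.jpg"))):
        if i % 3 == 0:
            shutil.copy2(p, dst / p.name)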