gaussian-splatting
Why do all images have to be on GPU?
Why is it necessary to keep images on the GPU? I recently made a modification to load images onto the GPU only during training, and with this simple change I can train with 8K images using only about 11 GB of GPU memory.
For research papers it makes sense to keep them on the GPU to reduce training time, but for practical use, loading them on demand reduces VRAM usage.
Are there any other reasons for keeping images on the GPU?
I would be interested in using your method. Could you please share your changes and which lines you modified?
Please see the latest commit below.
I'm using a 16 GB GPU; it was running out of memory with 8K images, but it seems to work with 5K images. However, training takes longer than the original. https://github.com/graphdeco-inria/gaussian-splatting/commit/2cb880a8980ff69c1e5dc0ab9c4f8d0cd75aa0a7
I was also seeing this. Did you find out whether it is really necessary to store RGB images on the GPU? Some other projects that build on this work store RGB images on the GPU but depth on the CPU, so I just want to confirm: will anything lead to wrong results if I don't store them on the GPU?
Just sharing some thoughts about this issue. There is an input argument (`--data_device cpu`) that you can use to store the images in RAM instead of VRAM. However, even with the CPU device, the `readColmapCameras` function still loads all images at once into RAM using `Image.open` and stores them in `CameraInfo`. I find this inconvenient when running with a large number of images, for example 5k images.
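For reference, the flag mentioned above is passed on the command line like this (the dataset path is a placeholder):

```shell
# Keep decoded training images in system RAM instead of VRAM;
# each image is moved to the GPU only when an iteration uses it.
python train.py -s /path/to/dataset --data_device cpu
```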
But during training, isn't only one image loaded on the GPU at a time, not all the images? Or am I mistaken?
My understanding is that all images are loaded into VRAM (or RAM if using --data_device cpu) at once by readColmapCameras, but during the actual training loop we randomly sample the training image ONE at a time during each iteration to perform the gradient updates to the gaussians.
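The loop described above looks roughly like this (a simplified sketch with stand-in camera names, not the actual 3DGS code): all cameras are preloaded once, then one viewpoint is drawn per iteration without replacement until the stack empties, at which point it is refilled.

```python
import random

cameras = [f"cam_{i:03d}" for i in range(5)]  # stand-ins for Camera objects

viewpoint_stack = []
picked = []
for iteration in range(10):  # two full passes over 5 cameras
    if not viewpoint_stack:
        # refill the stack once every camera has been used
        viewpoint_stack = cameras.copy()
    # pop a random camera; only this one image needs to be resident on GPU
    viewpoint_cam = viewpoint_stack.pop(random.randint(0, len(viewpoint_stack) - 1))
    picked.append(viewpoint_cam)
```

Over two full passes, each camera is sampled exactly twice, just in a random order.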
It just makes it slower if you don't load everything to VRAM/RAM. No impact on fidelity.
To change this in 3DGS you need to make these changes in `scene/dataset_readers.py`:
- Comment out `image: np.array` in `CameraInfo`'s definition:
```python
class CameraInfo(NamedTuple):
    uid: int
    R: np.array
    T: np.array
    FovY: np.array
    FovX: np.array
    # image: np.array
    image_path: str
    image_name: str
    width: int
    height: int
```
- Remove the image loading part from `readCamerasFromTransforms` and hard-code/pre-process your data so you can still pass in the image width & height:
```python
def readCamerasFromTransforms(path, transformsfile, white_background, extension=".png"):
    cam_infos = []

    with open(os.path.join(path, transformsfile)) as json_file:
        contents = json.load(json_file)
        fovx = contents["angle_x"]

        frames = contents["frames"]
        for idx, frame in enumerate(frames):
            zfilled_idx = str(idx).zfill(6)
            cam_name = os.path.join(path, "frames", f"frame_{zfilled_idx}{extension}")

            # NeRF 'transform_matrix' is a camera-to-world transform
            c2w = np.array(frame["transform_matrix"])
            # change from OpenGL/Blender camera axes (Y up, Z back) to COLMAP (Y down, Z forward)
            c2w[:3, 1:3] *= -1

            # get the world-to-camera transform and set R, T
            w2c = np.linalg.inv(c2w)
            R = np.transpose(w2c[:3, :3])  # R is stored transposed due to 'glm' in CUDA code
            T = w2c[:3, 3]

            image_path = cam_name  # cam_name already includes the dataset path
            image_name = Path(cam_name).stem
            # hard-coded resolution; set this to your dataset's actual image size
            width = 2000
            height = 2000

            fovy = focal2fov(fov2focal(fovx, width), height)
            FovY = fovy
            FovX = fovx

            # Don't load the image here; open it when computing the loss/rendering.
            cam_infos.append(CameraInfo(uid=idx, R=R, T=T, FovY=FovY, FovX=FovX,
                                        image_path=image_path, image_name=image_name,
                                        width=width, height=height))
    return cam_infos
```
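Instead of hard-coding the 2000×2000 resolution, you could read the dimensions from each file's header without decoding the pixels. A small sketch (the helper name `image_size` is my own):

```python
from PIL import Image

def image_size(image_path):
    # Image.open is lazy: it reads only the header, so the pixel data
    # is never decoded and this stays cheap even for 8K images.
    with Image.open(image_path) as img:
        width, height = img.size
    return width, height
```

You would then call this once per frame inside the loop instead of assigning the fixed `width`/`height` constants.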
- In `train.py`, under the training loop, change `gt_image = viewpoint_cam.original_image.cuda()` to:
```python
norm_data = np.array(Image.open(viewpoint_cam.image_path).convert("RGBA")) / 255.0
bg = np.array([1, 1, 1]) if dataset.white_background else np.array([0, 0, 0])
arr = norm_data[:, :, :3] * norm_data[:, :, 3:4] + bg * (1 - norm_data[:, :, 3:4])
gt_image = torch.from_numpy(arr).float().permute(2, 0, 1).cuda()  # CHW tensor in [0, 1]
```
(don't forget to import numpy and PIL in train.py)
This should be enough! (assuming you have the frames stored somewhere and not as an npy/compressed file)
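Putting the loading step into a helper keeps `train.py` tidy. A minimal sketch of the on-demand loader (the function name `load_gt_image` is my own; the alpha-over-background composite mirrors the snippet above, returning an HxWx3 float array that you then permute and move to CUDA in the training loop):

```python
import numpy as np
from PIL import Image

def load_gt_image(image_path, white_background):
    """Load one training image on demand and composite its alpha channel
    over the chosen background, as the eager loader otherwise does up front.
    Returns an HxWx3 float array in [0, 1]."""
    norm_data = np.array(Image.open(image_path).convert("RGBA")) / 255.0
    bg = np.array([1.0, 1.0, 1.0]) if white_background else np.array([0.0, 0.0, 0.0])
    rgb, alpha = norm_data[:, :, :3], norm_data[:, :, 3:4]
    return rgb * alpha + bg * (1.0 - alpha)
```

Since only the currently sampled image is resident in memory at any point, peak VRAM usage no longer scales with the number of training images, at the cost of one decode per iteration.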