
Training crashes after 7000 iterations

Open · nivibilla opened this issue 1 year ago · 5 comments

Hi,

Training gets to 7000 steps, then outputs ^C and crashes. It doesn't save a point cloud either.


nivibilla · Sep 24 '23

I guess it could be OOM (out of memory).

XinyueZ · Sep 28 '23

I also get this, though I'm not sure if it's OOM. It seems to happen only around the time of the iteration-7000 save.

I was even running it in a docker container and it crashed the host machine.

@nivibilla did you ever figure out what was causing this?

jerome3o · Oct 23 '23

Similar issue: the process showed "Killed" after 7000 iterations for db/drjohnson, but tandt/train works fine.

jpeng2012 · Dec 04 '23

I had the same error, and it was due to OOM. When saving Gaussians there is a spike in CPU RAM usage. You're training with 423 images in Colab, so I'm guessing RAM consumption was already high; when the run tried to save the Gaussians, consumption must have spiked and caused an OOM.
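
If you want to confirm the spike yourself, a minimal check, assuming psutil is available in your environment, is to log resident memory around the save (the call site below is a placeholder, not the repo's exact code):

```python
import os
import psutil  # assumed installed: pip install psutil

def log_rss(tag: str) -> None:
    # Print this process's resident set size (CPU RAM) in GiB
    rss = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
    print(f"[{tag}] CPU RAM: {rss:.2f} GiB")

# Placeholder usage around whatever performs the save in train.py:
# log_rss("before save"); <save call>; log_rss("after save")
```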

The quick fix is to skip saving Gaussians at iteration 7000 and avoid that spike: save only at 30,000 iterations (or whatever your last iteration is) using the --save_iterations 30000 argument to train.py. However, the spike when saving at 30,000 may still cause a failure.
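
For example, a run that saves only at the end might look like this (the dataset path is a placeholder):

```shell
python train.py -s /path/to/dataset --save_iterations 30000
```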

The better fix is given in #667; it decreases CPU RAM consumption during saving and prevents the crash.

GaneshBannur · Feb 21 '24

I also ran into the same issue. In my case it was not OOM-related, though. I was able to solve the problem by changing the line

```python
elements[:] = list(map(tuple, attributes))
```

of save_ply in scene/gaussian_model.py to an explicit loop:

```python
# Fill the structured array one row at a time instead of building
# the full list of tuples first
for i in range(len(elements)):
    elements[i] = tuple(attributes[i])
```
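
For anyone who wants to try this outside the repo, here is a standalone sketch of the two assignment styles on a structured array (the shapes and field names are illustrative, not the repo's real attribute list):

```python
import numpy as np

# Stand-ins for the per-Gaussian attribute matrix and its structured dtype;
# sizes and field names are illustrative only.
n, d = 1_000_000, 8
attributes = np.random.rand(n, d).astype(np.float32)
dtype_full = [(f"f_{j}", "f4") for j in range(d)]
elements = np.empty(n, dtype=dtype_full)

# Original style: materializes a Python list of n tuples before assigning.
# elements[:] = list(map(tuple, attributes))

# Row-by-row style: builds one tuple at a time instead.
for i in range(len(elements)):
    elements[i] = tuple(attributes[i])
```

As a side effect, the loop also avoids holding all n tuples in memory at once, which may help in the OOM cases discussed above.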

LaurensDiels · Mar 06 '24