Training and fine-tuning are too slow

ThePassedWind opened this issue 1 year ago • 13 comments

Training the bear model (original loss + 2D id loss + 3D reg loss) takes at least 10 hours. Fine-tuning the inpainted model takes at least 4 hours. Device: a 3090 GPU.

ThePassedWind avatar Jan 08 '24 08:01 ThePassedWind

Hi, this training time seems a bit strange.

I have verified that our inpainting code on the bear dataset at full resolution takes about 40 minutes. I use one A6000 GPU, which has a similar speed to a 3090. Here is the log: inpaint_bear.log. In our paper we use an A100 and fewer iterations, so the fine-tuning time is even shorter.

Could you check your environment, e.g. the torch and CUDA versions, or the CPU usage?
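For reference, here is a minimal way to print the relevant versions; these are all standard PyTorch calls, nothing specific to this repo:

```python
import torch

# Versions that commonly explain large speed differences
print("PyTorch:", torch.__version__)              # e.g. 1.12.1
print("CUDA (torch build):", torch.version.cuda)  # e.g. 11.3
print("cuDNN:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
```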

ymq2017 avatar Jan 08 '24 15:01 ymq2017

Hi, thanks for open-sourcing your project!

To add to this issue, I trained Gaussian Grouping on the lerf figurines dataset and it took 1 h 20 min (the same scene on the original 3DGS takes about 20 min). I'm using CUDA 12.1 with PyTorch 2.1.2. Also, rendering the 299 images after training with the 30k checkpoint takes about 15 min, whereas the original 3DGS takes 4 min.

nviolante25 avatar Jan 16 '24 10:01 nviolante25

Hi, thanks for your information! We use CUDA 11.3 with PyTorch 1.12.1. There are also some ways to reduce the training and rendering time.

For training, you can increase the interval of the 3D reg loss to reduce the time (see the sketch below).

For rendering, you can turn off some of the visualization code, such as the feature PCA, to reduce the time. The original Gaussian Splatting has neither mask-prediction visualization nor feature-PCA visualization in rendering, which is why its total time is so short.
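To illustrate the interval idea, here is a hypothetical sketch; the name reg3d_interval, the stand-in losses, and the loop structure are assumptions for illustration, not the repo's actual code:

```python
import torch

# Hypothetical sketch: amortize an expensive regularizer over iterations.
# reg3d_interval and the stand-in losses are placeholders, not the repo's API.
reg3d_interval = 5  # larger interval => less 3D-reg overhead per iteration

params = [torch.zeros(3, requires_grad=True)]
optimizer = torch.optim.Adam(params, lr=1e-3)

def main_loss() -> torch.Tensor:
    return (params[0] ** 2).sum()   # stand-in for rendering + 2D id loss

def reg3d_loss() -> torch.Tensor:
    return params[0].abs().sum()    # stand-in for the KNN-based 3D reg loss

for iteration in range(1, 101):
    loss = main_loss()
    if iteration % reg3d_interval == 0:  # pay for the 3D reg term only every N iters
        loss = loss + reg3d_loss()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```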

ymq2017 avatar Jan 16 '24 17:01 ymq2017

Thanks! I'll try again!

ThePassedWind avatar Jan 17 '24 06:01 ThePassedWind

Maybe it's a problem with my GPU being too old. I also have another question about the training and fine-tuning stages: why is more and more GPU memory allocated as training goes on? Relatedly, epoch N is much faster than epoch M (for M > N).

ThePassedWind avatar Jan 18 '24 06:01 ThePassedWind

Yes, this is normal. Gaussian Splatting uses adaptive density control, so the number of points grows during training. 30K iterations can typically increase the number of points by 1-2 orders of magnitude.
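As a back-of-the-envelope illustration: the schedule constants below match the original 3DGS defaults, but the 3% growth per densification step is a made-up rate, not a measured one. Compounding even a small per-step growth over the whole schedule multiplies the point count, and with it the memory for parameters and Adam state:

```python
# Illustrative only: why VRAM grows during training. Original 3DGS densifies
# from iteration 500 to 15000, every 100 iterations; the growth rate is
# hypothetical.
num_points = 30_000                 # e.g. an initial SfM point cloud
densify_from, densify_until, interval = 500, 15_000, 100
growth_per_step = 1.03              # assumed clone/split growth per step

for it in range(densify_from, densify_until):
    if it % interval == 0:
        num_points = int(num_points * growth_per_step)

print(f"{num_points:,} points")     # ~70x more, i.e. 1-2 orders of magnitude
```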

ymq2017 avatar Jan 23 '24 02:01 ymq2017

Hi @nviolante25, may I ask how you solved the OOM issue? I'm using a single 4090 and it raises an out-of-memory error. Thank you!

Neal2020GitHub avatar Jan 25 '24 10:01 Neal2020GitHub

Hi @Neal2020GitHub, I haven't had OOM issues; it worked directly in my case.

nviolante25 avatar Jan 26 '24 14:01 nviolante25

Hi @Neal2020GitHub, I also faced the same problem and may have some ideas about it. Could we discuss it? I'm an MSc student at PolyU (planning to apply for a PhD this year). You can add me on WeChat: hp1499931489.

ThePassedWind avatar Jan 27 '24 14:01 ThePassedWind

The 3090 might not be compatible with CUDA 11.3. I've also tested on a 3090 and it works like a charm, so I would suggest installing a more recent CUDA version (around 11.7 or 11.8).

haoyuhsu avatar Feb 10 '24 18:02 haoyuhsu

I use a single 4090 to finish the training with the following tricks (see the sketch after this list):

  1. Vanilla 3DGS loads all images onto the GPU when the dataloader is initialized; I load the images into CPU memory instead. You can change the data_device to cpu in "scene/cameras.py".
  2. Call torch.cuda.empty_cache() in each iteration to release cached CUDA memory.
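A minimal sketch of both tricks; the field names are modeled on the original 3DGS scene/cameras.py Camera class but should be treated as assumptions:

```python
import torch

# Trick 1 (sketch): keep full-resolution training images in CPU RAM and move
# only the currently rendered view to the GPU. Attribute names are assumptions
# modeled on the original 3DGS Camera class.
class CameraSketch:
    def __init__(self, image: torch.Tensor, data_device: str = "cuda"):
        self.data_device = torch.device(data_device)
        self.original_image = image.clamp(0.0, 1.0).to(self.data_device)

cam = CameraSketch(torch.rand(3, 1080, 1920), data_device="cpu")  # stays out of VRAM
gt_image = cam.original_image.to("cuda", non_blocking=True)       # per-view upload

# Trick 2: return cached, unused blocks to the driver. This lowers peak
# allocation and fragmentation, at the cost of some per-iteration overhead.
torch.cuda.empty_cache()
```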

MisEty avatar Feb 26 '24 07:02 MisEty

Hi @nviolante25, I would like to ask if you have encountered the following error when using torch==2.1.2 and CUDA==12.1: diff_gaussian_rasterization/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1021throwNullDataPtrErrorEv. I am using the same versions as you, but I encountered this incompatibility. Could you please advise on how to resolve it?

SteveOUO avatar Mar 15 '25 03:03 SteveOUO

Hi @SteveOUO, I didn't run into that issue when I was using the code, so I don't know how to solve it.

nviolante25 avatar Mar 18 '25 15:03 nviolante25