Questions about training stability

Open kevinchiu19 opened this issue 6 months ago • 3 comments

Hi, thank you for your great work!

I tried adding 3DGS-supervised training (features are extracted by a DPT head, and an MLP predicts the Gaussian attributes) and found that training is unstable and NaNs appear frequently.

  1. Is this because the DPT head structure itself is unstable during training?

  2. I also noticed that the loss code released in the training branch checks for NaN/Inf (https://github.com/facebookresearch/vggt/blob/239fa974affa95ebd494778a065ae4fe58f9d8c0/training/loss_dirty.py#L18). Can this check solve the training stability problem?

kevinchiu19 avatar Jun 24 '25 07:06 kevinchiu19

Hi,

  1. Yes, DPT can indeed produce very large gradients.
  2. Based on discussions with others working on similar pipelines, successful training seems to depend heavily on well-tuned hyperparameters. Some of them have managed to achieve stable feedforward reconstruction with GS prediction.
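
One common mitigation for very large gradients is clipping by global norm. The following is a minimal, framework-free sketch of that arithmetic, not the VGGT training code: `clip_by_global_norm` and `max_norm` are hypothetical names, and gradients are represented as plain lists of floats for illustration. In PyTorch, the built-in `torch.nn.utils.clip_grad_norm_` serves the same purpose.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so that their joint L2 norm is at most max_norm.

    grads: list of lists of floats, one inner list per parameter (illustrative layout).
    Returns (clipped_grads, norm_before_clipping); clipped_grads is None when the
    norm is NaN/Inf, because rescaling cannot repair a non-finite gradient.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if not math.isfinite(total_norm):
        return None, total_norm  # caller should skip this optimizer step
    # A scale of 1.0 leaves the gradients unchanged when they are already small enough.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [[g * scale for g in grad] for grad in grads], total_norm


# Usage: the joint norm of these two gradients is 5, so they are scaled down.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm_before, clipped)
```

Returning `None` for a non-finite norm makes the "skip this step" decision explicit instead of silently applying a corrupted update. A suitable `max_norm` is itself a hyperparameter that would need tuning.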

jytime avatar Jun 24 '25 23:06 jytime

Thank you for your quick reply! I have also seen that many people have achieved GS prediction. I have tried adjusting some experimental configurations to get the network to converge, but training is still unstable in most cases.

To confirm: if I train from scratch, will the NaN/Inf check here help training stability? Or are there other ways to improve stability?

Thank you!

kevinchiu19 avatar Jun 25 '25 01:06 kevinchiu19

I used check_and_fix_inf_nan() mostly to detect NaN/Inf values. It can help, but it cannot fully solve the problem.
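
To illustrate what a detect-and-replace check of this kind does, here is a minimal sketch. It is not the repository's `check_and_fix_inf_nan()`; the function name `check_and_replace_nonfinite`, its parameters, and the list-of-floats input are assumptions made for illustration. On PyTorch tensors, `torch.isfinite` and `torch.nan_to_num` cover the same detection and replacement steps.

```python
import math


def check_and_replace_nonfinite(values, name="loss", fallback=0.0):
    """Report NaN/Inf entries in `values` and replace them with `fallback`.

    Returns (cleaned_values, number_of_non_finite_entries).
    """
    bad = sum(1 for v in values if not math.isfinite(v))
    if bad:
        # Logging is the main benefit: it shows when and where training went wrong.
        print(f"[warn] {name}: replaced {bad} non-finite value(s) with {fallback}")
    cleaned = [v if math.isfinite(v) else fallback for v in values]
    return cleaned, bad


# Usage: two of the four entries are non-finite and get replaced.
cleaned, bad = check_and_replace_nonfinite([1.0, float("nan"), float("inf"), -2.5])
print(cleaned, bad)
```

A check like this only sanitizes values after they have already become non-finite. It does not remove whatever caused them, which is consistent with the point that it helps but does not fully solve the instability.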

jytime avatar Jun 25 '25 01:06 jytime

OK~ Got it, thank you for your answer.

kevinchiu19 avatar Jun 25 '25 03:06 kevinchiu19

Could you share any work that achieves stable feedforward reconstruction with GS prediction? I'm a beginner and want to use VGGT to output 3D Gaussian parameters, but I have no idea how to achieve it. @kevinchiu19 @jytime

flyyyyer avatar Jun 26 '25 08:06 flyyyyer

You could start with the GS-LRM paper (https://arxiv.org/pdf/2404.19702), which is the most representative work on pixel-aligned Gaussians.

The latest work that extends VGGT is AnySplat (https://arxiv.org/pdf/2505.23716); you can refer to it as well.

kevinchiu19 avatar Jun 26 '25 11:06 kevinchiu19

Thank you for your suggestion! I have read about AnySplat, but it is not open-sourced. Are you attempting to reproduce this work using a self-supervised approach, similar to theirs? I also tried adding a Gaussian Head to VGGT, but I do not know how to supervise it. @kevinchiu19

flyyyyer avatar Jun 27 '25 12:06 flyyyyer