
Training loss first decrease then increase

Learningm opened this issue 3 months ago • 4 comments

Hi, thanks for this excellent work.

I am trying to finetune VGGT on my custom dataset, and I have three questions.

  1. I found that the training loss first decreases and then increases. Is that normal?

The loss curve in TensorBoard does not make the model's convergence status very clear.

Image Image
  2. With the confidence setting, the batch loss Loss/train_loss_objective can be negative. How can I tell whether training has converged well enough?

I ran inference with my finetuned checkpoint on the example data, but performance seems to degrade even though the loss decreases.

Image
  3. The GPU memory usage seems to keep increasing (in another experiment) and tends to cause OOM on an A100. How can I solve it?
INFO 2025-09-12 18:15:41,327 general.py: 117: Train Epoch: [0][    270/1000000] | Batch Time: 7.1827 (8.5635) | Data Time: 0.0378 (0.0897) | Mem (GB): 75.0000 (72.4391) | Time Elapsed: 00d 00h 39m | Loss/train_loss_objective: -0.5543 (0.0862) | Loss/train_loss_camera: 0.0073 (0.0619) | Loss/train_loss_T: 0.0033 (0.0345) | Loss/train_loss_R: 0.0011 (0.0128) | Loss/train_loss_FL: 0.0058 (0.0291) | Loss/train_loss_conf_depth: -0.5935 (-0.0643) | Loss/train_loss_reg_depth: 0.0075 (0.0834) | Loss/train_loss_grad_depth: 0.0026 (0.0142) | Grad/aggregator: 24.5968 (48.9545) | Grad/depth: 27.7247 (15.4777) | Grad/camera: 0.5437 (0.4361) | Grad/point: 113.9784 (111.4994)
INFO 2025-09-12 18:15:50,553 general.py: 117: Train Epoch: [0][    271/1000000] | Batch Time: 9.2262 (8.5659) | Data Time: 0.0345 (0.0895) | Mem (GB): 75.0000 (72.4485) | Time Elapsed: 00d 00h 39m | Loss/train_loss_objective: -0.0701 (0.0857) | Loss/train_loss_camera: 0.0496 (0.0618) | Loss/train_loss_T: 0.0314 (0.0345) | Loss/train_loss_R: 0.0024 (0.0128) | Loss/train_loss_FL: 0.0317 (0.0291) | Loss/train_loss_conf_depth: -0.0360 (-0.0646) | Loss/train_loss_reg_depth: 0.1406 (0.0834) | Loss/train_loss_grad_depth: 0.0058 (0.0142) | Grad/aggregator: 111.3372 (49.1838) | Grad/depth: 26.9166 (15.5197) | Grad/camera: 0.4601 (0.4362) | Grad/point: 249.7874 (112.0078)
INFO 2025-09-12 18:16:01,955 general.py: 117: Train Epoch: [0][    272/1000000] | Batch Time: 11.4019 (8.5763) | Data Time: 0.0344 (0.0893) | Mem (GB): 75.0000 (72.4579) | Time Elapsed: 00d 00h 39m | Loss/train_loss_objective: -0.4451 (0.0849) | Loss/train_loss_camera: 0.0118 (0.0617) | Loss/train_loss_T: 0.0063 (0.0344) | Loss/train_loss_R: 0.0013 (0.0128) | Loss/train_loss_FL: 0.0084 (0.0291) | Loss/train_loss_conf_depth: -0.4438 (-0.0652) | Loss/train_loss_reg_depth: 0.0141 (0.0832) | Loss/train_loss_grad_depth: 0.0034 (0.0142) | Grad/aggregator: 46.0385 (49.1723) | Grad/depth: 56.3996 (15.6695) | Grad/camera: 0.5516 (0.4366) | Grad/point: 353.5652 (112.8926)

Thanks for any suggestions!

Learningm avatar Sep 12 '25 10:09 Learningm

  1. The loss curve largely depends on your training set. That said, I would suggest visualizing with a higher smoothing value in TensorBoard.
  2. Yes, the loss should be negative; this is expected. Ideally the total loss converges to a value around -0.5.
  3. Your log shows the memory is steady at 75 GB, right?
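For context on why the total objective can go below zero: confidence-aware regression losses of the DUSt3R family typically take the form conf · err − α · log(conf), so once errors get small and predicted confidences get large, the −log(conf) regularizer dominates and the loss turns negative. The exact formulation in VGGT may differ; this is a minimal sketch of that general form, with all names hypothetical:

```python
import math

def conf_weighted_loss(errors, confidences, alpha=1.0):
    """Confidence-weighted regression loss: mean of conf * err - alpha * log(conf).

    With small errors and confidences above 1, the -log(conf) term dominates
    and the total loss becomes negative, which is expected behavior.
    """
    assert len(errors) == len(confidences)
    total = 0.0
    for err, conf in zip(errors, confidences):
        total += conf * err - alpha * math.log(conf)
    return total / len(errors)

# Early in training: large errors, neutral confidence -> positive loss.
early = conf_weighted_loss(errors=[0.9, 1.1, 0.8], confidences=[1.0, 1.0, 1.0])

# Late in training: small errors, high confidence -> negative loss.
late = conf_weighted_loss(errors=[0.01, 0.02, 0.015], confidences=[5.0, 4.0, 6.0])
```

Because the attainable minimum depends on the error and confidence distributions of your dataset, comparing the raw loss value across datasets is less informative than tracking the non-confidence terms (e.g. the camera R/T/FL losses) alongside held-out metrics.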

jytime avatar Sep 15 '25 12:09 jytime

I also hit the same issue: the loss decreases, but inference gives me degraded results.

SamiraJahangiri avatar Sep 17 '25 03:09 SamiraJahangiri

> 1. The loss curve largely depends on your training set. That said, I would suggest visualizing with a higher smoothing value in TensorBoard.
> 2. Yes, the loss should be negative; this is expected. Ideally the total loss converges to a value around -0.5.
> 3. Your log shows the memory is steady at 75 GB, right?

@jytime Thanks for the reply. After tuning some hyperparameters and spending more time on the experiment, the training and validation losses now decrease and converge slowly without OOM.

I want to ask another question: my camera prediction losses (R, T, and the overall camera loss) seem low in the validation step, but the predicted cameras are not accurate on my test sample.

The sample below shows four input views of the same object (front / back / left / right). As you can see, the visualized camera positions are not correct. How can I improve the camera accuracy?
For preprocessing, I remove the background from each input image, keeping only the central object, which matches my object-level dataset setting.

Thanks for any suggestions!

Image

Learningm avatar Sep 28 '25 15:09 Learningm

> 1. The loss curve largely depends on your training set. That said, I would suggest visualizing with a higher smoothing value in TensorBoard.
> 2. Yes, the loss should be negative; this is expected. Ideally the total loss converges to a value around -0.5.
> 3. Your log shows the memory is steady at 75 GB, right?

Hello, I would like to ask: should I set the smoothing value very high, like 0.99 or 0.999, to take a look?
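For reference, TensorBoard's smoothing slider applies (to my understanding) a de-biased exponential moving average to the scalar series, so a high weight like 0.99 averages over roughly the last few hundred points. A small self-contained sketch of that assumed behavior on a noisy, slowly decreasing loss:

```python
def ema_smooth(values, weight=0.99):
    """Exponential moving average with de-bias correction, similar in
    spirit to TensorBoard's scalar-smoothing slider (assumed behavior)."""
    smoothed, last = [], 0.0
    for step, v in enumerate(values, start=1):
        last = last * weight + (1.0 - weight) * v
        smoothed.append(last / (1.0 - weight ** step))  # de-bias early steps
    return smoothed

# A zigzag of +/-0.3 hides a slow downward trend in the raw values;
# heavy smoothing (0.99) cancels the zigzag and reveals the trend.
raw = [1.0 - 0.001 * i + (0.3 if i % 2 == 0 else -0.3) for i in range(1000)]
trend = ema_smooth(raw, weight=0.99)
```

Values like 0.99 (or even 0.999 for very noisy per-batch losses) are reasonable for judging the overall trend, at the cost of the smoothed curve lagging behind recent changes.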

Shexiaox avatar Oct 16 '25 02:10 Shexiaox