VGGT activations and training stability

Hi, I checked out the model code and see that "exp" and "inv_log" are used at the output of the depth and point heads, respectively. However, when I tried to train the model, this often produced inf values, since exp is very sensitive to its input. Is there any reason to use these instead of relu, as in Depth Anything v2?
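To make the sensitivity concrete, here is a minimal sketch of where exp stops being finite. It assumes standard IEEE 754 float32 and float16 ranges; the helper names are made up for illustration and are not from the VGGT code, and float16 is included only as an example of a narrower dtype, not as a claim about the training setup.

```python
import math

# Largest finite values of two common training dtypes (IEEE 754).
FLOAT32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127  # about 3.40e38
FLOAT16_MAX = 65504.0


def exp_overflow_threshold(dtype_max: float) -> float:
    """Roughly the largest raw head output x for which exp(x) stays finite."""
    return math.log(dtype_max)


def exp_is_finite(x: float, dtype_max: float) -> bool:
    """True if exp(x) still fits in a dtype whose largest value is dtype_max."""
    return x <= exp_overflow_threshold(dtype_max)


print(round(exp_overflow_threshold(FLOAT32_MAX), 2))  # 88.72
print(round(exp_overflow_threshold(FLOAT16_MAX), 2))  # 11.09
```

So a raw output only slightly above 88.7 (float32) or 11.1 (float16) is already enough to give inf after exp.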

May 19 '25 08:05 min-hieu-netropy

Hi, for the depth prediction, it is okay to use relu, although we found that exp generally works better. For the point cloud prediction, since point coordinates can lie anywhere in (-inf, inf), relu cannot work.

The training instability (inf, nan) actually comes from the DPT head rather than the activations.
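The range argument can be sketched with scalar versions of the three activations. This assumes "inv_log" means sign(x) * (exp(|x|) - 1), i.e. the inverse of a signed log transform; that definition is an assumption here and should be checked against the repository.

```python
import math


def relu(x: float) -> float:
    """Output range [0, inf): cannot represent negative coordinates."""
    return max(0.0, x)


def exp_act(x: float) -> float:
    """Output range (0, inf): fine for depth, which is positive."""
    return math.exp(x)


def inv_log(x: float) -> float:
    """Assumed definition: sign(x) * (exp(|x|) - 1), output range (-inf, inf)."""
    return math.copysign(math.expm1(abs(x)), x)


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), exp_act(x), inv_log(x))
```

Only the sign-preserving variant can produce negative values, which is what a 3D point coordinate needs.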

May 29 '25 20:05 jytime

I also ran into inf and NaN values when finetuning the model with only the depth head and camera head, directly on metric depth (the depth prediction and ground truth are normalized using the same scale, as in mast3r). If the training instability comes from the DPT head, are there any solutions or advice for avoiding it? :)

May 30 '25 03:05 qsisi

@min-hieu-netropy would you be open to sharing some of your training code? It would be greatly appreciated.

May 30 '25 08:05 davnords

@jytime Out of curiosity, how does the "inv_log" activation perform compared to the "norm_exp" activation, which is what dust3r uses? Is there a reason to prefer one over the other?
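For context, a rough sketch of how the two differ structurally, written with plain Python lists rather than tensors. It assumes "inv_log" is applied per coordinate as sign(x) * (exp(|x|) - 1) and "norm_exp" rescales the whole vector by (exp(d) - 1) / d with d its Euclidean norm; both definitions are my reading and may not match either codebase exactly.

```python
import math


def inv_log(v):
    """Assumed per-coordinate form: sign(c) * (exp(|c|) - 1)."""
    return [math.copysign(math.expm1(abs(c)), c) for c in v]


def norm_exp(v, eps=1e-8):
    """Assumed form: keep the direction of v, map its norm d to exp(d) - 1."""
    d = max(math.sqrt(sum(c * c for c in v)), eps)
    scale = math.expm1(d) / d
    return [c * scale for c in v]


v = [3.0, 4.0, 0.0]
print(norm_exp(v))  # same direction as v, y/x ratio stays 4/3
print(inv_log(v))   # each axis stretched on its own, y/x ratio grows
```

Under these assumptions, norm_exp keeps the direction of the raw output and only expands its length, while inv_log expands each axis independently, so the direction changes unless the components have equal magnitude or lie on an axis. This says nothing about which one trains better.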

Jul 26 '25 20:07 hanyucc