activations and training stability

Open min-hieu-netropy opened this issue 7 months ago • 3 comments

Hi, I checked out the model code and see that "exp" and "inv_log" are used at the output of the depth and point heads, respectively. However, when I tried to train the model, this often produces inf values, since exp is very sensitive to its input. Is there any reason to use these instead of relu, as in Depth Anything v2?
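
For reference, a rough sketch of how small the pre-activation has to be before exp overflows. It uses plain Python rather than the model code, and `clamped_exp` is a hypothetical safeguard for illustration, not something taken from the repository. Whether float16 is actually in play depends on the training setup (for example mixed precision), which is an assumption here.

```python
import math

# Largest finite values of the two common floating-point formats.
FLOAT16_MAX = 65504.0
FLOAT32_MAX = 3.4028235e38


def exp_overflow_threshold(max_value: float) -> float:
    """Pre-activation x at which exp(x) reaches max_value; larger x overflows."""
    return math.log(max_value)


fp16_limit = exp_overflow_threshold(FLOAT16_MAX)
fp32_limit = exp_overflow_threshold(FLOAT32_MAX)
print(f"exp overflows float16 above x = {fp16_limit:.2f}")
print(f"exp overflows float32 above x = {fp32_limit:.2f}")


def clamped_exp(x: float, limit: float) -> float:
    """Hypothetical safeguard: clamp the pre-activation before exponentiating."""
    return math.exp(min(x, limit))
```

So a pre-activation only has to drift past roughly 11 in float16, or roughly 89 in float32, for exp to return inf, whereas relu grows linearly and has no such threshold.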

min-hieu-netropy avatar May 19 '25 08:05 min-hieu-netropy

Hi, for depth prediction it is okay to use relu, although we found that exp generally works better. For point cloud prediction, point coordinates can lie anywhere in (-inf, inf), so relu cannot work.

The training instability (inf, nan) actually comes from the DPT head rather than the activations.
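
To illustrate the point about output ranges, here is a minimal scalar sketch. It assumes that "inv_log" means sign(x) * expm1(|x|); that definition is an assumption for illustration and should be checked against the model code. The function names are made up for this example.

```python
import math


def relu(x: float) -> float:
    """Output range [0, inf): negative values are impossible."""
    return max(x, 0.0)


def exp_act(x: float) -> float:
    """Output range (0, inf): strictly positive, which suits depth."""
    return math.exp(x)


def inv_log(x: float) -> float:
    """Assumed definition: sign(x) * expm1(|x|). Output range (-inf, inf)."""
    return math.copysign(math.expm1(abs(x)), x)


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  exp={exp_act(x):.3f}  inv_log={inv_log(x):+.3f}")
```

Only the signed variant can produce negative coordinates, which is why relu or plain exp would not fit a point head under this assumption.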

jytime avatar May 29 '25 20:05 jytime

I also encountered inf and nan values when finetuning the model with only the depth head + camera head directly on metric depth (the depth prediction and gt are normalized using the same scale, like mast3r). If the training instability comes from the DPT head, are there any solutions or advice to avoid it? :)

qsisi avatar May 30 '25 03:05 qsisi

@min-hieu-netropy would you be open to sharing some of your training code? Would be greatly appreciated

davnords avatar May 30 '25 08:05 davnords

@jytime Out of curiosity, how does the "inv_log" activation perform compared to the "norm_exp" activation, which is what dust3r uses? Is there a reason to prefer one over the other?
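
For readers comparing the two, a small sketch of how they differ structurally. It assumes "inv_log" applies sign(x) * expm1(|x|) to each coordinate and "norm_exp" rescales the whole vector to length expm1(||x||). Both definitions are assumptions for illustration, the helper names are made up, and this says nothing about which one trains better.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def direction(v):
    """Unit vector pointing the same way as v."""
    n = norm(v)
    return [c / n for c in v]


def inv_log(v):
    """Assumed per-coordinate form: sign(c) * expm1(|c|)."""
    return [math.copysign(math.expm1(abs(c)), c) for c in v]


def norm_exp(v, eps=1e-8):
    """Assumed vector form: keep the direction, map the length d to expm1(d)."""
    d = max(norm(v), eps)
    scale = math.expm1(d) / d
    return [c * scale for c in v]


raw = [3.0, 1.0, -0.5]  # arbitrary example pre-activation
print("raw direction:     ", direction(raw))
print("norm_exp direction:", direction(norm_exp(raw)))
print("inv_log direction: ", direction(inv_log(raw)))
```

Under these assumptions, norm_exp keeps the direction of the raw prediction and only remaps its length, while inv_log treats each axis independently and so pulls the result toward the largest coordinate.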

hanyucc avatar Jul 26 '25 20:07 hanyucc