Loss doesn't go down.

Open Emibobo opened this issue 1 year ago • 1 comment

I take 5 s video segments from hundreds of videos and sample 10 frames from each segment to train DINOv2 from scratch. The model's input tensor has shape [B,3,10,H,W], the batch size is set to 12, and training runs on 4 A100 (80GB) GPUs. The training parameters are the defaults from /configs/train/vitl14.yaml.
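For anyone trying to reproduce the setup, here is a minimal sketch of how 10 frames could be picked from one 5 s segment, and of the resulting input shape. The post does not say how the frames are chosen, so the evenly spaced sampling, the 30 fps frame rate, the 224x224 resolution, and the helper names `sample_frame_indices` and `input_shape` are all assumptions for illustration, not part of DINOv2.

```python
def sample_frame_indices(start_s, fps, clip_seconds=5.0, num_frames=10):
    """Return num_frames evenly spaced frame indices covering one clip.

    Assumption: frames are spread uniformly over the clip; the original
    post does not state the sampling scheme.
    """
    first = int(round(start_s * fps))          # index of the clip's first frame
    clip_len = int(round(clip_seconds * fps))  # number of frames in the clip
    step = clip_len / num_frames               # width of each sub-interval
    # Take the frame at the centre of each of the num_frames sub-intervals.
    return [first + int(step * (i + 0.5)) for i in range(num_frames)]


def input_shape(batch_size, height, width, num_frames=10, channels=3):
    """Shape of one batch in the [B, 3, 10, H, W] layout described above."""
    return (batch_size, channels, num_frames, height, width)


# Hypothetical values: 30 fps source video, 224x224 crops.
indices = sample_frame_indices(start_s=0.0, fps=30.0)
shape = input_shape(batch_size=12, height=224, width=224)
print(indices)
print(shape)
```

Printing the indices and the shape before launching a run is a cheap way to confirm that every clip really contributes 10 distinct frames and that the tensor layout is what the model expects.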

Aug 07 '24 15:08 Emibobo

Same here. Did you ever solve this issue?

May 29 '25 08:05 Shiyao-Xu