Sehun Kim

Results: 6 comments by Sehun Kim

Hmm, I cannot reproduce the scores reported in the paper with the same hyperparameters as in the notebook, using resnet32.

Maybe the "residual connections" described in Section 4.1 are missing from the code. The paper reports that without the residual connections, the model was not trainable.
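For reference, a minimal sketch of the residual-connection idea (hypothetical shapes and weights, not the paper's actual architecture): the block's output is the input plus a learned transform, `y = x + f(x)`, so gradients always have a direct path through the identity branch, which is typically what makes deep stacks trainable.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Skip connection: identity branch plus a small two-layer transform.
    h = relu(x @ w1)
    return x + h @ w2

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = rng.standard_normal((16, 16)) * 0.1

y = residual_block(x, w1, w2)
print(y.shape)  # (4, 16) -- same shape as the input, as required for the skip add

# Sanity check: with zero weights the block reduces to the identity,
# which is exactly why the gradient path never vanishes.
z = residual_block(x, np.zeros((16, 16)), np.zeros((16, 16)))
print(np.allclose(z, x))  # True
```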

In my experience, training from scratch on a domain-specific dataset performed better than fine-tuning a general pretrained model.

> [@sehunfromdaegu](https://github.com/sehunfromdaegu) but in this case the training procedure is quite slow / difficult due to the 7b model size. You can't just train the distilled models from scratch on...

> [@sehunfromdaegu](https://github.com/sehunfromdaegu) sorry just to elaborate. > > When you talk about training a model from scratch what procedure are you using? The dinov2? Dinov3? Clip? Siglip? There are so...

> I got NaN during training; I think it is because I loaded the model as float16? It is very likely that (fp16 + self-attention) causes NaN loss in my...
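The mechanism behind this is easy to reproduce in isolation (a hedged sketch with made-up logit values, not the actual model): float16's maximum representable value is about 65504, so a large attention logit overflows to `inf` when cast down, and the `inf - inf` inside the softmax's max-shift produces NaNs that then poison the loss. Loading in float32 (or bfloat16, which keeps float32's exponent range) avoids the overflow.

```python
import numpy as np

def softmax(x):
    z = x - x.max()          # standard max-shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Attention-style logits whose magnitude exceeds float16's max (~65504):
# the cast itself overflows to inf, and inf - inf = nan then propagates
# through the softmax.
scores = np.array([70000.0, 120.0, -3.0])

fp16 = softmax(scores.astype(np.float16))
fp32 = softmax(scores.astype(np.float32))

print(np.isnan(fp16).any())  # True: the overflowed logit poisons the softmax
print(np.isnan(fp32).any())  # False: float32's range absorbs the same logit
```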