
DINO Training with Swin-small

Open ysysys666 opened this issue 1 year ago • 3 comments

Dear author, hello. I am training DINO with swin-small as the backbone. My configuration is the same as yours with 4 GPUs, but my batch_size is halved to 8, so I also halved the initial learning rate. However, the training results are all 0, and I see these warnings:

```
d2.checkpoint.c2_model_loading WARNING: Shape of norm.weight in checkpoint is torch.Size([768]), while shape of necks.norm.weight in model is torch.Size([256])
d2.checkpoint.c2_model_loading WARNING: Shape of norm.weight in checkpoint is torch.Size([768]), while shape of transformer.decoder.norm.weight in model is torch.Size([256])
```

I downloaded the weights directly from the website. Could this be the cause? Please don't hesitate to enlighten me!
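For context on what such warnings mean: detectron2's checkpoint loader fuzzily matches checkpoint keys to model keys, and when the shapes disagree it skips loading that parameter and prints a warning like the ones above. This can be illustrated with a minimal, hedged sketch in plain Python, using dictionaries of shape tuples in place of real `state_dict`s (all names and shapes here are illustrative, not taken from the actual loader):

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Report parameters whose checkpoint shape differs from the model shape.

    Both arguments map parameter names to shape tuples, e.g. the result of
    {k: tuple(v.shape) for k, v in state_dict.items()} on a real checkpoint.
    Mismatched parameters would be skipped by the loader and left at their
    random initialization in the model.
    """
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = ckpt_shapes.get(name)
        if ckpt_shape is not None and ckpt_shape != model_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches


# Illustrative shapes mirroring the warning: the swin-small backbone
# checkpoint has a 768-dim final norm, while a 256-dim norm layer in the
# detection model happens to match it by (fuzzy) name.
ckpt = {"norm.weight": (768,), "patch_embed.proj.weight": (96, 3, 4, 4)}
model = {"norm.weight": (256,), "patch_embed.proj.weight": (96, 3, 4, 4)}
print(find_shape_mismatches(ckpt, model))
# -> {'norm.weight': ((768,), (256,))}
```

Comparing shapes this way on the real checkpoint and `model.state_dict()` can help tell harmless backbone/head name collisions apart from a genuinely wrong checkpoint.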

ysysys666 avatar Jan 10 '24 08:01 ysysys666

Hello, could you provide more info about your training config:

  • is your training config the same as https://github.com/IDEA-Research/detrex/blob/main/projects/dino/configs/dino-swin/dino_swin_tiny_224_4scale_12ep.py or not?
  • the pretrained swin checkpoint you use

Also, I don't think you need to halve the batch_size and learning rate; you can use gradient checkpointing to lower GPU memory usage and keep the batch_size the same for training.
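A hedged sketch of what that could look like in the lazy config (assuming the detrex Swin backbone exposes a `use_checkpoint` flag, as mm-style Swin implementations commonly do; verify the exact argument name in the `SwinTransformer` signature of your detrex checkout):

```python
# user_config.py (illustrative file name), extending the reference config
from projects.dino.configs.dino_swin_small_224_4scale_12ep import (
    model, dataloader, optimizer, lr_multiplier, train,
)

# Trade compute for memory: recompute Swin activations during the backward
# pass so the reference batch size fits on 4 GPUs.
# NOTE: "use_checkpoint" is an assumption; check your backbone's signature.
model.backbone.use_checkpoint = True

dataloader.train.total_batch_size = 16  # keep the reference batch size
optimizer.lr = 1e-4                     # and the reference learning rate
```

With the reference batch size and learning rate restored, no linear-scaling adjustment of the schedule should be needed.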

rentainhe avatar Jan 10 '24 09:01 rentainhe

```python
train.init_checkpoint = "./configs/dino-swin/swin_small_patch4_window7_224_22kto1k_finetune.pth"
train.output_dir = "./output/dino_swin_small_224_4scale_12ep_8bs"
train.max_iter = 180000
optimizer.lr = 5e-5
dataloader.train.total_batch_size = 8
```

The other config is the same as `dino_swin_small_224_4scale_12ep.py`.

ysysys666 avatar Jan 10 '24 12:01 ysysys666

@rentainhe Excuse me, have you encountered similar problems when loading swin-s weights? (attached screenshot: 微信图片_20240111174114)

ysysys666 avatar Jan 11 '24 09:01 ysysys666