detrex
DINO Training with Swin-small
Dear author, hello. I am training DINO with a Swin-S backbone. My configuration is the same as yours (4 GPUs), except that my batch size is halved to 8, so I also halved the initial learning rate, but the training results are all 0. I see these warnings:

d2.checkpoint.c2_model_loading WARNING: Shape of norm.weight in checkpoint is torch.Size([768]), while shape of necks.norm.weight in model is torch.Size([256])
d2.checkpoint.c2_model_loading WARNING: Shape of norm.weight in checkpoint is torch.Size([768]), while shape of transformer.decoder.norm.weight in model is torch.Size([256])

I downloaded the pretrained weights directly from the website. Could this be the reason? Please don't hesitate to enlighten me!
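For context, checkpoint loaders of this kind match parameters by name, so a backbone checkpoint's short key like `norm.weight` (Swin-S's final 768-dim norm) can be compared against unrelated 256-dim `norm.weight` parameters elsewhere in the DINO model, which is exactly the kind of mismatch these warnings report; the mismatched tensors are skipped rather than loaded. A toy illustration of name-based matching (plain Python with shapes as tuples; not detectron2's actual implementation):

```python
# Toy sketch of name-based state-dict matching, as checkpoint loaders do.
# Shapes are plain tuples standing in for tensor shapes.

def load_matching(model_shapes, ckpt_shapes):
    """Return (loaded, skipped): same-named keys with matching shapes are
    loaded; same-named keys with different shapes produce a warning-style
    message and are skipped."""
    loaded, skipped = [], []
    for name, mshape in model_shapes.items():
        short = ".".join(name.split(".")[-2:])  # e.g. "norm.weight"
        cshape = ckpt_shapes.get(short)
        if cshape is None:
            continue  # no candidate in the checkpoint for this parameter
        if cshape == mshape:
            loaded.append(name)
        else:
            skipped.append(f"Shape of {short} in checkpoint is {cshape}, "
                           f"while shape of {name} in model is {mshape}")
    return loaded, skipped

# Hypothetical model/checkpoint shapes mirroring the warnings above.
model_shapes = {
    "backbone.norm.weight": (768,),            # Swin-S final norm: matches
    "necks.norm.weight": (256,),               # unrelated norm: mismatch
    "transformer.decoder.norm.weight": (256,), # unrelated norm: mismatch
}
ckpt_shapes = {"norm.weight": (768,)}

loaded, skipped = load_matching(model_shapes, ckpt_shapes)
```

In this toy run only the backbone norm is loaded and the two 256-dim norms are skipped with a shape warning, so warnings like these on non-backbone parameters are usually harmless: those parameters simply stay randomly initialized, as intended.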
Hello, could you provide more info about your training config:
- is your training config the same as https://github.com/IDEA-Research/detrex/blob/main/projects/dino/configs/dino-swin/dino_swin_tiny_224_4scale_12ep.py or not?
- which pretrained Swin checkpoint are you using?
Also, I think you don't have to halve the batch size and learning rate; you can use gradient checkpointing to lower GPU memory usage and keep the batch size the same for training.
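A minimal sketch of what that might look like as a detrex lazy-config override, assuming the Swin backbone exposes a `use_checkpoint` flag (the name used in common Swin ports; verify the exact attribute in your detrex version). The file name and relative import are hypothetical:

```python
# dino_swin_small_224_4scale_12ep_gc.py -- hypothetical config placed next to
# the stock config. Enables activation (gradient) checkpointing on the
# backbone instead of halving the batch size.
from .dino_swin_small_224_4scale_12ep import (
    dataloader,
    lr_multiplier,
    model,
    optimizer,
    train,
)

# Assumption: the Swin backbone accepts a `use_checkpoint` flag that wraps
# each transformer block in torch.utils.checkpoint, trading extra compute
# for lower activation memory.
model.backbone.use_checkpoint = True

# Keep the reference schedule unchanged: same batch size and learning rate.
dataloader.train.total_batch_size = 16
optimizer.lr = 1e-4
```

Gradient checkpointing recomputes activations during the backward pass, so training is somewhat slower per iteration, but the reference batch size and learning rate stay intact.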
train.init_checkpoint = "./configs/dino-swin/swin_small_patch4_window7_224_22kto1k_finetune.pth"
train.output_dir = "./output/dino_swin_small_224_4scale_12ep_8bs"
train.max_iter = 180000
optimizer.lr = 5e-5
dataloader.train.total_batch_size = 8
All other settings are the same as in "dino_swin_small_224_4scale_12ep.py".
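The numbers above follow the linear scaling rule: halving the batch size halves the learning rate and doubles the iteration count, so the total number of images seen stays constant. A quick sanity check, assuming the reference schedule is batch size 16, lr 1e-4, and 90000 iterations (the usual 12-epoch COCO setting):

```python
# Linear scaling rule sanity check.
# Reference schedule (assumed): total_batch_size=16, lr=1e-4, max_iter=90000.
ref_bs, ref_lr, ref_iter = 16, 1e-4, 90_000
new_bs = 8

scale = new_bs / ref_bs           # 0.5
new_lr = ref_lr * scale           # matches optimizer.lr = 5e-5 above
new_iter = int(ref_iter / scale)  # matches train.max_iter = 180000 above
images_seen = new_bs * new_iter   # unchanged: 8 * 180000 == 16 * 90000
```

So the halved schedule is internally consistent; the all-zero results are more likely explained by the checkpoint or config questions above than by this scaling.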
@rentainhe Excuse me, have you encountered similar problems when loading Swin-S weights?