Swin-Transformer
How to finetune the 384 models with window size 12?
Great work! I have some questions about fine-tuning on ImageNet-1K. In the paper, you state that the 384^2 input models are obtained by fine-tuning, as also pointed out in #24:
For other resolutions such as 384^2, we fine-tune the models trained at 224^2 resolution, instead of training from scratch, to reduce GPU consumption.
I see that you use window_size 12 for the 384 models, which makes the fine-tuning confusing because of the window-size-dependent parameters relative_position_bias_table and attn_mask. Do you use interpolation to handle this? If so, which interpolation method do you use? Bicubic?
Thanks for your reply in advance!
Yes, we used bicubic interpolation.
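For reference, here is a minimal sketch of how such a resize could look in PyTorch: reshape the (2W-1)^2 x num_heads bias table into a per-head 2-D grid, interpolate it bicubically to the new window size, and flatten it back. The helper name and signature below are illustrative, not the repository's API; attn_mask is derived from the window/shift geometry, so it would typically be rebuilt at the new resolution rather than interpolated.

```python
import torch
import torch.nn.functional as F

def resize_rel_pos_bias_table(table_old: torch.Tensor, win_old: int, win_new: int) -> torch.Tensor:
    """Resize a relative_position_bias_table from window size `win_old` to `win_new`
    via bicubic interpolation. Sketch only; names are illustrative."""
    # Table shape: ((2*win_old - 1)^2, num_heads)
    length_old, num_heads = table_old.shape
    s_old = 2 * win_old - 1
    s_new = 2 * win_new - 1
    assert length_old == s_old * s_old, "table length must match (2*win_old - 1)^2"

    # Reshape to a per-head 2-D grid, interpolate, then flatten back.
    table_2d = table_old.permute(1, 0).reshape(1, num_heads, s_old, s_old)
    table_2d = F.interpolate(table_2d, size=(s_new, s_new), mode='bicubic', align_corners=False)
    return table_2d.reshape(num_heads, s_new * s_new).permute(1, 0)

# Example: adapt a window-7 (224^2) table to window-12 (384^2).
# new_table = resize_rel_pos_bias_table(old_table, win_old=7, win_new=12)
```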
Thanks for your reply. I have some other questions about the details of fine-tuning on ImageNet-1K. In addition to hyperparameters such as learning rate, batch size, LR scheduler, and weight decay, do we also need to adjust the drop path ratio and the label smoothing ratio? Besides, should we reinitialize the weights of the fully connected classification head or inherit the corresponding weights from the pre-trained 22K model's FC layer (since the 22K classes contain the 1K classes)?
Hi, how should I modify the pretrained 384-input model when loading it, so that I can fine-tune it at 448 input?
Bicubic interpolation also works for that case.
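The resize sketched above would apply here as well: assuming the 448^2 fine-tuning keeps the 4-pixel patch embedding and scales the window proportionally (for example to 14, since the last-stage feature map is 448 / 32 = 14), each block's relative_position_bias_table can be interpolated from window 12 to the new window size, while attn_mask is rebuilt from the new geometry. The window size of 14 is an assumption for illustration; see the fine-tuning instructions linked below for the supported configs.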
Instructions and configs for fine-tuning on higher resolution can be found here: https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md#fine-tuning-on-higher-resolution