A problem about pre-trained weights
I found a problem with pre-trained weights. I trained a custom model, then loaded those weights back into the same model with the same head. When I used --pretrained-path to load the pre-trained weights, accuracy at epoch 0 was 8%, so it looks like the weights were not loaded. Using --initial-checkpoint instead did load the weights, and epoch 0 accuracy was 80%.
@chenyanting1 if the weights specified by --pretrained-path differ in num-classes from the pretrained entry for the model being used, the head will be reset, as it normally would be when fine-tuning that model... whereas --initial-checkpoint will load the weights into a model whose head was already built from the num-classes arg. --pretrained-path is for fine-tuning models with different pretrained weights and adapting the head; --initial-checkpoint is for loading final weights into a model with the same head layout.
Simplified view
--pretrained-path
- init model
- load weights
- reset / adapt head & input conv
--initial-checkpoint
- init model w/ specified head
- load weights
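A minimal sketch of the same two paths using the timm Python API directly (illustrative only, not the actual train.py code; the model name and checkpoint path are placeholders):

```python
import timm
from timm.models import load_checkpoint

# Roughly what --pretrained-path does: the local file overrides the model's
# default pretrained weights, and the head is adapted/reset to num_classes.
model = timm.create_model(
    'resnet50',                                          # example model name (assumption)
    pretrained=True,
    pretrained_cfg_overlay=dict(file='my_weights.pth'),  # hypothetical local checkpoint
    num_classes=10,
)

# Roughly what --initial-checkpoint does: build the model with the final head
# layout first, then load matching weights into it.
model = timm.create_model('resnet50', pretrained=False, num_classes=10)
load_checkpoint(model, 'my_weights.pth')                 # hypothetical local checkpoint
```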
@rwightman I understand what you mean, but let me show you the results. Using --pretrained-path to load the pre-trained weights, with the same model, the same dataset, and the same head as the run that produced those weights, the first-epoch accuracy is very low, which suggests the command reset the head. In my previous experiments on the same dataset, pre-training usually improves accuracy, but when I train with the timm library using pre-trained weights, accuracy tends to drop.
You can see that I only changed the way the weights are loaded and the GPU they run on. The classification head is the same, and --initial-checkpoint reaches 82% accuracy in the first epoch, which shows there is no problem with the weights themselves.
But in the --pretrained-path case the head is reset, so you have to retrain the head, and you're using a warmup-lr of 1e-6; lr scheduling is per epoch, so the first epoch won't be learning very quickly
@rwightman My understanding is that --pretrained-path will always reset the fully connected layer, whether or not the model's fully connected layer already matches. And because of that, I may need to spend some time adjusting the training parameters, which can't be the same as my normal training parameters. Is my understanding correct?
in any case, to see if this is a 'bug', when you use --initial-checkpoint, also add a model.reset_classifier(num_classes=args.num_classes) in train.py after model creation, and see if it trains similarly to --pretrained-path (see the sketch below)
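A minimal sketch of where that check could go, assuming the usual create_model call in train.py (the exact surrounding code differs by timm version):

```python
# Inside train.py, after model creation (abbreviated create_model call):
model = create_model(
    args.model,
    pretrained=args.pretrained,
    num_classes=args.num_classes,
    checkpoint_path=args.initial_checkpoint,   # weights loaded into the full model here
)
# Deliberately re-initialize the classifier head after the checkpoint loads,
# mimicking the head reset that the --pretrained-path route performs.
model.reset_classifier(num_classes=args.num_classes)
```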
I encountered the same issue here. When loading a local pretrained checkpoint with --pretrained-path for fine-tuning (removing the head), the rest of the model was still not initialized from the pretrained checkpoint. So I tried --initial-checkpoint and removed the head manually, and then it worked.
FWIW if you're using --pretrained-path, you still need to set --pretrained to indicate you're loading pretrained weights, the path arg just overrides the default pretrained-path in config
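A rough, illustrative sketch of how the two arguments fit together inside train.py (not the exact script code, which varies by timm version):

```python
# The local file is merged into the model's pretrained config, but it is only
# actually loaded when pretrained=True.
factory_kwargs = {}
if args.pretrained_path:
    factory_kwargs['pretrained_cfg_overlay'] = dict(file=args.pretrained_path)

model = create_model(
    args.model,
    pretrained=args.pretrained,                # must be set for the overlay file to be used
    num_classes=args.num_classes,
    checkpoint_path=args.initial_checkpoint,
    **factory_kwargs,
)
```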
Thanks a lot for the clarification!
Hi, could you share your scripts for pre-training and for training on the target dataset? When I load pre-trained weights in timm, the accuracy often drops.