pytorch-image-models

A problem with pre-trained weights

Open • chenyanting1 opened this issue 6 months ago • 6 comments

I found a problem with pre-trained weights. I trained a custom model, then loaded the resulting weights into the same model with the same head. When I used --pretrained-path to load these pre-trained weights, accuracy at epoch 0 was 8%, and it looked like the weights had not been loaded. When I used --initial-checkpoint instead, the weights loaded and accuracy at epoch 0 was 80%.

chenyanting1 • Jun 18 '25

@chenyanting1 If the weights specified by --pretrained-path differ in num-classes from the pretrained entry for the model being used, the head will be reset, as it normally would be when fine-tuning that model. By contrast, --initial-checkpoint loads the weights into a model whose head has already been set up from the --num-classes arg. In short, --pretrained-path is for fine-tuning a model from different pretrained weights and adapting the head, while --initial-checkpoint is for loading final weights into a model with the same head layout.
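A toy sketch of the num-classes rule described above, in plain Python with lists standing in for tensors. It assumes the check compares the model's num_classes against the num_classes recorded in the pretrained config entry and, on a mismatch, drops the classifier weights and loads non-strictly. `adapt_pretrained_state`, the classifier name `fc`, and the 10 vs 1000 class counts are illustrative assumptions, not timm's actual code.

```python
# Toy stand-in for loading pretrained weights with head adaptation.
# Lists stand in for tensors; names and class counts are illustrative only.

def adapt_pretrained_state(state_dict, model_num_classes, cfg_num_classes,
                           classifier="fc"):
    """Return (filtered_state_dict, strict) for loading pretrained weights.

    If the model's num_classes differs from the num_classes in the pretrained
    config entry, the classifier weights are discarded and loading becomes
    non-strict, so the head keeps its fresh initialization.
    """
    state_dict = dict(state_dict)  # copy so the caller's dict is not mutated
    strict = True
    if model_num_classes != cfg_num_classes:
        state_dict.pop(f"{classifier}.weight", None)
        state_dict.pop(f"{classifier}.bias", None)
        strict = False
    return state_dict, strict


# Custom checkpoint whose head already has 10 classes ...
ckpt = {
    "stem.weight": [0.1, 0.2],
    "fc.weight": [[0.5] * 4 for _ in range(10)],
    "fc.bias": [0.0] * 10,
}

# ... but the config entry still records 1000 classes (assumed), so the
# checkpoint's 10-class head is dropped even though it matches the model.
filtered, strict = adapt_pretrained_state(ckpt, model_num_classes=10, cfg_num_classes=1000)
print(sorted(filtered), strict)  # ['stem.weight'] False
```

The key point is that the comparison is against the config entry, not against the checkpoint file, so a custom checkpoint whose head already matches the model can still lose its head.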

rwightman • Jun 18 '25

Simplified view

--pretrained-path

  • init model
  • load weights
  • reset / adapt head & input conv

--initial-checkpoint

  • init model w/ specified head
  • load weights
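The two orders of operations above can be mimicked with dict-based toy models. Everything here (`init_model`, the `backbone`/`head` keys, the string placeholders) is hypothetical, and the input-conv adaptation step is left out; the point is only that in the first flow the checkpoint's head never survives, while in the second it does.

```python
# Toy models: a "model" is a dict of parameter name -> placeholder value.
# init_model and both flows are hypothetical stand-ins, not timm APIs.

def init_model(num_classes):
    # Fresh initialization: backbone plus a head sized for num_classes.
    return {"backbone": "random", "head": ("random", num_classes)}


def pretrained_path_flow(ckpt, num_classes):
    model = init_model(num_classes)  # init model
    # load weights, but reset/adapt the head: checkpoint head is not kept
    model.update({k: v for k, v in ckpt.items() if k != "head"})
    return model


def initial_checkpoint_flow(ckpt, num_classes):
    model = init_model(num_classes)  # init model w/ specified head
    model.update(ckpt)  # load all weights, head included
    return model


ckpt = {"backbone": "trained", "head": ("trained", 10)}
print(pretrained_path_flow(ckpt, 10)["head"])     # ('random', 10)
print(initial_checkpoint_flow(ckpt, 10)["head"])  # ('trained', 10)
```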

rwightman • Jun 18 '25

@rwightman I understand what you mean, but let me show you the results. When I use --pretrained-path to load the weights, with pre-training and training done on the same model, the same dataset, and the same head, the accuracy in the first epoch is very low, which suggests that this option may have reset the head. In my previous experiments on the same dataset, pre-training usually improved the model's accuracy, but when I train with the timm library using pre-trained weights, the accuracy tends to drop.

chenyanting1 • Jun 19 '25

[Screenshots (4 images)]

chenyanting1 • Jun 19 '25

You can see that I only changed the way the weights are loaded and the GPU they run on. The classification head is the same, and with --initial-checkpoint the accuracy in the first epoch is 82%, which indicates that there is no problem with the weights.

chenyanting1 • Jun 19 '25

But in the --pretrained-path case the head is reset, so it has to be retrained, and you're using a warmup-lr of 1e-6. LR scheduling is per epoch, so the first epoch won't learn very quickly.
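To see why the first epoch barely moves, here is a sketch of a warmup stepped once per epoch. It assumes a linear ramp that starts at the warmup LR; `lr_at_epoch`, the base LR of 1e-3, and the 5 warmup epochs are assumed example values, and only the 1e-6 comes from this thread.

```python
# Linear warmup stepped once per epoch (assumed scheme).
# warmup_lr matches the 1e-6 above; base_lr and warmup_epochs are examples.

def lr_at_epoch(epoch, base_lr=1e-3, warmup_lr=1e-6, warmup_epochs=5):
    if epoch < warmup_epochs:
        # Ramp linearly from warmup_lr (epoch 0) toward base_lr.
        step = (base_lr - warmup_lr) / warmup_epochs
        return warmup_lr + epoch * step
    return base_lr  # after warmup; any decay schedule is omitted here


for e in range(6):
    print(e, f"{lr_at_epoch(e):.2e}")

print(lr_at_epoch(0))  # 1e-06: the whole of epoch 0 runs at this LR
```

Because the LR only changes at epoch boundaries in this setup, all of epoch 0 trains at 1e-6, which is too small for a freshly reset head to learn much.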

rwightman • Jun 19 '25

@rwightman My understanding is that, whether or not the model's fully connected layer matches, --pretrained-path will always reset it. Also, because of the pre-training, I may need to spend some time tuning the training hyperparameters, which cannot be the same as for normal training. Is my understanding correct?

chenyanting1 • Jun 20 '25

In any case, to check whether this is a 'bug': when you use --initial-checkpoint, also add model.reset_classifier(num_classes=args.num_classes) in train.py after model creation, and see whether it trains similarly to --pretrained-path.
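Alongside that experiment, it can help to check directly whether the non-head weights actually came from the checkpoint. A minimal sketch, assuming you have the model's state dict (e.g. from `model.state_dict()`) and the checkpoint's state dict as plain dicts; `mismatched_params` and the `fc.`/`head.` prefixes are hypothetical, since the classifier name varies by model.

```python
# Hypothetical helper: list non-head parameters whose values differ from the
# checkpoint they were supposedly loaded from. Plain Python values are used
# here; with real tensors, compare using torch.equal(a, b) instead of !=.

def mismatched_params(model_state, ckpt_state, head_prefixes=("fc.", "head.")):
    bad = []
    for name, value in ckpt_state.items():
        if name.startswith(head_prefixes):
            continue  # the head is expected to differ after a reset
        if name not in model_state or model_state[name] != value:
            bad.append(name)
    return bad


ckpt = {"conv1.weight": [1, 2], "fc.weight": [3]}
loaded_ok = {"conv1.weight": [1, 2], "fc.weight": [0]}   # backbone loaded, head reset
not_loaded = {"conv1.weight": [9, 9], "fc.weight": [0]}  # backbone still random
print(mismatched_params(loaded_ok, ckpt))   # []
print(mismatched_params(not_loaded, ckpt))  # ['conv1.weight']
```

An empty list after loading means only the head differs, which is the expected state for fine-tuning; any backbone names in the list point to weights that were never loaded.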

rwightman • Jun 20 '25

I ran into the same bug. When loading a local pretrained checkpoint with --pretrained-path for fine-tuning (removing the head), the rest of the model was still not initialized from the pretrained checkpoint. So I tried --initial-checkpoint and removed the head manually, and that worked.

K1seki221 • Jul 08 '25

FWIW, if you're using --pretrained-path, you still need to set --pretrained to indicate that you're loading pretrained weights; the path arg just overrides the default pretrained path in the config.
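A toy model of that flag interaction: the path only changes where pretrained weights come from, while --pretrained decides whether they are loaded at all. `resolve_pretrained_source` and the `hub://default` placeholder are hypothetical, not timm functions.

```python
# Hypothetical model of the flag interaction: --pretrained gates loading,
# --pretrained-path only overrides the default source from the config.

def resolve_pretrained_source(pretrained, pretrained_path, default_source):
    if not pretrained:
        return None  # nothing is loaded; the model keeps its random init
    # The path (if given) overrides the config's default weight location.
    return pretrained_path or default_source


print(resolve_pretrained_source(False, "my_ckpt.pth", "hub://default"))  # None
print(resolve_pretrained_source(True, "my_ckpt.pth", "hub://default"))   # my_ckpt.pth
```

In train.py terms, that means passing both --pretrained and --pretrained-path; with --pretrained-path alone, no pretrained weights are loaded.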

rwightman • Jul 08 '25

Thanks a lot for the clarification!

K1seki221 • Jul 08 '25

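Hi, could you share your scripts for pre-training and for training on the target dataset? When I load pre-trained weights in timm, the accuracy often drops.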

chenyanting1 • Sep 15 '25