Question about fine-tuning CellViT
When I try to fine-tune the CellViT-256-x40.pth model using the ViT256 backbone and the provided pre-trained encoder (vit256_small_dino.pth), I encounter a RuntimeError due to mismatched keys in the model head.
Config:
backbone: ViT256
pretrained_encoder: /mnt/raid/zanzhuheng/working/ESCC_CELLVIT/vit256_small_dino.pth
pretrained: ../CellViT-256-x40.pth
Error Traceback:
Loading checkpoint: _IncompatibleKeys(missing_keys=['head.weight', 'head.bias'], unexpected_keys=['head.mlp.0.weight', 'head.mlp.0.bias', 'head.mlp.2.weight', 'head.mlp.2.bias', 'head.mlp.4.weight', 'head.mlp.4.bias', 'head.last_layer.weight_g', 'head.last_layer.weight_v'])
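For reference, this is roughly how I checked which head structure the checkpoint actually stores (a minimal sketch; I inspect the pre-trained encoder file here, and the handling of a possible "teacher" nesting is an assumption about DINO-style checkpoints, not something from the repo's code):

```python
import torch

# Sketch: list the head-related keys stored in the pre-trained encoder checkpoint.
# Assumption: DINO-style checkpoints are sometimes nested under a "teacher" key.
ckpt = torch.load("vit256_small_dino.pth", map_location="cpu")
state_dict = ckpt["teacher"] if isinstance(ckpt, dict) and "teacher" in ckpt else ckpt
print([k for k in state_dict if k.startswith("head")])
# Per the traceback above this prints head.mlp.0.*, head.mlp.2.*, head.mlp.4.* and
# head.last_layer.weight_g / weight_v, i.e. a multi-layer DINO projection head
# rather than the single nn.Linear head the current model definition expects.
```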
Expected behavior
I expected the model to load the pre-trained CellViT-256-x40.pth checkpoint successfully and continue training (fine-tuning) with the ViT256 backbone. It seems the current model definition uses a single-layer head (e.g., nn.Linear), while the checkpoint uses a more complex multi-layer MLP head (head.mlp.0/2/4 plus a weight-normalized head.last_layer), and this structural mismatch causes the weight loading to fail. Should the model config or code be updated to match the checkpoint structure (with the MLP layers)? When I instead initialize only part of the model using strict=False, all predictions come out as NaN.
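For context, the strict=False workaround I tried looks roughly like the sketch below (hypothetical helper and names, not the repo's own loading code; `model` stands for the already-constructed CellViT/ViT256 model, and I am assuming the DINO projection head (head.*) is not needed for fine-tuning):

```python
import torch


def load_backbone_only(model: torch.nn.Module, ckpt_path: str):
    """Sketch of the strict=False workaround (hypothetical helper, not the repo's loader).

    Assumption: only the backbone weights from the pre-trained encoder are needed;
    the DINO projection head (head.*) can be dropped, and the model's own head /
    decoder branches stay randomly initialized and are trained during fine-tuning.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # DINO-style checkpoints are sometimes nested under a "teacher" key
    state_dict = ckpt["teacher"] if isinstance(ckpt, dict) and "teacher" in ckpt else ckpt
    backbone_only = {k: v for k, v in state_dict.items() if not k.startswith("head.")}
    msg = model.load_state_dict(backbone_only, strict=False)
    print(msg)  # ideally only the model's own head/decoder keys remain as missing_keys
```

Even with this partial loading, the predictions are all NaN, so I am not sure whether the remaining randomly initialized layers or something else in my setup is the cause.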
I'm happy to provide more details if needed. Thank you!