feat(pt): Add support for finetuning from .pth (frozen) models
This PR fixes the `NotImplementedError` that occurs when attempting to finetune from `.pth` (frozen/scripted) models in the PyTorch backend.
## Problem
Users encountered a `RuntimeError` when trying to finetune from frozen models:

```bash
dp --pt train input.json -t dpa2.pth --use-pretrain-script
```
The error occurred because the `get_finetune_rules()` function unconditionally used `torch.load()` with `weights_only=True` to load finetune models, which fails for `.pth` files that are created with `torch.jit.save()` and require `torch.jit.load()`.
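The mismatch can be reproduced with a few lines of standalone PyTorch. This is an illustration of the underlying issue, not deepmd-kit code:

```python
# Illustration (not deepmd-kit code): an archive saved with torch.jit.save()
# is a TorchScript archive, not a plain checkpoint, so torch.load() with
# weights_only=True cannot treat it as one; torch.jit.load() is required.
import tempfile

import torch


class Tiny(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2


with tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as f:
    path = f.name
torch.jit.save(torch.jit.script(Tiny()), path)

try:
    # Behavior varies by torch version, but this is not a supported way
    # to read a TorchScript archive.
    torch.load(path, weights_only=True)
    print("torch.load accepted the archive on this torch version")
except Exception as exc:
    print(f"torch.load failed: {type(exc).__name__}")

model = torch.jit.load(path)  # the correct loader for frozen models
print(model(torch.ones(2)))
```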
## Solution
Updated the `get_finetune_rules()` function in `deepmd/pt/utils/finetune.py`:
- Added file-extension detection to select the appropriate loading method:
  - `.pt` files: `torch.load()` with `weights_only=True` (existing behavior)
  - `.pth` files: `torch.jit.load()`, extracting model params via `get_model_def_script()`
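A minimal sketch of the dispatch logic. It assumes the scripted model exposes a `get_model_def_script()` method returning the model definition as a JSON string (as deepmd-kit frozen models do); the helper name and return shape here are illustrative, not the exact implementation:

```python
import json
import os

import torch


def load_finetune_params(model_file: str):
    """Sketch (not deepmd-kit code): pick the loader by file extension."""
    ext = os.path.splitext(model_file)[1]
    if ext == ".pt":
        # Checkpoint: a plain pickle saved with torch.save()
        return torch.load(model_file, map_location="cpu", weights_only=True)
    if ext == ".pth":
        # Frozen model: a TorchScript archive; recover the model definition
        # via get_model_def_script() (assumed to return a JSON string).
        scripted = torch.jit.load(model_file, map_location="cpu")
        return json.loads(scripted.get_model_def_script())
    raise ValueError(f"Unsupported model file: {model_file}")
```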
Updated the training logic in `deepmd/pt/train/training.py`:
- Added proper `.pth` support in the model resuming/loading logic
- Used `strict=False` when loading the state dict from `.pth` files to handle different key structures
- Gracefully handled missing optimizer state and step info in frozen models
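The tolerant loading step can be sketched as follows; the helper is hypothetical, not the actual `training.py` code:

```python
import torch


def load_frozen_state(model: torch.nn.Module, frozen_state: dict) -> int:
    """Sketch: load a frozen model's params tolerantly, reset the step counter."""
    # strict=False: the frozen module's keys need not match the trainable
    # model exactly (extra buffers, renamed prefixes, etc.)
    result = model.load_state_dict(frozen_state, strict=False)
    if result.missing_keys or result.unexpected_keys:
        print(f"ignored keys: missing={result.missing_keys} "
              f"unexpected={result.unexpected_keys}")
    # Frozen models carry no optimizer state or step info, so training
    # restarts from step 0 with a freshly initialized optimizer.
    return 0
```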
The implementation follows the existing pattern used in the `change_bias()` function, ensuring consistency across the codebase.
## Testing
- Added comprehensive test cases covering both `.pt` and `.pth` finetune workflows
- Verified backward compatibility with the existing `.pt` finetune functionality
- Tested error handling for invalid file extensions
- Manual CLI testing confirms the end-to-end workflow works correctly
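The invalid-extension case can be covered with a check along these lines; the names are illustrative, not the repository's actual test code:

```python
import os

# Assumption: only these two model formats are accepted for finetuning.
SUPPORTED_FINETUNE_EXTENSIONS = (".pt", ".pth")


def is_supported_finetune_model(path: str) -> bool:
    """Sketch: accept checkpoint (.pt) and frozen (.pth) models, reject others."""
    return os.path.splitext(path)[1] in SUPPORTED_FINETUNE_EXTENSIONS


# Mirrors the new coverage: both supported workflows plus rejection of an
# unsupported extension.
assert is_supported_finetune_model("model.pt")
assert is_supported_finetune_model("model.pth")
assert not is_supported_finetune_model("model.onnx")
```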
Users can now successfully finetune from both checkpoint (`.pt`) and frozen (`.pth`) models:

```bash
# Works with checkpoint files (existing functionality)
dp --pt train input.json --finetune model.pt --use-pretrain-script

# Now works with frozen models (new functionality)
dp --pt train input.json --finetune model.pth --use-pretrain-script
```
Fixes #4262.