feat(pt): Add support for finetuning from .pth (frozen) models

Open Copilot opened this issue 6 months ago • 0 comments

This PR fixes the NotImplementedError that occurs when attempting to finetune from .pth (frozen/scripted) models in the PyTorch backend.

Problem

Users encountered a RuntimeError when trying to finetune from frozen models:

dp --pt train input.json -t dpa2.pth --use-pretrain-script

The error occurred because get_finetune_rules() unconditionally loaded the finetune model with torch.load() and weights_only=True. That call fails for .pth files, which are written by torch.jit.save() and must be read with torch.jit.load().

Solution

Updated get_finetune_rules() function in deepmd/pt/utils/finetune.py:

Added file extension detection to select the appropriate loading method:
.pt files: torch.load() with weights_only=True (existing behavior)
.pth files: torch.jit.load() and extract model params via get_model_def_script()
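
The extension dispatch described above can be sketched roughly as follows. This is an illustration, not the code in deepmd/pt/utils/finetune.py: the helper names pick_loader and load_model_params are hypothetical, the checkpoint layout (an optional "model" key holding "_extra_state"/"model_params") is an assumption, and so is the idea that get_model_def_script() returns a JSON string.

```python
import json
from pathlib import Path


def pick_loader(finetune_model: str) -> str:
    """Classify a finetune source by file extension (hypothetical helper)."""
    suffix = Path(finetune_model).suffix
    if suffix == ".pt":
        return "checkpoint"  # regular checkpoint, read with torch.load()
    if suffix == ".pth":
        return "frozen"  # scripted model, read with torch.jit.load()
    raise ValueError(
        f"Unsupported finetune model {finetune_model!r}: expected .pt or .pth"
    )


def load_model_params(finetune_model: str) -> dict:
    """Return the model parameters stored in a .pt or .pth file."""
    # Imported lazily so pick_loader() can be used without PyTorch installed.
    import torch

    if pick_loader(finetune_model) == "checkpoint":
        state_dict = torch.load(
            finetune_model, map_location="cpu", weights_only=True
        )
        # Assumed layout: weights may be nested under a "model" key.
        state_dict = state_dict.get("model", state_dict)
        return state_dict["_extra_state"]["model_params"]

    frozen = torch.jit.load(finetune_model, map_location="cpu")
    # Assumption: the scripted model exposes its definition as a JSON string.
    return json.loads(frozen.get_model_def_script())
```

Keeping the extension check in its own function means the invalid-extension path can be tested without creating any model files.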

Updated training logic in deepmd/pt/train/training.py:

Added proper .pth support in model resuming/loading logic
Used strict=False when loading the state dict from .pth files to handle different key structures
Gracefully handled missing optimizer state and step info in frozen models
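
A minimal sketch of the resume logic described above, under stated assumptions: restore_training_state is a hypothetical helper, and the "model", "optimizer" and "step" keys are illustrative rather than the actual layout used in deepmd/pt/train/training.py.

```python
from typing import Any, Dict, Optional, Tuple


def restore_training_state(
    model: Any, loaded: Dict[str, Any], from_frozen: bool
) -> Tuple[Optional[dict], int]:
    """Load weights into `model` and return (optimizer_state, start_step)."""
    # A frozen model may name its parameters differently from the training
    # wrapper, so key mismatches are tolerated only in that case.
    model.load_state_dict(loaded["model"], strict=not from_frozen)

    # Hypothetical keys: a frozen model carries neither, so fall back to
    # "no optimizer state" and step 0 instead of raising KeyError.
    optimizer_state = loaded.get("optimizer")
    start_step = int(loaded.get("step", 0))
    return optimizer_state, start_step
```

With a real torch.nn.Module, load_state_dict() returns the missing and unexpected keys. Logging them is worthwhile here, because strict=False otherwise skips mismatched parameters silently.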

The implementation follows the existing pattern used in the change_bias() function, ensuring consistency across the codebase.

Testing

Added comprehensive test cases covering both .pt and .pth finetune workflows
Verified backward compatibility with existing .pt finetune functionality
Tested error handling for invalid file extensions
Confirmed through manual CLI testing that the end-to-end workflow completes correctly

Users can now successfully finetune from both checkpoint (.pt) and frozen (.pth) models:

# Works with checkpoint files (existing functionality)
dp --pt train input.json --finetune model.pt --use-pretrain-script

# Now works with frozen models (new functionality)  
dp --pt train input.json --finetune model.pth --use-pretrain-script

Fixes #4262.

Sep 03 '25 14:09 Copilot