
feat(pt): Add support for finetuning from .pth (frozen) models

Open · Copilot opened this issue 6 months ago · 0 comments

This PR fixes the error raised when attempting to finetune from .pth (frozen/scripted) models in the PyTorch backend.

Problem

Users encountered a RuntimeError when trying to finetune from frozen models:

dp --pt train input.json -t dpa2.pth --use-pretrain-script

The error occurred because get_finetune_rules() unconditionally loaded the finetune model with torch.load(..., weights_only=True). That call fails for .pth files, which are TorchScript archives written by torch.jit.save() and must be loaded with torch.jit.load().
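With default settings, both torch.save() checkpoints and torch.jit.save() archives are zip files, so the extension is a naming convention rather than a format guarantee. What appears to distinguish a TorchScript archive is a top-level constants.pkl record; recent PyTorch releases seem to run a similar check inside torch.load() and raise a RuntimeError when weights_only=True is combined with TorchScript input. The stdlib-only sketch below shows that check. The helper name looks_like_torchscript is hypothetical and is not part of deepmd-kit or PyTorch; the PR itself dispatches on the file extension instead.

```python
import io
import zipfile
from typing import IO, Union


def looks_like_torchscript(source: Union[str, IO[bytes]]) -> bool:
    """Hypothetical helper: True if `source` is a zip archive that contains a
    TorchScript-style top-level `constants.pkl` record."""
    if not zipfile.is_zipfile(source):
        return False
    # is_zipfile() moves the position of file-like objects; rewind before reopening.
    if hasattr(source, "seek"):
        source.seek(0)
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Records sit under one top-level folder, e.g. "model/constants.pkl".
            if name.split("/", 1)[-1] == "constants.pkl":
                return True
    return False


if __name__ == "__main__":
    # Build two tiny in-memory archives that only mimic the record layout.
    def make_zip(names):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            for name in names:
                zf.writestr(name, b"")
        buf.seek(0)
        return buf

    print(looks_like_torchscript(make_zip(["model/constants.pkl", "model/data.pkl"])))
    print(looks_like_torchscript(make_zip(["ckpt/data.pkl", "ckpt/version"])))
```

A content check like this could complement, but is not what the PR uses in place of, the extension-based dispatch.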

Solution

Updated the get_finetune_rules() function in deepmd/pt/utils/finetune.py:

  • Added file extension detection to select the appropriate loading method
  • .pt files: torch.load() with weights_only=True (existing behavior)
  • .pth files: torch.jit.load(), then extract the model params via get_model_def_script()
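The extension-based dispatch above can be sketched as follows. This is not the code merged in the PR: the function name load_pretrained_model_params, the injected loader callables, the checkpoint layout ("model" → "_extra_state" → "model_params") and the assumption that get_model_def_script() returns a JSON string are illustrative assumptions. In real code the two callables would wrap torch.load(path, weights_only=True) and torch.jit.load(path); injecting them keeps the sketch runnable without PyTorch.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def load_pretrained_model_params(
    path: str,
    load_checkpoint: Callable[[str], Dict[str, Any]],
    load_frozen: Callable[[str], Any],
) -> Dict[str, Any]:
    """Hypothetical helper: return the model params stored in a pretrained file."""
    suffix = Path(path).suffix
    if suffix == ".pt":
        # Stand-in for torch.load(path, weights_only=True).
        state_dict = load_checkpoint(path)
        # Assumed layout: full training checkpoints nest the weights under "model".
        if "model" in state_dict:
            state_dict = state_dict["model"]
        return state_dict["_extra_state"]["model_params"]
    if suffix == ".pth":
        # Stand-in for torch.jit.load(path).
        frozen_model = load_frozen(path)
        # Assumed: the frozen model exposes its definition as a JSON string.
        return json.loads(frozen_model.get_model_def_script())
    raise ValueError(
        f"Unsupported finetune model file {path!r}: expected a .pt or .pth extension"
    )


if __name__ == "__main__":
    class FakeFrozenModel:
        """Mimics only the get_model_def_script() method of a frozen model."""

        def get_model_def_script(self) -> str:
            return json.dumps({"type_map": ["O", "H"]})

    print(load_pretrained_model_params("dpa2.pth", lambda p: {}, lambda p: FakeFrozenModel()))
```

Raising on any other extension mirrors the "invalid file extensions" case listed under Testing.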

Updated the training logic in deepmd/pt/train/training.py:

  • Added .pth support to the model resuming/loading logic
  • Used strict=False when loading the state dict from .pth files to tolerate differing key structures
  • Handled missing optimizer state and step information in frozen models gracefully
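To make the strict=False behavior and the fallbacks concrete, here is a stdlib-only sketch using plain dicts. load_state_dict_sketch loosely mimics the key handling of torch.nn.Module.load_state_dict (it reports missing and unexpected keys and only raises when strict=True; shape checks are not modeled). resume_state falls back to no optimizer state and step 0. Both helper names and the "optimizer"/"step" keys are hypothetical; the PR text does not say how training.py stores them.

```python
from typing import Any, Dict, List, Optional, Tuple


def load_state_dict_sketch(
    target: Dict[str, Any], source: Dict[str, Any], strict: bool = True
) -> Tuple[List[str], List[str]]:
    """Copy matching entries from `source` into `target` in place.

    Returns (missing_keys, unexpected_keys); raises only when strict=True
    and the key sets differ.
    """
    missing = [key for key in target if key not in source]
    unexpected = [key for key in source if key not in target]
    if strict and (missing or unexpected):
        raise RuntimeError(
            f"Error(s) in loading state_dict: missing={missing}, unexpected={unexpected}"
        )
    for key in target:
        if key in source:
            target[key] = source[key]
    return missing, unexpected


def resume_state(loaded: Dict[str, Any]) -> Tuple[Optional[Dict[str, Any]], int]:
    """Return (optimizer_state, start_step) with fallbacks for frozen models.

    A training checkpoint is assumed to carry "optimizer" and "step"; a frozen
    .pth model carries neither, so training starts with a fresh optimizer at step 0.
    """
    optimizer_state = loaded.get("optimizer")
    start_step = int(loaded.get("step", 0))
    return optimizer_state, start_step


if __name__ == "__main__":
    weights = {"w": 0.0, "b": 0.0}
    print(load_state_dict_sketch(weights, {"w": 1.0, "extra": 2.0}, strict=False))
    print(weights)
    print(resume_state({"model": {}}))
```

The trade-off of strict=False is that silently skipped keys leave those parameters at their initial values, so logging the returned key lists is worthwhile.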

The implementation follows the existing pattern used in the change_bias() function, ensuring consistency across the codebase.

Testing

  • Added test cases covering both .pt and .pth finetune workflows
  • Verified backward compatibility with the existing .pt finetune functionality
  • Tested error handling for invalid file extensions
  • Manual CLI testing confirms that the end-to-end workflow works

Users can now successfully finetune from both checkpoint (.pt) and frozen (.pth) models:

# Works with checkpoint files (existing functionality)
dp --pt train input.json --finetune model.pt --use-pretrain-script

# Now works with frozen models (new functionality)  
dp --pt train input.json --finetune model.pth --use-pretrain-script

Fixes #4262.



Copilot · Sep 03 '25 14:09