feat(finetune): add warnings for descriptor config changes during fine-tuning with proper default handling
- [x] Plan initial approach for adding fine-tuning warnings
- [x] Implement basic warnings for --use-pretrain-script scenarios
- [x] Add enhanced warnings with nlayer/repformer support
- [x] Remove special handling for nlayer parameters - treat all equally
- [x] Add warnings for descriptor config mismatches without --use-pretrain-script
- [x] Fix unnecessary warnings for default parameter values during config comparison
- [x] Refactor and simplify: consolidate duplicated warning functions into shared utilities
Current Status
The PR now includes comprehensive fine-tuning warnings with significant code deduplication:
Consolidated Warning System
- Shared utilities: Moved the duplicate warning functions to `deepmd.utils.finetune`
- Removed duplication: Eliminated ~169 lines of duplicated code across the PyTorch and Paddle backends
- Consistent functionality: Both backends now use identical warning logic from shared functions
Warning Functions
- `warn_descriptor_config_differences()` - for `--use-pretrain-script` scenarios where the input config is overwritten
- `warn_configuration_mismatch_during_finetune()` - for scenarios without `--use-pretrain-script` where only compatible state dict parameters are loaded
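For reference, a minimal sketch of what the two shared helpers in `deepmd.utils.finetune` could look like. Only the function names come from this PR; the dict-based signatures and the exact log messages below are assumptions for illustration.

```python
import logging

log = logging.getLogger(__name__)


def warn_descriptor_config_differences(input_descriptor: dict, pretrained_descriptor: dict) -> None:
    """Warn, per key, that the input descriptor config will be overwritten (--use-pretrain-script)."""
    for key in sorted(set(input_descriptor) | set(pretrained_descriptor)):
        old, new = input_descriptor.get(key), pretrained_descriptor.get(key)
        if old != new:
            log.warning(
                "Descriptor '%s' from input.json (%s) is overwritten by the pretrained model (%s).",
                key, old, new,
            )


def warn_configuration_mismatch_during_finetune(input_descriptor: dict, pretrained_descriptor: dict) -> None:
    """Warn that only compatible state-dict parameters will be loaded (no --use-pretrain-script)."""
    mismatched = sorted(
        key
        for key in set(input_descriptor) | set(pretrained_descriptor)
        if input_descriptor.get(key) != pretrained_descriptor.get(key)
    )
    if mismatched:
        log.warning(
            "Descriptor configuration differs from the pretrained model (%s); "
            "only parameters compatible with the new configuration will be loaded.",
            ", ".join(mismatched),
        )
```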
Key Features
- Smart default handling with normalization to avoid false warnings
- Supports both PyTorch and Paddle backends seamlessly
- Handles both single-task and multi-task fine-tuning scenarios
- All parameters treated equally without special prominence
- Maintains full backward compatibility
Benefits of Refactoring
- Reduced maintenance burden: Single source of truth for warning logic
- Consistency: Identical behavior across all backends
- Cleaner codebase: Significant reduction in duplicate code
- Easier testing: Shared functions can be tested centrally
This change provides users with clear visibility into configuration changes during fine-tuning while maintaining a clean, maintainable codebase.
Codecov Report
:x: Patch coverage is 96.80000% with 4 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 85.20%. Comparing base (6349238) to head (54fa343).
:warning: Report is 14 commits behind head on devel.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| deepmd/pd/train/training.py | 94.11% | 2 Missing :warning: |
| deepmd/pt/train/training.py | 95.55% | 2 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##            devel    #4925      +/-   ##
==========================================
+ Coverage   84.29%   85.20%    +0.90%
==========================================
  Files         703      705        +2
  Lines       68728    75926     +7198
  Branches     3573     3573
==========================================
+ Hits        57935    64693     +6758
- Misses       9653    10094      +441
+ Partials     1140     1139        -1
```
:umbrella: View full report in Codecov by Sentry.
It works for the `--use-pretrain-script` option, but that is not the case in the relevant issue, where `--use-pretrain-script` is not given. In that case, configurations in input.json will not be overwritten by those in the pretrained model.
The issue pointed out that when the configurations differ and `--use-pretrain-script` is not used, the state_dict initialization will only pick the relevant keys from the pretrained model (e.g. the first 6 layers of a 16-layer model). What needs to be added is an additional check before this state_dict initialization in `collect_single_finetune_params` in deepmd/pt/train/training.py.
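As a rough illustration of where that check could sit, here is a hypothetical sketch, not the actual implementation; the argument names and the simplified key-picking step are assumptions.

```python
import logging

log = logging.getLogger(__name__)


def collect_single_finetune_params_sketch(
    new_state_dict: dict,
    pretrained_state_dict: dict,
    input_descriptor: dict,
    pretrained_descriptor: dict,
) -> dict:
    # Additional check before the state_dict initialization: if the descriptor
    # configs differ, only the keys that also exist in the new model are picked
    # from the pretrained model (e.g. the first 6 layers of a 16-layer model).
    if input_descriptor != pretrained_descriptor:
        log.warning(
            "Descriptor configuration in input.json differs from the pretrained model; "
            "only the compatible keys of the pretrained state dict will be loaded."
        )

    # Simplified picking: take the pretrained value where the key matches,
    # otherwise keep the freshly initialized value of the new model.
    return {k: pretrained_state_dict.get(k, v) for k, v in new_state_dict.items()}
```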
Added warnings for descriptor configuration mismatches when the `--use-pretrain-script` option is NOT used. The check now lives in the `collect_single_finetune_params` function in both the PyTorch and Paddle training modules. When configurations differ, it warns that the state_dict initialization will only pick the relevant keys from the pretrained model. (54fa343)
Implemented the additional check in the `collect_single_finetune_params` function in deepmd/pt/train/training.py (and deepmd/pd/train/training.py). The warning now appears before the state_dict initialization when the descriptor configurations don't match between input.json and the pretrained model, helping users understand when only compatible parameters will be loaded from the pretrained model. (54fa343)
Another issue: the model params in `self.model_params` come from `normalize` in deepmd/utils/argcheck.py, which adds default values for absent keys in the input.json used for pretraining. The new input.json here, however, has not been normalized, so some missing keys do not have default values added. The warnings may be unnecessary for those default values.
Fixed the issue with unnecessary warnings for default parameter values. The problem was that `self.model_params` contains the normalized input (with defaults added), while the comparison was against the raw pretrained model config, causing warnings for parameters users never explicitly set.
The solution normalizes both configurations before comparison using `deepmd.utils.argcheck.normalize()`, ensuring that only meaningful differences trigger warnings. For example, if input.json doesn't specify activation_function but the pretrained model explicitly has the same default value, no warning is shown. (4b7d82d)
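A minimal sketch of the compare-after-normalization idea; the `fill_defaults` hook stands in for however the PR wraps `deepmd.utils.argcheck.normalize` around a model section, and is an assumption for illustration.

```python
import logging

log = logging.getLogger(__name__)


def compare_descriptors_after_normalization(input_model: dict, pretrained_model: dict, fill_defaults) -> None:
    """Warn only about keys that still differ once both configs carry their defaults.

    `fill_defaults` is the (assumed) normalization hook, e.g. a thin wrapper
    that runs a model section through deepmd.utils.argcheck.normalize.
    """
    input_desc = fill_defaults(input_model).get("descriptor", {})
    pretrained_desc = fill_defaults(pretrained_model).get("descriptor", {})
    for key in sorted(set(input_desc) | set(pretrained_desc)):
        if input_desc.get(key) != pretrained_desc.get(key):
            log.warning(
                "Descriptor '%s' differs after normalization: %r (input.json) vs %r (pretrained).",
                key, input_desc.get(key), pretrained_desc.get(key),
            )
```

With this shape, a key left unset in input.json picks up its default during normalization, so it no longer differs from a pretrained model that spells out the same value and no warning fires.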
@copilot can you wrap up and simplify all the modifications in this PR? It seems that there are lots of duplicated lines.
Consolidated and simplified all modifications by eliminating code duplication across backends. The refactoring moved duplicate warning functions to shared utilities in deepmd.utils.finetune, removing ~169 lines of duplicated code while maintaining identical functionality. Both PyTorch and Paddle backends now use the same warning logic from centralized functions. (5f38e2f)
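Illustratively, the consolidated call site would now be identical in both training modules. The module path and function name come from this PR, but the call signature and the descriptor dicts below simply mirror the sketch earlier in this thread and are assumptions.

```python
# Both deepmd/pt/train/training.py and deepmd/pd/train/training.py would import
# the same shared helper instead of keeping a local duplicated copy.
from deepmd.utils.finetune import warn_configuration_mismatch_during_finetune

# Illustrative descriptor configs; the assumed signature takes two dicts.
input_descriptor = {"type": "se_e2_a", "neuron": [25, 50, 100]}
pretrained_descriptor = {"type": "se_e2_a", "neuron": [20, 40, 80]}

warn_configuration_mismatch_during_finetune(input_descriptor, pretrained_descriptor)
```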