feat(finetune): add warnings for descriptor config changes during fine-tuning with proper default handling
- [x] Plan initial approach for adding fine-tuning warnings
- [x] Implement basic warnings for --use-pretrain-script scenarios
- [x] Add enhanced warnings with nlayer/repformer support
- [x] Remove special handling for nlayer parameters - treat all equally
- [x] Add warnings for descriptor config mismatches without --use-pretrain-script
- [x] Fix unnecessary warnings for default parameter values during config comparison
- [x] Refactor and simplify: consolidate duplicated warning functions into shared utilities
Current Status
The PR now includes comprehensive fine-tuning warnings with significant code deduplication:
Consolidated Warning System
- Shared utilities: Moved the duplicate warning functions to `deepmd.utils.finetune`
- Removed duplication: Eliminated ~169 lines of duplicated code across the PyTorch and Paddle backends
- Consistent functionality: Both backends now use identical warning logic from shared functions
Warning Functions
- `warn_descriptor_config_differences()` - for `--use-pretrain-script` scenarios where the input config is overwritten
- `warn_configuration_mismatch_during_finetune()` - for scenarios without `--use-pretrain-script` where only compatible state dict parameters are loaded
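For reference, a minimal sketch of what the two shared helpers in `deepmd.utils.finetune` could look like. Only the function names come from this PR; the dict-based signatures and the exact log messages below are assumptions for illustration.

```python
import logging

log = logging.getLogger(__name__)


def warn_descriptor_config_differences(input_descriptor: dict, pretrained_descriptor: dict) -> None:
    """Warn, per key, that the input descriptor config will be overwritten (--use-pretrain-script)."""
    for key in sorted(set(input_descriptor) | set(pretrained_descriptor)):
        old, new = input_descriptor.get(key), pretrained_descriptor.get(key)
        if old != new:
            log.warning(
                "Descriptor '%s' from input.json (%s) is overwritten by the pretrained model (%s).",
                key, old, new,
            )


def warn_configuration_mismatch_during_finetune(input_descriptor: dict, pretrained_descriptor: dict) -> None:
    """Warn that only compatible state-dict parameters will be loaded (no --use-pretrain-script)."""
    mismatched = sorted(
        key
        for key in set(input_descriptor) | set(pretrained_descriptor)
        if input_descriptor.get(key) != pretrained_descriptor.get(key)
    )
    if mismatched:
        log.warning(
            "Descriptor configuration differs from the pretrained model (%s); "
            "only parameters compatible with the new configuration will be loaded.",
            ", ".join(mismatched),
        )
```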
Key Features
- Smart default handling with normalization to avoid false warnings
- Supports both PyTorch and Paddle backends seamlessly
- Handles both single-task and multi-task fine-tuning scenarios
- All parameters treated equally without special prominence
- Maintains full backward compatibility
Benefits of Refactoring
- Reduced maintenance burden: Single source of truth for warning logic
- Consistency: Identical behavior across all backends
- Cleaner codebase: Significant reduction in duplicate code
- Easier testing: Shared functions can be tested centrally
This change provides users with clear visibility into configuration changes during fine-tuning while maintaining a clean, maintainable codebase.
Codecov Report
:x: Patch coverage is 96.80000% with 4 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 85.20%. Comparing base (6349238) to head (54fa343).
:warning: Report is 14 commits behind head on devel.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| deepmd/pd/train/training.py | 94.11% | 2 Missing :warning: |
| deepmd/pt/train/training.py | 95.55% | 2 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##            devel    #4925      +/-   ##
==========================================
+ Coverage   84.29%   85.20%    +0.90%
==========================================
  Files         703      705        +2
  Lines       68728    75926     +7198
  Branches     3573     3573
==========================================
+ Hits        57935    64693     +6758
- Misses       9653    10094      +441
+ Partials     1140     1139        -1
```
:umbrella: View full report in Codecov by Sentry.
It works for the `--use-pretrain-script` option, but that is not the case in the relevant issue, where `--use-pretrain-script` is not given. In that case, configurations in input.json will not be overwritten by those in the pretrained model.
The issue pointed out that when the configurations differ and `--use-pretrain-script` is not used, the state_dict initialization will only pick the relevant keys from the pretrained model (e.g. the first 6 layers of a 16-layer model). What needs to be added is an additional check before this state_dict initialization in `collect_single_finetune_params` in deepmd/pt/train/training.py.
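As a rough illustration of where that check could sit, here is a hypothetical sketch, not the actual implementation; the argument names and the simplified key-picking step are assumptions.

```python
import logging

log = logging.getLogger(__name__)


def collect_single_finetune_params_sketch(
    new_state_dict: dict,
    pretrained_state_dict: dict,
    input_descriptor: dict,
    pretrained_descriptor: dict,
) -> dict:
    # Additional check before the state_dict initialization: if the descriptor
    # configs differ, only the keys that also exist in the new model are picked
    # from the pretrained model (e.g. the first 6 layers of a 16-layer model).
    if input_descriptor != pretrained_descriptor:
        log.warning(
            "Descriptor configuration in input.json differs from the pretrained model; "
            "only the compatible keys of the pretrained state dict will be loaded."
        )

    # Simplified picking: take the pretrained value where the key matches,
    # otherwise keep the freshly initialized value of the new model.
    return {k: pretrained_state_dict.get(k, v) for k, v in new_state_dict.items()}
```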
Added warnings for descriptor configuration mismatches when the `--use-pretrain-script` option is NOT used. The check now lives in the `collect_single_finetune_params` function in both the PyTorch and Paddle training modules. When configurations differ, it warns that the state_dict initialization will only pick the relevant keys from the pretrained model. (54fa343)
Implemented the additional check in the `collect_single_finetune_params` function in deepmd/pt/train/training.py (and deepmd/pd/train/training.py). The warning now appears before the state_dict initialization when the descriptor configurations don't match between input.json and the pretrained model, helping users understand when only compatible parameters will be loaded from the pretrained model. (54fa343)
Another issue: the model params in `self.model_params` come from `normalize` in deepmd/utils/argcheck.py, which adds default values for absent keys in the input.json used for pretraining. The new input.json here, however, has not been normalized, so some missing keys do not have default values added. The warnings may be unnecessary for those default values.
Fixed the issue with unnecessary warnings for default parameter values. The problem was that `self.model_params` contains the normalized input (with defaults added), while the comparison was against the raw pretrained model config, causing warnings for parameters users never explicitly set.
The solution normalizes both configurations before comparison using `deepmd.utils.argcheck.normalize()`, ensuring that only meaningful differences trigger warnings. For example, if input.json doesn't specify activation_function but the pretrained model explicitly has the same default value, no warning is shown. (4b7d82d)
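A minimal sketch of the compare-after-normalization idea; the `fill_defaults` hook stands in for however the PR wraps `deepmd.utils.argcheck.normalize` around a model section, and is an assumption for illustration.

```python
import logging

log = logging.getLogger(__name__)


def compare_descriptors_after_normalization(input_model: dict, pretrained_model: dict, fill_defaults) -> None:
    """Warn only about keys that still differ once both configs carry their defaults.

    `fill_defaults` is the (assumed) normalization hook, e.g. a thin wrapper
    that runs a model section through deepmd.utils.argcheck.normalize.
    """
    input_desc = fill_defaults(input_model).get("descriptor", {})
    pretrained_desc = fill_defaults(pretrained_model).get("descriptor", {})
    for key in sorted(set(input_desc) | set(pretrained_desc)):
        if input_desc.get(key) != pretrained_desc.get(key):
            log.warning(
                "Descriptor '%s' differs after normalization: %r (input.json) vs %r (pretrained).",
                key, input_desc.get(key), pretrained_desc.get(key),
            )
```

With this shape, a key left unset in input.json picks up its default during normalization, so it no longer differs from a pretrained model that spells out the same value and no warning fires.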
@copilot can you wrap up and simplify all the modifications in this PR? It seems that there are lots of duplicated lines.
Consolidated and simplified all modifications by eliminating code duplication across backends. The refactoring moved duplicate warning functions to shared utilities in deepmd.utils.finetune, removing ~169 lines of duplicated code while maintaining identical functionality. Both PyTorch and Paddle backends now use the same warning logic from centralized functions. (5f38e2f)
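Illustratively, the consolidated call site would now be identical in both training modules. The module path and function name come from this PR, but the call signature and the descriptor dicts below simply mirror the sketch earlier in this thread and are assumptions.

```python
# Both deepmd/pt/train/training.py and deepmd/pd/train/training.py would import
# the same shared helper instead of keeping a local duplicated copy.
from deepmd.utils.finetune import warn_configuration_mismatch_during_finetune

# Illustrative descriptor configs; the assumed signature takes two dicts.
input_descriptor = {"type": "se_e2_a", "neuron": [25, 50, 100]}
pretrained_descriptor = {"type": "se_e2_a", "neuron": [20, 40, 80]}

warn_configuration_mismatch_during_finetune(input_descriptor, pretrained_descriptor)
```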