
[Bug] Revise the _remove_state_dict_prefix and _add_state_dict_prefix functions in timm.py to adapt to the case of multiple submodels.

Open wilxy opened this issue 2 years ago • 5 comments

When using TimmClassifier as the student or teacher model in knowledge distillation algorithms, there are some bugs in save_checkpoint and load_checkpoint.

  1. save_checkpoint: When saving a checkpoint with save_checkpoint(self.state_dict(), 'xxx.pth'), where self is a knowledge distillation algorithm containing the submodels self.student and self.teacher, self.state_dict() recursively calls the state_dict function here. The _remove_state_dict_prefix function in the TimmClassifier class is used as a hook to modify the original destination. Specifically, _remove_state_dict_prefix creates a new_state_dict, whose memory differs from the original destination, and returns it as the hook_result that is supposed to modify the destination for the student and teacher submodels. But the state_dict function of the knowledge distillation algorithm model never receives this modification, so the memory address and contents of destination stay unchanged. To solve this problem, we change _remove_state_dict_prefix to modify the state_dict in place instead of creating a new_state_dict (see the first sketch after this list).

  2. load_checkpoint: When loading the checkpoint of a knowledge distillation algorithm model whose student and teacher are both TimmClassifier instances, the _add_state_dict_prefix function in the TimmClassifier class is used as a hook to modify the state_dict for each submodel. While handling the student submodel, _add_state_dict_prefix also deletes all keys of the teacher submodel. To solve this problem, we change _add_state_dict_prefix to delete a key only when it differs from its new_key (see the second sketch after this list).
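For illustration, here is a minimal sketch of the save-side fix. The wrapper class, the model. prefix mapping, and the hook registration below are simplified assumptions for this example rather than the exact mmpretrain code; the point is only that the hook now renames keys in the shared destination dict in place instead of returning a new dict.

```python
import re

import torch.nn as nn


class TimmWrapperSketch(nn.Module):
    """Hypothetical stand-in for TimmClassifier (not the real mmpretrain code)."""

    def __init__(self):
        super().__init__()
        self.model = nn.Linear(4, 4)  # stands in for the wrapped timm model
        # Runs after state_dict() has collected this module's keys.
        self._register_state_dict_hook(self._remove_state_dict_prefix)

    @staticmethod
    def _remove_state_dict_prefix(module, state_dict, prefix, local_metadata):
        # Old behaviour: build and return a new OrderedDict. When this module is
        # a submodel (algo.student / algo.teacher), the parent's state_dict()
        # keeps its reference to the original destination, so the returned dict
        # is silently dropped. Fix: rename keys in the shared dict in place and
        # return nothing.
        for old_key in list(state_dict.keys()):
            new_key = re.sub(f'^{re.escape(prefix)}model\\.', prefix, old_key)
            if new_key != old_key:
                state_dict[new_key] = state_dict.pop(old_key)


class KDAlgoSketch(nn.Module):
    """Hypothetical distillation algorithm holding two wrapped submodels."""

    def __init__(self):
        super().__init__()
        self.student = TimmWrapperSketch()
        self.teacher = TimmWrapperSketch()


if __name__ == '__main__':
    algo = KDAlgoSketch()
    # With the in-place fix, the renamed keys are visible at the top level:
    # ['student.bias', 'student.weight', 'teacher.bias', 'teacher.weight']
    print(sorted(algo.state_dict().keys()))
```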
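And a matching sketch of the load-side fix, again with assumed names and prefix mapping. Because the load pre-hook receives the whole checkpoint dict, keys that belong to sibling submodels must be left alone; a key is replaced only when its renamed form actually differs from the original.

```python
import re
from collections import OrderedDict


def _add_state_dict_prefix(module, state_dict, prefix, local_metadata, strict,
                           missing_keys, unexpected_keys, error_msgs):
    """Load-time pre-hook (sketch): re-insert the 'model.' level for one submodel.

    In the wrapper it would be registered with something like
    self._register_load_state_dict_pre_hook(..., with_module=True) (a private
    PyTorch API); the exact registration in mmpretrain may differ.
    """
    new_prefix = prefix + 'model.'
    for old_key in list(state_dict.keys()):
        new_key = re.sub(f'^{re.escape(prefix)}', new_prefix, old_key)
        # The old code deleted every old key, including keys of sibling
        # submodels (teacher.* while prefix == 'student.') whose new_key is
        # identical to old_key. Fix: delete only when the key actually changed.
        if new_key != old_key:
            state_dict[new_key] = state_dict.pop(old_key)


if __name__ == '__main__':
    # A checkpoint with the flattened keys produced by the save-side hook above.
    ckpt = OrderedDict([
        ('student.weight', 0), ('student.bias', 0),
        ('teacher.weight', 0), ('teacher.bias', 0),
    ])
    # Simulate the hook firing for the student submodel only.
    _add_state_dict_prefix(None, ckpt, 'student.', {}, True, [], [], [])
    # Teacher keys survive; student keys regain the 'model.' level:
    # ['student.model.bias', 'student.model.weight', 'teacher.bias', 'teacher.weight']
    print(sorted(ckpt.keys()))
```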

wilxy avatar Jan 04 '23 08:01 wilxy

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jan 04 '23 08:01 CLAassistant

Please sign the CLA so that I can review your PR.

Ezra-Yu avatar Jan 04 '23 09:01 Ezra-Yu

Hello, can you sign the CLA and fix the lint problem? Then we can merge the PR. @wilxy

mzr1996 avatar Jan 09 '23 03:01 mzr1996

> Hello, can you sign the CLA and fix the lint problem? Then we can merge the PR. @wilxy

Thanks for the reminder, I've signed the CLA and fixed the lint problem.

wilxy avatar Jan 10 '23 06:01 wilxy

Hi @wilxy, can you migrate this PR to the main branch?

Ezra-Yu avatar May 06 '23 07:05 Ezra-Yu