save_state removes shared weights but load_state cannot load properly
System Info
accelerate version: 0.27.2
python: 3.11
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
I am saving the state_dict of a 'facebook/opt-125m' model. In this model, the weights are shared between the embedding tokens and the language modelling head. When I save the state dictionary of the model, I see this warning:
WARNING: Removed shared tensor {'pretrained_model.lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
The problem is that when I then call load_state on the same object, I get this error:
Missing key(s) in state_dict: "pretrained_model.lm_head.weight".
I do understand that the weights are removed because they are shared, but then how can I work with models that have shared weights?
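A minimal sketch of the flow described above; note the `Wrapper` module exposing the model as `pretrained_model` is an assumption inferred from the key names in the warning and error, and your actual setup may differ:

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM


class Wrapper(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # facebook/opt-125m ties lm_head.weight to the input embedding matrix
        self.pretrained_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")


accelerator = Accelerator()
model = accelerator.prepare(Wrapper())

# save_state warns that the shared tensor
# 'pretrained_model.lm_head.weight' was removed while saving ...
accelerator.save_state("checkpoint")

# ... and load_state then raises
# "Missing key(s) in state_dict: 'pretrained_model.lm_head.weight'."
accelerator.load_state("checkpoint")
```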
Interestingly, the code was working with previous versions of the libraries. Unfortunately, I no longer have the old environment, so I can't tell you exactly where things break.
Thanks in advance.
Expected behavior
I expected save_state and load_state to be able to restore the original model, shared weights included. This does not work.
cc @SunMarc
Hi @MiladInk, thanks for the report. Could you share a minimal reproducer? When we load a model with shared weights, we make sure to tie the shared weights together.
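For context, a small sketch (independent of Accelerate) of what weight tying looks like on the transformers side: after tying, the LM head and the input embeddings are the same tensor, which is why only one copy ends up in the checkpoint and the other name has to be re-tied rather than loaded.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.tie_weights()  # re-ties lm_head.weight to the input embedding matrix

# Both parameter names refer to the same underlying tensor, so serialization
# keeps only one of them; restoring the model needs the tie re-established.
assert model.lm_head.weight is model.get_input_embeddings().weight
```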
I am facing a similar issue when trying to save and load "google/gemma-2b".
Hi @raghavgarg97, could you share a minimal reproducer?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.