Adapter-transformers & DeepSpeed: how to get fp32 weights reconstruction?
Environment info
- adapter-transformers version: 2.0.1
- DeepSpeed version: 0.4.0
Details
Hugging Face allows the use of DeepSpeed to accelerate the training of a model on one or more GPUs (read DeepSpeed Integration).
DeepSpeed can be used from a notebook or from the command line (a *.py script). For example, here is an example from Hugging Face: transformers + deepspeed CLI.
Instead of using the example notebooks or scripts from Hugging Face transformers, it is possible to use the updated scripts from adapter-transformers.
Great... but there is a problem when using mixed precision training with float16 under DeepSpeed: the resulting *.bin files (model and adapter) are not saved in fp32.
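One way to confirm in which precision a checkpoint was written is to load it with torch and inspect the tensor dtypes. Below is a minimal sketch; the tiny state dict saved here is only a stand-in for a real pytorch_model.bin or pytorch_adapter.bin:

```python
import tempfile

import torch

def checkpoint_dtypes(path):
    """Return the set of dtypes used by the tensors in a saved state dict."""
    state_dict = torch.load(path, map_location="cpu")
    return {str(t.dtype) for t in state_dict.values()}

# Stand-in for a real *.bin file: save a small fp16 state dict and inspect it.
with tempfile.NamedTemporaryFile(suffix=".bin") as f:
    torch.save({"layer.weight": torch.zeros(8, 4, dtype=torch.float16)}, f.name)
    print(checkpoint_dtypes(f.name))  # → {'torch.float16'}
```

Running this on the files produced by a mixed-precision run should show torch.float16 rather than torch.float32.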
In the case of a DeepSpeed training run with a Hugging Face transformers script, it is possible, thanks to the zero_to_fp32.py script (check the links at the bottom of this message), to get an fp32 reconstruction of the pytorch_model.bin weights.
However, how can this be done after a DeepSpeed training run with an adapter-transformers script (with mixed precision training using float16), for the pytorch_adapter.bin and pytorch_model_head.bin files?
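For reference, for a plain transformers run the reconstruction step looks roughly like this (a sketch; the checkpoint directory name is hypothetical and depends on your run, and the exact arguments may differ between DeepSpeed versions):

```shell
# DeepSpeed copies zero_to_fp32.py into the checkpoint directory it writes.
cd output_dir/checkpoint-1000                # hypothetical checkpoint directory
python zero_to_fp32.py . pytorch_model.bin   # reconstruct fp32 weights from the ZeRO shards
```

The open question is what the equivalent of this step is for the adapter and head files.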
Note: I ran the run_mlm.py script updated by adapter-transformers under DeepSpeed (with mixed precision training using float16), with a change at line 490 of the Trainer so that the full model is saved at the end of training (new code: do_save_full_model=adapter_args.train_adapter). As expected, the pytorch_adapter.bin size was half of its fp32 value. Then I ran zero_to_fp32.py as explained above and in the listed links, loaded my saved model with the following code, and checked whether the embedding and layer weights were unchanged (as expected): but they had changed.
from transformers import BertForMaskedLM, AutoTokenizer
model = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))
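To make the "weights have changed" check explicit, the two state dicts can be diffed tensor by tensor. This is a minimal sketch using plain torch tensors; the key name and values are illustrative, and in practice the inputs would come from model.state_dict() before and after the round trip:

```python
import torch

def changed_keys(sd_before, sd_after, atol=1e-6):
    """Return the keys whose tensors differ between two state dicts."""
    return [
        key
        for key, before in sd_before.items()
        if not torch.allclose(before.float(), sd_after[key].float(), atol=atol)
    ]

# Illustrative stand-ins for the pretrained and reconstructed checkpoints.
sd_before = {"bert.embeddings.word_embeddings.weight": torch.ones(4, 8)}
sd_after = {"bert.embeddings.word_embeddings.weight": torch.full((4, 8), 1.5)}
print(changed_keys(sd_before, sd_after))  # → ['bert.embeddings.word_embeddings.weight']
```

An empty list would mean the frozen weights survived the save/reconstruct round trip intact.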
Furthermore, I checked the contents of the adapter folder, which were as follows:
adapter_config.json -- 632 B
head_config.json -- 231 B
pytorch_adapter.bin -- 232 MB
pytorch_model_head.bin -- 232 MB
More than 230 MB for each bin file (the trained model was a BERT base)? I expected a smaller size with fp16 weights... strange. For a training run without DeepSpeed, the sizes of these bin files are as follows:
pytorch_adapter.bin -- 29.6 MB
pytorch_model_head.bin -- 94 MB
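A rough size estimate (parameter count × bytes per parameter) suggests what the 232 MB files may actually contain. The parameter counts below are assumptions for illustration, not measured values:

```python
def approx_size_mb(n_params, bytes_per_param):
    """Rough checkpoint size in MB: parameter count times bytes per parameter."""
    return n_params * bytes_per_param / 1e6

BERT_BASE_PARAMS = 110_000_000  # ~110M, the commonly cited figure for BERT base
ADAPTER_PARAMS = 7_400_000      # assumption, back-computed from the 29.6 MB fp32 file

print(approx_size_mb(ADAPTER_PARAMS, 4))    # 29.6  -> matches the fp32 adapter file
print(approx_size_mb(ADAPTER_PARAMS, 2))    # 14.8  -> what an fp16 adapter-only file would weigh
print(approx_size_mb(BERT_BASE_PARAMS, 2))  # 220.0 -> a full BERT base in fp16, close to 232 MB
```

So one plausible reading is that each 232 MB file holds something close to the full model in fp16, rather than only the adapter or head weights.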
Links to read about fp32 weights reconstruction: