DeepSpeedExamples
Why is the model size reduced after the SFT step?
Hello, thanks for your wonderful work. I have a question about the training stage.
I fine-tuned GPT2 (117M) on my own dataset; the original GPT2 checkpoint is about 4GB. I used the following command to train an SFT model:
```shell
OUTPUT_PATH=/test_gpt/pre_train_models/sft_models
mkdir -p $OUTPUT_PATH
deepspeed --num_gpus 1 /test_gpt/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py \
    --gradient_checkpointing \
    --zero_stage 0 \
    --deepspeed
```
After training, I found the model size is only about 2GB (nearly half the original size).
Hi, one checkpoint is stored in FP32 format and the other in FP16 format.
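To illustrate why this roughly halves the file size: FP32 uses 4 bytes per parameter while FP16 uses 2. A minimal sketch with NumPy (a stand-in weight array, not the actual DeepSpeed checkpoint code):

```python
import os
import tempfile
import numpy as np

# Hypothetical example: 1,000,000 values stand in for a model's weights.
weights_fp32 = np.random.rand(1_000_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)  # cast to half precision

with tempfile.TemporaryDirectory() as d:
    fp32_path = os.path.join(d, "model_fp32.npy")
    fp16_path = os.path.join(d, "model_fp16.npy")
    np.save(fp32_path, weights_fp32)
    np.save(fp16_path, weights_fp16)

    fp32_size = os.path.getsize(fp32_path)
    fp16_size = os.path.getsize(fp16_path)
    # FP16 stores 2 bytes per value vs. 4 for FP32, so the file is ~half the size.
    print(f"FP32: {fp32_size} bytes, FP16: {fp16_size} bytes")
```

The same 2:1 ratio applies to a saved model: a ~4GB FP32 checkpoint becomes roughly 2GB once the weights are written out in FP16.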
Thanks for your reply.
So you mean the default setting trains the new model in FP16 format? Where can I find this setting in the code?
Thanks very much!
Please see here https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/ds_utils.py#L37
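For reference, the config built there enables half precision via an `fp16` block in the DeepSpeed training config. A simplified sketch of what such a config looks like (field values here are illustrative, not copied from `ds_utils.py`):

```json
{
  "zero_optimization": { "stage": 0 },
  "fp16": {
    "enabled": true,
    "loss_scale_window": 100
  }
}
```

With `"enabled": true`, training and the saved weights use FP16, which explains the halved checkpoint size.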
Thanks very much!