DeepSpeedExamples
Why is the model size reduced after the SFT step?
Hello, thanks for your wonderful work. I have a question about the training stage.
I fine-tuned GPT2 (117M) on my own dataset; the original GPT2 checkpoint is about 4GB. I used the following command to train an SFT model:
```shell
OUTPUT_PATH=/test_gpt/pre_train_models/sft_models
mkdir -p $OUTPUT_PATH
deepspeed --num_gpus 1 /test_gpt/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py \
    --gradient_checkpointing \
    --zero_stage 0 \
    --deepspeed
```
After training, I found the model size is only about 2GB (nearly half the original size).
Hi, one checkpoint is stored in FP32 format and the other in FP16 format.
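To illustrate why this roughly halves the file size: FP32 uses 4 bytes per parameter while FP16 uses 2. A minimal sketch with NumPy (a stand-in weight array, not the actual DeepSpeed checkpoint code):

```python
import os
import tempfile
import numpy as np

# Hypothetical example: 1,000,000 values stand in for a model's weights.
weights_fp32 = np.random.rand(1_000_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)  # cast to half precision

with tempfile.TemporaryDirectory() as d:
    fp32_path = os.path.join(d, "model_fp32.npy")
    fp16_path = os.path.join(d, "model_fp16.npy")
    np.save(fp32_path, weights_fp32)
    np.save(fp16_path, weights_fp16)

    fp32_size = os.path.getsize(fp32_path)
    fp16_size = os.path.getsize(fp16_path)
    # FP16 stores 2 bytes per value vs. 4 for FP32, so the file is ~half the size.
    print(f"FP32: {fp32_size} bytes, FP16: {fp16_size} bytes")
```

The same 2:1 ratio applies to a saved model: a ~4GB FP32 checkpoint becomes roughly 2GB once the weights are written out in FP16.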
Thanks for your reply.
So you mean the default setting trains the new model in FP16 format? Where can I find this setting in the code?
Thanks very much!
Please see here https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/utils/ds_utils.py#L37
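For reference, the config built there enables half precision via an `fp16` block in the DeepSpeed training config. A simplified sketch of what such a config looks like (field values here are illustrative, not copied from `ds_utils.py`):

```json
{
  "zero_optimization": { "stage": 0 },
  "fp16": {
    "enabled": true,
    "loss_scale_window": 100
  }
}
```

With `"enabled": true`, training and the saved weights use FP16, which explains the halved checkpoint size.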
Thanks very much!