DeepSpeedExamples
When training GPT-2 with ZeRO-3, some parameters are missing from the saved model
I replaced the model in steps 1 and 2 with a GPT-2 model, IDEA-CCNL/Wenzhong-GPT2-110M, and then trained with ZeRO-3 using the following command:
python train.py --actor-zero-stage 3 --actor-model 110m --reward-zero-stage 3 --reward-model 110m --deployment-type single_node
But in step 3 I hit a Missing key(s) error.
The left side shows the actual parameters of the reward model; the right side shows the error reported when running step 3, with some parameters missing.
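To see exactly which entries are absent, one can diff the checkpoint's keys against the keys the model expects. This is a small illustrative helper (`missing_keys` is my own name, not part of DeepSpeed or transformers):

```python
import torch


def missing_keys(model, checkpoint_path):
    """Return the model state-dict keys that are absent from a saved checkpoint."""
    saved = torch.load(checkpoint_path, map_location="cpu")
    expected = set(model.state_dict().keys())
    return sorted(expected - set(saved.keys()))
```

Running it on the reward-model checkpoint should print the same key names as the Missing key(s) error.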
We eventually traced it to how the model is saved under ZeRO-3.
def save_zero_three_model(model_ema, global_rank, save_dir, zero_stage=0):
    zero_stage_3 = (zero_stage == 3)
    os.makedirs(save_dir, exist_ok=True)
    WEIGHTS_NAME = "pytorch_model.bin"
    output_model_file = os.path.join(save_dir, WEIGHTS_NAME)
    model_to_save = model_ema.module if hasattr(model_ema, 'module') else model_ema
    if not zero_stage_3:
        if global_rank == 0:
            torch.save(model_to_save.state_dict(), output_model_file)
    else:
        output_state_dict = {}
        # for k, v in model_to_save.state_dict().items():
        for k, v in model_to_save.named_parameters():
            if hasattr(v, 'ds_id'):
                with deepspeed.zero.GatheredParameters(_z3_params_to_fetch([v]), enabled=zero_stage_3):
                    v_p = v.data.cpu()
            else:
                v_p = v.cpu()
            if global_rank == 0 and "lora" not in k:
                print(f"key: {k}")
                output_state_dict[k] = v_p
        if global_rank == 0:
            torch.save(output_state_dict, output_model_file)
        del output_state_dict
If you traverse the model with model_to_save.named_parameters(), you normally get only the learnable parameters that the optimizer updates; registered buffers are skipped. With this save path, two entries are missing from each self-attention layer.
k: transformer.wte.weight
k: transformer.wpe.weight
k: transformer.h.0.ln_1.weight
k: transformer.h.0.ln_1.bias
k: transformer.h.0.attn.c_attn.weight
k: transformer.h.0.attn.c_attn.bias
k: transformer.h.0.attn.c_proj.weight
k: transformer.h.0.attn.c_proj.bias
k: transformer.h.0.ln_2.weight
k: transformer.h.0.ln_2.bias
k: transformer.h.0.mlp.c_fc.weight
k: transformer.h.0.mlp.c_fc.bias
k: transformer.h.0.mlp.c_proj.weight
k: transformer.h.0.mlp.c_proj.bias
...
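One possible fix, sketched under my assumptions: also iterate named_buffers(), since named_parameters() does not yield registered buffers. (In the Hugging Face GPT-2 implementation, the attention module registers its causal-mask `bias` and `masked_bias` as buffers, which would explain exactly two missing entries per self-attention layer.) Buffers are not partitioned by ZeRO-3, so they can be copied without gathering; I pass `[v]` directly instead of the repo's `_z3_params_to_fetch` helper to keep the sketch self-contained:

```python
import torch


def full_state_dict(model_to_save, zero_stage_3=False):
    """Collect parameters *and* buffers into a plain CPU state dict.

    Sketch only: the ds_id branch assumes a model trained under DeepSpeed
    ZeRO-3, where each parameter must be gathered before it can be read.
    """
    output_state_dict = {}
    for k, v in model_to_save.named_parameters():
        if hasattr(v, 'ds_id'):  # ZeRO-3 partitioned parameter
            import deepspeed  # only needed in the ZeRO-3 case
            with deepspeed.zero.GatheredParameters([v], enabled=zero_stage_3):
                v_p = v.data.cpu()
        else:
            v_p = v.cpu()
        output_state_dict[k] = v_p
    # named_parameters() skips registered buffers (e.g. GPT-2's causal-mask
    # "bias"); add them explicitly. ZeRO-3 does not partition buffers.
    for k, v in model_to_save.named_buffers():
        output_state_dict[k] = v.cpu()
    return output_state_dict
```

With this, the saved dict should contain the same keys as model_to_save.state_dict(), so load_state_dict in step 3 should no longer report missing keys.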
But if you traverse with model_to_save.state_dict().items(), under normal circumstances all entries should be saved, both learnable parameters and non-learnable buffers. However, I found that some parameters do not seem to be gathered, and their values are empty.
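The empty values are expected under ZeRO-3: each rank holds only a shard, and the tensor that state_dict() returns is a size-0 placeholder until the parameter is gathered. A hedged sketch of the alternative fix, gathering all parameters on rank 0 first and then taking the full state_dict() (which also includes buffers); the function name and arguments are my own, and with `zero_stage_3=False` it degrades to a plain torch.save:

```python
import contextlib
import torch


def save_full_state_dict(model, path, global_rank=0, zero_stage_3=True):
    """Gather ZeRO-3 shards on rank 0, then save the complete state_dict."""
    if zero_stage_3:
        import deepspeed  # only required in the ZeRO-3 case
        # Gather every partitioned parameter onto rank 0 for the duration
        # of the context, so state_dict() sees the full tensors.
        ctx = deepspeed.zero.GatheredParameters(list(model.parameters()),
                                                modifier_rank=0)
    else:
        ctx = contextlib.nullcontext()
    with ctx:
        if global_rank == 0:
            full_sd = {k: v.cpu().clone() for k, v in model.state_dict().items()}
            torch.save(full_sd, path)
```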

How can I solve this problem? Please help.