Megatron-LM

adjust the keys of attention in checkpoint

Open · xiaojunjie opened this issue 1 year ago · 2 comments

I am trying to convert a GPT checkpoint from the local spec to the transformer_engine spec by renaming keys with the following map: `{'input_layernorm.': 'self_attention.linear_qkv.layer_norm_', 'pre_mlp_layernorm.': 'mlp.linear_fc1.layer_norm_'}`. This works only when the optimizer state is not loaded.
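A minimal sketch of the key renaming I am doing, assuming the model weights sit under `ckpt['model']` and the usual Megatron checkpoint file name (both are assumptions, not the official converter):

```python
# Sketch only: rename local-spec layernorm keys to the transformer_engine-spec
# names given by the map above. The checkpoint layout (weights under 'model',
# file name 'model_optim_rng.pt') is an assumption about a typical Megatron checkpoint.
import torch

KEY_MAP = {
    'input_layernorm.': 'self_attention.linear_qkv.layer_norm_',
    'pre_mlp_layernorm.': 'mlp.linear_fc1.layer_norm_',
}

def remap_model_keys(model_state):
    """Return a new state dict with local-spec keys renamed to TE-spec keys."""
    remapped = {}
    for key, tensor in model_state.items():
        new_key = key
        for old, new in KEY_MAP.items():
            new_key = new_key.replace(old, new)
        remapped[new_key] = tensor
    return remapped

ckpt = torch.load('model_optim_rng.pt', map_location='cpu')
ckpt['model'] = remap_model_keys(ckpt['model'])
torch.save(ckpt, 'model_optim_rng_te.pt')
```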

The reason is that in both the local and transformer_engine checkpoints, linear_proj is created before linear_qkv due to the module initialization order, and the two specs place the layernorm parameters differently:

  • local
    • layernorm -> linear_proj -> linear_qkv
  • transformer_engine
    • linear_proj -> layernorm -> linear_qkv

For the optimizer, whose per-parameter state is stored in that sequential creation order, the weights therefore have to be swapped during conversion. Another approach, as this PR does, is to build linear_qkv first so that the initialization order matches the forward order (see the sketch after the list below):

  • layernorm -> linear_qkv -> linear_proj
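To illustrate the first option (swapping the sequentially stored optimizer entries), here is a sketch of the permutation. The real distributed-optimizer checkpoint flattens parameters into contiguous buffers, so this only shows the idea, and the per-layer name lists are hypothetical:

```python
# Sketch of reordering per-parameter optimizer entries (e.g. Adam's exp_avg /
# exp_avg_sq) so they line up with the target spec's parameter-creation order.
def reorder_optimizer_entries(entries, src_order, dst_order):
    """entries: list of per-parameter optimizer states, aligned with src_order."""
    index_of = {name: i for i, name in enumerate(src_order)}
    return [entries[index_of[name]] for name in dst_order]

# Hypothetical per-layer orders matching the lists above.
src_order = ['layernorm', 'linear_proj', 'linear_qkv']  # local init order
dst_order = ['layernorm', 'linear_qkv', 'linear_proj']  # forward order (the PR's approach)

# optim_entries would hold one optimizer state entry per parameter, in src_order.
# reordered = reorder_optimizer_entries(optim_entries, src_order, dst_order)
```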

xiaojunjie · Feb 18 '24

@sudhakarsingh27 Please correct me if I'm wrong

xiaojunjie · Feb 19 '24

Marking as stale. No activity in 60 days.

github-actions[bot] · Apr 19 '24