DeepSpeed
Enable auto TP policy for llama model
This PR enables the correct auto TP policy for the llama model (https://github.com/huggingface/transformers/blob/ef61b1ba1a8ee9fd354b640b059c3474b676c0c5/src/transformers/models/llama/modeling_llama.py), where the MLP output linear is "down_proj" and the MHA output linear is "o_proj".
Details (llama 7B model):
```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=31999)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
```
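For context, here is a minimal sketch of what the auto TP policy has to pick out from this structure: the output-projection linears ("o_proj" for attention, "down_proj" for the MLP) are the layers whose results must be all-reduced when the blocks are split across tensor-parallel ranks. This is not DeepSpeed's actual parser; `collect_allreduce_linears` and the tiny config values are illustrative only.

```python
import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical helper; layer names follow modeling_llama.py.
def collect_allreduce_linears(model: nn.Module):
    allreduce_names = ("o_proj", "down_proj")  # MHA / MLP output linears
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and name.split(".")[-1] in allreduce_names
    ]

if __name__ == "__main__":
    # Tiny random-weight llama so the example runs without downloading a checkpoint.
    cfg = LlamaConfig(
        hidden_size=256,
        intermediate_size=688,
        num_hidden_layers=2,
        num_attention_heads=4,
        vocab_size=1000,
    )
    model = LlamaForCausalLM(cfg)
    for name in collect_allreduce_linears(model):
        print(name)  # e.g. model.layers.0.self_attn.o_proj, model.layers.0.mlp.down_proj
```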
@molly-smith Hi, please take a review when you get a chance. Thanks!
cc @jgong5 @EikanWang
Does it make sense to also update docs/_tutorials/automatic-tensor-parallism.md to include this model in the supported models list?
Have added "llama" to the doc. Thanks.
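For readers landing on that tutorial, a usage sketch along these lines shows how the auto TP path gets exercised for llama. The checkpoint name and launch command are placeholders, and the `mp_size` argument may differ across DeepSpeed releases (newer versions take a `tensor_parallel` config instead), so treat this as an illustration rather than the canonical recipe.

```python
# Launch with something like: deepspeed --num_gpus 2 run_llama_autotp.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder; any llama checkpoint
world_size = int(os.getenv("WORLD_SIZE", "1"))
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# replace_with_kernel_inject=False selects the automatic tensor-parallel path,
# which with this PR recognizes llama's o_proj/down_proj output linears.
engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=False,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
output = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```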
@microsoft-github-policy-service agree company="Intel"