DeepSpeed
Enable auto TP policy for llama model
This PR enables the correct auto TP policy for the llama model (https://github.com/huggingface/transformers/blob/ef61b1ba1a8ee9fd354b640b059c3474b676c0c5/src/transformers/models/llama/modeling_llama.py), where the MLP output linear is "down_proj" and the MHA output linear is "o_proj".
Details (llama 7B model):
```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=31999)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
```
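For context, here is a minimal sketch of what the auto TP policy has to pick out from this structure: the output-projection linears ("o_proj" for attention, "down_proj" for the MLP) are the layers whose results must be all-reduced when the blocks are split across tensor-parallel ranks. This is not DeepSpeed's actual parser; `collect_allreduce_linears` and the tiny config values are illustrative only.

```python
import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical helper; layer names follow modeling_llama.py.
def collect_allreduce_linears(model: nn.Module):
    allreduce_names = ("o_proj", "down_proj")  # MHA / MLP output linears
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and name.split(".")[-1] in allreduce_names
    ]

if __name__ == "__main__":
    # Tiny random-weight llama so the example runs without downloading a checkpoint.
    cfg = LlamaConfig(
        hidden_size=256,
        intermediate_size=688,
        num_hidden_layers=2,
        num_attention_heads=4,
        vocab_size=1000,
    )
    model = LlamaForCausalLM(cfg)
    for name in collect_allreduce_linears(model):
        print(name)  # e.g. model.layers.0.self_attn.o_proj, model.layers.0.mlp.down_proj
```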
@molly-smith Hi, please take a review when you get a chance. Thanks!
cc @jgong5 @EikanWang
Does it make sense to also update docs/_tutorials/automatic-tensor-parallism.md to include this model in the supported models list?
Have added "llama" to the doc. Thanks.
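For readers landing on that tutorial, a usage sketch along these lines shows how the auto TP path gets exercised for llama. The checkpoint name and launch command are placeholders, and the `mp_size` argument may differ across DeepSpeed releases (newer versions take a `tensor_parallel` config instead), so treat this as an illustration rather than the canonical recipe.

```python
# Launch with something like: deepspeed --num_gpus 2 run_llama_autotp.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # placeholder; any llama checkpoint
world_size = int(os.getenv("WORLD_SIZE", "1"))
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# replace_with_kernel_inject=False selects the automatic tensor-parallel path,
# which with this PR recognizes llama's o_proj/down_proj output linears.
engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=False,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
output = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```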
@microsoft-github-policy-service agree company="Intel"