DeepSpeed
[BUG] Error happened when running step3_rlhf_finetuning in enable_hybrid_engine mode with togethercomputer/GPT-NeoXT-Chat-Base-20B
I have reported an issue in DeepSpeedExamples: https://github.com/microsoft/DeepSpeedExamples/issues/448
To dig deeper, I looked at the module definition for GPT-NeoX in this file: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/containers/gptneox.py
The call chain is: DeepSpeedGPTInference -> DeepSpeedTransformerInference -> DeepSpeedSelfAttention
The implementation of DeepSpeedSelfAttention seems to be inconsistent with the Hugging Face GPT-NeoX implementation (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_neox/modeling_gpt_neox.py):
for example, GPTNeoXAttention in Hugging Face includes a RotaryEmbedding, but DeepSpeedSelfAttention appears to have no rotary-embedding logic at all. I also noticed that other models such as DS_GPTNEO and DS_BERT reuse the same DeepSpeedSelfAttention implementation.
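For reference, the rotary position embedding that GPTNeoXAttention applies to queries and keys can be sketched as below. This is a simplified numpy illustration of the idea, not the actual HF code: the real implementation operates on torch tensors, only rotates a `rotary_pct` fraction of each head dimension, and caches the cos/sin tables.

```python
import numpy as np

def rotate_half(x):
    # Split the last dimension in half and swap with negation: (x1, x2) -> (-x2, x1)
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rotary_pos_emb(q, positions, base=10000):
    # q: (seq_len, head_dim); positions: (seq_len,) absolute token positions.
    # Each pair (q_i, q_{i+half}) is rotated by an angle that depends on
    # the token position and the pair's frequency.
    head_dim = q.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    freqs = np.outer(positions, inv_freq)           # (seq_len, head_dim/2)
    emb = np.concatenate([freqs, freqs], axis=-1)   # (seq_len, head_dim)
    cos, sin = np.cos(emb), np.sin(emb)
    return q * cos + rotate_half(q) * sin
```

Because this is a pure rotation, it preserves vector norms, and a token at position 0 is left unchanged. If the fused DeepSpeed attention kernel skips this step for a GPT-NeoX checkpoint, the attention scores would be computed without positional information, which could plausibly explain garbage output or a crash in hybrid-engine mode.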
So far, facebook/opt-1.3b runs successfully with --enable_hybrid_engine, but GPT-NeoXT-Chat-Base-20B fails.
Can you help me track down the problem? Thanks.