optimum-habana
Fix RoPE data type issue for gpt_neox and stablelm (#177)
The internal implementation of RoPE changed so that if any of its parameters is in FP32, the op is performed in FP32. To force the op to run in bf16, all parameters must be cast to bf16 before RoPE is called.
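As a rough illustration only (a minimal sketch with hypothetical shapes, not the actual optimum-habana modeling code), the fix amounts to casting cos/sin to the query/key dtype before the rotary op, so that type promotion does not pull the computation back up to FP32:

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # Cast cos/sin to the query/key dtype (bf16) so the whole op stays in bf16;
    # without this, fp32 cos/sin would promote the multiplications to fp32.
    cos = cos.to(dtype=q.dtype)
    sin = sin.to(dtype=q.dtype)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

q = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
k = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
cos = torch.randn(16, 64)  # fp32 by default
sin = torch.randn(16, 64)
q_embed, _ = apply_rotary_pos_emb(q, k, cos, sin)
print(q_embed.dtype)  # torch.bfloat16 thanks to the explicit cast
```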
@mandy-li I believe the RoPE implementation changed because of https://habana.atlassian.net/servicedesk/customer/portal/1/HS-1574. Can you confirm that you reach a perplexity of about 8 or 9 with this PR when you run
GAUDI2_CI=1 RUN_SLOW=1 pytest tests/test_examples.py -v -s -k "neox"
with 1.16?
@regisss , good point. I didn't know about this Jira when I worked on the type casting. The reason we have this PR is that our QA reported a performance regression for StableLM and GPT-NeoX inference. So how about we explicitly cast sin/cos to bf16 only for inference and leave training with the default behavior (i.e. RoPE in fp32)?
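To make it concrete, here is a rough sketch of what I have in mind (hypothetical helper name, not the actual modeling code), assuming we can check the module's training flag at the point where RoPE is applied:

```python
import torch

def maybe_cast_rope_inputs(cos, sin, query, training: bool):
    """Hypothetical helper: cast cos/sin to the query dtype only at inference."""
    if not training and query.dtype == torch.bfloat16:
        # Inference path: keep the whole rotary op in bf16 to avoid the perf regression.
        return cos.to(query.dtype), sin.to(query.dtype)
    # Training path (or fp32 inference): leave cos/sin untouched so RoPE keeps its default fp32 precision.
    return cos, sin
```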
If that produces good results at inference, then yes, let's do that.
Created https://github.com/huggingface/optimum-habana/pull/999, closing this one.