
Fix RoPE data type issue for gpt_neox and stablelm (#177)

mandy-li opened this issue 9 months ago

The internal implementation of RoPE changed so that if any of the parameters' data types is FP32, the op is performed in FP32. To force the op to run in bf16, all parameters need to be converted to bf16 before RoPE is called.
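As a rough illustration of the idea (a minimal sketch using the plain PyTorch RoPE path with hypothetical names, not the fused Habana kernel, though the same type-promotion rule applies): casting the cached cos/sin tensors to the query dtype keeps the whole op in bf16.

```python
import torch

def rotate_half(x):
    # Standard RoPE helper: rotate the second half of the hidden dims.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb_bf16(q, k, cos, sin):
    # If cos/sin stay in fp32, type promotion pulls the whole op up to
    # fp32 (bf16 * fp32 -> fp32). Casting them to the query dtype first
    # keeps RoPE in bf16.
    cos = cos.to(q.dtype)
    sin = sin.to(q.dtype)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```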

mandy-li avatar May 02 '24 21:05 mandy-li


@mandy-li I believe the RoPE implementation changed because of https://habana.atlassian.net/servicedesk/customer/portal/1/HS-1574. Can you confirm that you reach a perplexity of about 8 or 9 with this PR when you run

```
GAUDI2_CI=1 RUN_SLOW=1 pytest tests/test_examples.py -v -s -k "neox"
```

with 1.16?

regisss avatar May 04 '24 07:05 regisss

@regisss, good point. I didn't know about that Jira ticket when I worked on the type casting. The reason for this PR is that our QA reported a perf regression for StableLM and GPT-NeoX inference. So how about we explicitly cast sin/cos to bf16 only at inference, and leave training with the default behavior (i.e. RoPE in fp32)?
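Something along these lines (a hypothetical helper for illustration, not the actual PR code), gating the cast on the module's training flag:

```python
import torch

def maybe_cast_rope_params(cos, sin, dtype, training):
    # Force bf16 RoPE only at inference; training keeps the
    # default behavior (RoPE computed in fp32).
    if not training and dtype == torch.bfloat16:
        return cos.to(dtype), sin.to(dtype)
    return cos, sin
```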

mandy-li avatar May 06 '24 17:05 mandy-li


If that produces good results at inference, then yes, let's do that.

regisss avatar May 10 '24 09:05 regisss

Created https://github.com/huggingface/optimum-habana/pull/999, closing this one.

mandy-li avatar May 22 '24 16:05 mandy-li