optimum-habana
Fix RoPE data type issue for gpt_neox and stablelm (#177)
The internal implementation of RoPE changed so that if any of its parameters is in FP32, the op is performed in FP32. To force the op to run in bf16, all parameters must be cast to bf16 before RoPE is called.
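As a rough illustration only (a minimal sketch with hypothetical shapes, not the actual optimum-habana modeling code), the fix amounts to casting cos/sin to the query/key dtype before the rotary op, so that type promotion does not pull the computation back up to FP32:

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # Cast cos/sin to the query/key dtype (bf16) so the whole op stays in bf16;
    # without this, fp32 cos/sin would promote the multiplications to fp32.
    cos = cos.to(dtype=q.dtype)
    sin = sin.to(dtype=q.dtype)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

q = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
k = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
cos = torch.randn(16, 64)  # fp32 by default
sin = torch.randn(16, 64)
q_embed, _ = apply_rotary_pos_emb(q, k, cos, sin)
print(q_embed.dtype)  # torch.bfloat16 thanks to the explicit cast
```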
@mandy-li I believe the RoPE implementation changed because of https://habana.atlassian.net/servicedesk/customer/portal/1/HS-1574. Can you confirm that you reach a perplexity of about 8 or 9 with this PR when you run
GAUDI2_CI=1 RUN_SLOW=1 pytest tests/test_examples.py -v -s -k "neox"
with 1.16?
@regisss , good point. I didn't know about this Jira when I worked on the type casting. The reason we have this PR is that our QA reported a performance regression for StableLM and GPT-NeoX inference. So how about we explicitly cast sin/cos to bf16 only for inference and leave training with the default behavior (i.e. RoPE in fp32)?
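To make it concrete, here is a rough sketch of what I have in mind (hypothetical helper name, not the actual modeling code), assuming we can check the module's training flag at the point where RoPE is applied:

```python
import torch

def maybe_cast_rope_inputs(cos, sin, query, training: bool):
    """Hypothetical helper: cast cos/sin to the query dtype only at inference."""
    if not training and query.dtype == torch.bfloat16:
        # Inference path: keep the whole rotary op in bf16 to avoid the perf regression.
        return cos.to(query.dtype), sin.to(query.dtype)
    # Training path (or fp32 inference): leave cos/sin untouched so RoPE keeps its default fp32 precision.
    return cos, sin
```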
If that produces good results at inference, then yes, let's do that.
Created https://github.com/huggingface/optimum-habana/pull/999, closing this one.