[GPU][TRANSFORMATIONS] RoPE fusion for GLM-4-9B Hugging Face

Open andrew-k-park opened this issue 10 months ago • 1 comments

Details:

  • Enabled RoPE op fusion for the GLM-4-9B Hugging Face model on GPU to improve performance.
  • Added a new RoPE config field, use_rope_cache, to distinguish the mode that uses a precomputed RoPE cache for cos/sin (the existing GLM models) from the HF model, which computes cos/sin at runtime. The precomputed-cache mode is active when use_rope_cache=true.
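
For readers unfamiliar with the two modes, here is a minimal sketch of the difference. It is not the OpenVINO implementation: the function names, the interleaved pairing of elements, and the frequency base of 10000 are assumptions made for illustration. Only the `use_rope_cache` flag name comes from this PR.

```python
import math

def rope_angles(position, head_dim, base=10000.0):
    """Compute (cos, sin) for one position, one entry per rotated pair.
    The base and frequency formula are the common RoPE choice (assumed)."""
    half = head_dim // 2
    freqs = [base ** (-2.0 * i / head_dim) for i in range(half)]
    cos = [math.cos(position * f) for f in freqs]
    sin = [math.sin(position * f) for f in freqs]
    return cos, sin

def build_rope_cache(max_positions, head_dim):
    """Precompute cos/sin for every position up front (the cache mode)."""
    return [rope_angles(p, head_dim) for p in range(max_positions)]

def apply_rope(x, cos, sin):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by the angle for pair i.
    Interleaved pairing is an assumption of this sketch."""
    out = list(x)
    for i in range(len(cos)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos[i] - b * sin[i]
        out[2 * i + 1] = a * sin[i] + b * cos[i]
    return out

def rope(x, position, use_rope_cache, cache=None):
    """Hypothetical dispatcher mirroring the flag described above."""
    if use_rope_cache:
        cos, sin = cache[position]                 # look up precomputed values
    else:
        cos, sin = rope_angles(position, len(x))   # compute at runtime
    return apply_rope(x, cos, sin)

# Usage: both modes should produce the same rotated vector.
cache = build_rope_cache(max_positions=8, head_dim=4)
x = [1.0, 2.0, 3.0, 4.0]
cached = rope(x, 3, use_rope_cache=True, cache=cache)
runtime = rope(x, 3, use_rope_cache=False)
```

The two paths differ only in where cos/sin come from, which is why a single fused op with a mode flag can plausibly cover both graph patterns.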

Tickets:

  • 167665

andrew-k-park, Jun 18 '25 08:06

LGTM from the GPU plugin side. Please also update the perf impact once the evaluation is done.

yeonbok, Jun 18 '25 16:06

@itikhono @CuriousPanCake Could you review this PR?

andrew-k-park, Jun 23 '25 07:06