[GPU][TRANSFORMATIONS] RoPE fusion for GLM-4-9B Hugging Face
Details:
- Enabled RoPE op fusion for the GLM-4-9B Hugging Face model on GPU to improve performance.
- Added a new RoPE config field to distinguish between two modes: the existing GLM models, which use a precomputed RoPE cache for cos/sin, and the HF model, which computes cos/sin at runtime. The precomputed-cache mode is activated when `use_rope_cache=true`.
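
For reviewers less familiar with the two modes, here is a minimal pure-Python sketch of the difference. It is not the plugin code: the helper names (`cos_sin`, `build_cache`, `apply_rope`), the interleaved-pair layout, and the base of 10000 are assumptions made for illustration. Only the `use_rope_cache` flag name comes from this PR, and its mapping to the cached path is assumed.

```python
import math

def cos_sin(position, rotary_dim, base=10000.0):
    """Compute cos/sin for one position (hypothetical helper, runtime path)."""
    half = rotary_dim // 2
    angles = [position * base ** (-2.0 * i / rotary_dim) for i in range(half)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]

def build_cache(max_positions, rotary_dim):
    """Precompute cos/sin for every position up front (cached path)."""
    return [cos_sin(p, rotary_dim) for p in range(max_positions)]

def apply_rope(x, position, use_rope_cache, cache=None):
    """Rotate interleaved pairs (x[2i], x[2i+1]) of one head vector.

    use_rope_cache=True  -> look cos/sin up in the precomputed cache.
    use_rope_cache=False -> compute cos/sin on the fly.
    """
    rotary_dim = len(x)
    if use_rope_cache:
        cos, sin = cache[position]
    else:
        cos, sin = cos_sin(position, rotary_dim)
    out = list(x)
    for i in range(rotary_dim // 2):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos[i] - b * sin[i]
        out[2 * i + 1] = a * sin[i] + b * cos[i]
    return out

cache = build_cache(max_positions=8, rotary_dim=4)
x = [1.0, 0.0, 0.5, -0.5]
cached = apply_rope(x, 3, use_rope_cache=True, cache=cache)
runtime = apply_rope(x, 3, use_rope_cache=False)
```

In this sketch both paths produce the same rotation; they differ only in where cos/sin come from, which is the distinction the fused op needs to know about.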
Tickets:
- 167665
LGTM from the GPU plugin side. Please also update the perf impact once the evaluation is done.
@itikhono @CuriousPanCake Could you review this PR?