openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
### Details: - Clean the oneDNN primitive cache at the end of inference ### Tickets: - [CVS-171194](https://jira.devtools.intel.com/browse/CVS-171194)
### Documentation link https://community.intel.com/t5/Intel-Tiber-Developer-Cloud/Intel-LLM-Fine-Tuning-with-Hugging-Face/m-p/1611053/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufExZMkZRTTM0R0JFNkFSfDE2MTEwNTN8U1VCU0NSSVBUSU9OU3xoSw#M943 ### Description Summary This pull request introduces documentation and system-level guidance relevant to a kernel-level performance fix that significantly improves inference throughput and memory efficiency in...
### Details: - *Refactor how delegates work with initializers* ### Tickets: - *ticket-id*
### Details: - Make the OCL context a singleton - GenAI is changing its behavior to use multiple ov::Core instances, while buffers can be shared across ov::Core instances - We had two choices...
### Details: - *Currently, RoPEFusionFlux only supports `[batch, head_num, seq_length, head_size]`; extend it to support `[batch, seq_length, head_num, head_size]`* - *Extend CPU kernels to support Flux-style RoPE fusion* ### Tickets:...
### Context As in the main issue. ### What needs to be done? As in the main issue. ### Example Pull Requests _No response_ ### Resources - [Contribution guide -...