Sergey Shlyapnikov
### Details:
This PR aligns Interpolate primitive with nGraph's parameters and fixes pads order mismatch

### Tickets:
- 88398
### Details:
This PR aligns Reduce primitive with nGraph's parameters

### Tickets:
- 88398
### Details:
- Add initial SDPA implementation

Remaining tasks:
1) Input/Output Transpose fusion support - https://github.com/openvinotoolkit/openvino/pull/24475
2) Indirect inputs support
3) GQA related optimization (Broadcast fusion)
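For readers unfamiliar with the operation, the computation that an SDPA (scaled dot-product attention) primitive performs can be sketched in plain Python as below. This is only a reference for the math, `softmax(scale * Q K^T) V`, for a single head without masking. It is not the GPU kernel added by the PR, and the function names `softmax` and `sdpa` are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v, scale=None):
    """Reference single-head attention without a mask.

    q: [Lq][d], k: [Lk][d], v: [Lk][dv], all given as nested lists.
    The default scale of 1/sqrt(d) is the conventional choice.
    """
    d = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out
```

When two keys are identical, the attention weights are equal and the output is the mean of the corresponding value rows, which is a convenient sanity check for any optimized implementation.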
### Details:
- Add draft support of Transpose input fusions for SDPA op
### Details:
This PR cherry-picks multiple RoPE PRs from master branch to OV 2024.2 version:
- https://github.com/openvinotoolkit/openvino/pull/24615
- https://github.com/openvinotoolkit/openvino/pull/24750
- https://github.com/openvinotoolkit/openvino/pull/24756
- https://github.com/openvinotoolkit/openvino/pull/24829
- https://github.com/openvinotoolkit/openvino/pull/25200
### Details:
This patch adds a Validate pass call after IncreasePositionIdsPrecision to ensure proper data type propagation.

With this change the accuracy of llama-3-8b INT8 (and other LLMs probably) can be...
These changes add GPU device support for OpenVINO vLLM backend
- Added `VLLM_OPENVINO_DEVICE` environment variable for OpenVINO device selection
- Updated GPU-related components in OpenVINO backend (KV cache shapes, swap...
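As a usage illustration, the environment variable named above could be set from Python before the backend is initialized. This is a minimal sketch: the helper `select_openvino_device` is hypothetical, the value `"GPU"` is assumed from OpenVINO's usual device naming, and the set of values the backend accepts is not stated in the description.

```python
import os


def select_openvino_device(device: str) -> None:
    """Hypothetical helper: export the device choice for the OpenVINO vLLM backend.

    Assumption: the variable has to be set before vLLM is imported or
    initialized in this process for the choice to take effect.
    """
    os.environ["VLLM_OPENVINO_DEVICE"] = device


# Assumed value; "GPU" is the conventional OpenVINO name for the GPU plugin.
select_openvino_device("GPU")
```

The same effect can be had by exporting the variable in the shell that launches the process.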
### Details:
This PR enables KV-cache compression support.

Currently, it supports only combinations of the following configurations:
* Data types: INT8_SYM / INT8_ASYM
* Modes: per-token (quantization of `num_heads *...
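To make the two data types concrete, here is a rough sketch of symmetric and asymmetric INT8 quantization applied per token. It illustrates the general technique only and is not the PR's implementation. The exact group of elements that shares one scale is truncated in the description above, so the sketch simply assumes that each token's feature row gets its own scale (and zero point). All function names are illustrative.

```python
def quantize_sym(row):
    """Symmetric INT8: one scale per row, zero point fixed at 0, range [-127, 127]."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_sym(q, scale):
    return [v * scale for v in q]


def quantize_asym(row):
    """Asymmetric INT8: scale plus zero point per row, range [-128, 127]."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # The zero point maps the row minimum onto -128.
    zp = round(-128 - lo / scale)
    q = [max(-128, min(127, round(x / scale) + zp)) for x in row]
    return q, scale, zp


def dequantize_asym(q, scale, zp):
    return [(v - zp) * scale for v in q]


def quantize_per_token(tokens, symmetric=True):
    """Assumption: each token's row is quantized independently of the others."""
    fn = quantize_sym if symmetric else quantize_asym
    return [fn(row) for row in tokens]
```

Symmetric quantization stores only a scale per token, while the asymmetric variant also stores a zero point, which uses the INT8 range better when a row's values are not centered around zero.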