openvino
[GPU] KV-cache compression support
Details:
This PR enables KV-cache compression support. Currently, it supports only combinations of the following configurations:
- Data types: INT8_SYM / INT8_ASYM
- Modes: per-token (quantization of `num_heads * head_size` in a single group) / per-token-per-head (quantization of each `head_size` group for each head per token)
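
For reviewers less familiar with the grouping, the two modes differ only in how many values share one scale (and zero point). The following is a minimal Python sketch of that idea, not the GPU kernel from this PR: all function names are hypothetical, and the scale / zero-point formulas and the `[-127, 127]` / `[-128, 127]` ranges are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

# One quantized group: (int8 values, scale, zero point).
Group = Tuple[List[int], float, int]
# One token of K or V data, laid out as [num_heads][head_size].
Token = List[List[float]]


def quantize_sym(values: List[float]) -> Group:
    """INT8_SYM-style sketch: zero point fixed at 0, range [-127, 127] (assumed)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale, 0


def quantize_asym(values: List[float]) -> Group:
    """INT8_ASYM-style sketch: [min, max] is mapped onto [-128, 127] (assumed)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # The zero point is kept as a plain integer here; it is not forced into int8.
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(group: Group) -> List[float]:
    """Recover approximate floating-point values from one quantized group."""
    q, scale, zero_point = group
    return [(x - zero_point) * scale for x in q]


def compress_per_token(token: Token, quantize: Callable[[List[float]], Group]) -> List[Group]:
    """Per-token mode: all num_heads * head_size values form a single group."""
    flat = [v for head in token for v in head]
    return [quantize(flat)]


def compress_per_token_per_head(token: Token, quantize: Callable[[List[float]], Group]) -> List[Group]:
    """Per-token-per-head mode: each head's head_size values form their own group."""
    return [quantize(head) for head in token]


if __name__ == "__main__":
    # Two heads with very different value ranges (made-up data).
    token = [[1.0, -2.0, 0.5, 0.25], [0.1, -0.05, 0.02, 0.08]]
    for name, fn in (("sym", quantize_sym), ("asym", quantize_asym)):
        print(name, "per-token scales:", [g[1] for g in compress_per_token(token, fn)])
        print(name, "per-head scales: ", [g[1] for g in compress_per_token_per_head(token, fn)])
```

In this sketch, per-token mode stores one scale per token, while per-token-per-head mode stores `num_heads` scales per token. The second mode gives each head a scale fitted to its own range, at the cost of more metadata.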
Tickets:
- ticket-id