[GPU] KV-cache compression support

Open sshlyapn opened this issue 1 year ago • 0 comments

Details:

This PR enables KV-cache compression support. Currently, only combinations of the following configurations are supported:

  • Data types: INT8_SYM / INT8_ASYM
  • Modes: per-token (all num_heads * head_size values of a token are quantized as a single group) / per-token-per-head (each head_size group is quantized separately, for each head of each token)
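
To illustrate how the two modes differ in grouping, here is a minimal NumPy sketch. It is not the GPU plugin implementation: the tensor layout [tokens, num_heads, head_size], the helper names quantize_int8 and dequantize, and the exact scale / zero-point formulas are assumptions chosen for illustration only.

```python
import numpy as np

def quantize_int8(x, axes, symmetric):
    """Quantize x to int8 with one scale (and zero point) per group.

    `axes` lists the dimensions that belong to a single quantization group.
    Hypothetical helper for illustration, not an OpenVINO API.
    """
    x = x.astype(np.float32)
    if symmetric:
        # Symmetric: zero point fixed at 0, range [-max_abs, max_abs] -> [-127, 127].
        max_abs = np.max(np.abs(x), axis=axes, keepdims=True)
        scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
        zp = np.zeros_like(scale)
    else:
        # Asymmetric: range [lo, hi] -> [-128, 127] using a per-group zero point.
        lo = np.min(x, axis=axes, keepdims=True)
        hi = np.max(x, axis=axes, keepdims=True)
        scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
        zp = -128.0 - lo / scale
    q = np.clip(np.round(x / scale + zp), -128, 127).astype(np.int8)
    return q, scale, zp

def dequantize(q, scale, zp):
    """Inverse of quantize_int8 (up to rounding error)."""
    return (q.astype(np.float32) - zp) * scale

# Assumed layout: [tokens, num_heads, head_size]
rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 2, 8)).astype(np.float32)

# per-token: one group covers num_heads * head_size values of a token
q_tok, scale_tok, zp_tok = quantize_int8(kv, axes=(1, 2), symmetric=True)

# per-token-per-head: one group covers head_size values of one head of a token
q_head, scale_head, zp_head = quantize_int8(kv, axes=(2,), symmetric=False)

print(scale_tok.shape)   # one scale per token
print(scale_head.shape)  # one scale per (token, head) pair
```

In this sketch, per-token mode stores one scale per token, while per-token-per-head mode stores one scale (and, for the asymmetric variant, one zero point) per head of each token, trading extra metadata for finer-grained ranges.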

Tickets:

  • ticket-id

sshlyapn, Oct 18 '24 05:10