[OV GPU] Add the capability for KV cache to update past KV
Details:
This PR recognizes the ScatterElementsUpdate + Slice pattern (the blue nodes in the picture below) and fuses it into a multi-stage KVCache node. After fusion, the two operations are handled as follows:
- ScatterElementsUpdate is handled by adding a reorder_stage that executes the ScatterElementsUpdate kernel.
- Slice is handled as an in-place crop that updates the data padding of the variableState.
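For reviewers who want the intended semantics spelled out, here is a minimal reference sketch in plain Python. It is not the GPU plugin code: `PaddedState`, `upper_pad`, and `fused_reorder_and_trim` are hypothetical names, the cache is a toy 2D list of shape [tokens, features], and the scatter is restricted to axis 0. It only illustrates that scattering into the existing buffer and then shrinking the visible extent gives the same result as the unfused ScatterElementsUpdate followed by Slice.

```python
from typing import List

Matrix = List[List[float]]  # toy KV layout: [tokens, features]


def scatter_elements_update(data: Matrix, indices: List[List[int]],
                            updates: Matrix) -> Matrix:
    """Unfused reference for axis 0: out[indices[i][j]][j] = updates[i][j]."""
    out = [row[:] for row in data]  # the unfused op produces a new tensor
    for i, idx_row in enumerate(indices):
        for j, dst in enumerate(idx_row):
            out[dst][j] = updates[i][j]
    return out


class PaddedState:
    """Hypothetical stand-in for a variable state whose visible length
    is controlled by padding instead of by reallocating the buffer."""

    def __init__(self, buffer: Matrix) -> None:
        self.buffer = buffer
        self.upper_pad = 0  # number of hidden rows at the end of the buffer

    def crop_to(self, length: int) -> None:
        # In-place crop: only the padding changes, no data is copied.
        self.upper_pad = len(self.buffer) - length

    def view(self) -> Matrix:
        return self.buffer[:len(self.buffer) - self.upper_pad]


def fused_reorder_and_trim(state: PaddedState, indices: List[List[int]],
                           updates: Matrix, new_len: int) -> Matrix:
    # Stage 1 (reorder): scatter the updates directly into the state buffer.
    for i, idx_row in enumerate(indices):
        for j, dst in enumerate(idx_row):
            state.buffer[dst][j] = updates[i][j]
    # Stage 2 (trim): hide the tail by adjusting the padding.
    state.crop_to(new_len)
    return state.view()
```

With a 5-token cache, writing copies of rows 2 and 4 to positions 1 and 2 and trimming to 3 tokens leaves rows 0, 2, 4 visible in both the unfused and the fused path, while the fused buffer still holds 5 rows.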
The picture below shows the graph changes before and after fusion.
Motivation and Context
The Microsoft Phi-Silica application leverages tree-based speculative decoding to accelerate LLM inference. This technique requires frequent manipulation of past KV cache states (e.g. trimming, reordering), because only a single branch of the speculative draft tree is accepted after verification.
The KV cache API currently available in OV is too slow to meet MSFT requirements (details in CVS-174809). As the OV team suggested, the only way to support the reorder feature is to add specific nodes to the original graph. This PR recognizes the pattern formed by those added nodes and fuses them into a multi-stage KVCache node for better performance.
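To make the trimming and reordering requirement concrete, the sketch below shows one way an application could derive the reorder indices and the new cache length from an accepted draft branch. The tree encoding (a `parents` list with -1 for the root), the assumption that draft rows are appended to the cache in node order, and the names `accepted_path` and `compaction_plan` are all illustrative assumptions; the PR description does not specify how the application builds these inputs.

```python
from typing import List, Tuple


def accepted_path(parents: List[int], leaf: int) -> List[int]:
    """Return draft-node ids from the root down to the accepted leaf.
    parents[i] is the parent of draft node i, or -1 for the root."""
    path = []
    node = leaf
    while node != -1:
        path.append(node)
        node = parents[node]
    path.reverse()
    return path


def compaction_plan(past_len: int, parents: List[int],
                    leaf: int) -> Tuple[List[int], List[int], int]:
    """Assumes draft node n was written to cache row past_len + n.
    Returns (source rows, destination rows, new cache length)."""
    path = accepted_path(parents, leaf)
    src = [past_len + n for n in path]              # where accepted rows are now
    dst = [past_len + k for k in range(len(path))]  # contiguous target rows
    return src, dst, past_len + len(path)
```

For example, with 4 committed tokens and a draft tree `parents = [-1, 0, 0, 1, 2]`, accepting leaf 4 keeps nodes 0, 2, 4: rows 4, 6, 8 are moved to rows 4, 5, 6 (the reorder step) and the cache is cut to 7 rows (the trim step).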
Tickets:
Please do not mention customer name in the description. I already updated it.