openvino
[NPUW] add GroupQueryAttention OP in npuw llm_compiled_model
This follows the current implementation for ScaledDotProductAttentionDecomposition: https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L221
It adds GroupQueryAttention op decomposition logic to the NPUW llm_compiled_model.
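For reviewers less familiar with the op, here is a rough, self-contained sketch of what decomposing grouped-query attention into plain scaled dot-product attention means in general. It is not the code in this PR and does not use the OpenVINO API: `Vec`, `Heads`, `sdpa`, and `gqa_decomposed` are hypothetical names made up for illustration, and any masking, cache handling, or rotary embedding the real op may carry is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper types for illustration only; not OpenVINO API.
using Vec = std::vector<float>;
using Heads = std::vector<std::vector<Vec>>;  // [head][token][head_dim]

// Plain scaled dot-product attention for a single head (no mask, for brevity).
std::vector<Vec> sdpa(const std::vector<Vec>& q,
                      const std::vector<Vec>& k,
                      const std::vector<Vec>& v) {
    assert(!q.empty() && !k.empty() && k.size() == v.size());
    const float scale = 1.0f / std::sqrt(static_cast<float>(q[0].size()));
    std::vector<Vec> out(q.size(), Vec(v[0].size(), 0.0f));
    for (std::size_t i = 0; i < q.size(); ++i) {
        // Scaled scores of query token i against every key token.
        Vec w(k.size(), 0.0f);
        float max_score = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < k.size(); ++j) {
            float dot = 0.0f;
            for (std::size_t d = 0; d < q[i].size(); ++d) dot += q[i][d] * k[j][d];
            w[j] = dot * scale;
            max_score = std::max(max_score, w[j]);
        }
        // Numerically stable softmax over the scores.
        float sum = 0.0f;
        for (float& s : w) { s = std::exp(s - max_score); sum += s; }
        // Weighted sum of the value rows.
        for (std::size_t j = 0; j < k.size(); ++j)
            for (std::size_t d = 0; d < v[j].size(); ++d)
                out[i][d] += (w[j] / sum) * v[j][d];
    }
    return out;
}

// GQA written in terms of SDPA: each group of query heads shares one KV head,
// so query head h reads KV head h / group_size.
Heads gqa_decomposed(const Heads& q, const Heads& k, const Heads& v) {
    assert(!k.empty() && k.size() == v.size() && q.size() % k.size() == 0);
    const std::size_t group_size = q.size() / k.size();
    Heads out;
    out.reserve(q.size());
    for (std::size_t h = 0; h < q.size(); ++h)
        out.push_back(sdpa(q[h], k[h / group_size], v[h / group_size]));
    return out;
}
```

The point of the sketch is only that GQA reduces to ordinary attention once each query head is mapped to its shared KV head, which is why a decomposition pass can presumably sit next to the existing ScaledDotProductAttentionDecomposition one.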
Hi @smirnov-alexey, could you help review this? Thanks!
@esmirno could you please review?
build_jenkins