openvino
Refactor pagedAttention transpose
Details:
- Move the transpose functions from executor_pa.cpp into transpose.hpp so they can be reused by both xattention and executor_pa.cpp.
- Modify the transpose_16NxK logic to handle tails.
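
For reviewers unfamiliar with the tail case, here is a minimal sketch of what "handling tails" means for a blocked transpose. It is not the code in transpose.hpp: `transpose_tiled_with_tails`, `transposed`, the stride parameters, and the default tile size of 16 are all hypothetical names and choices made for illustration, and the sketch uses plain scalar loops rather than any vectorized kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; NOT the OpenVINO transpose_16NxK.
// Transposes a row-major rows x cols matrix `src` into a row-major
// cols x rows matrix `dst`, walking the input in tile x tile blocks.
template <typename T>
void transpose_tiled_with_tails(const T* src, T* dst,
                                std::size_t rows, std::size_t cols,
                                std::size_t src_stride, std::size_t dst_stride,
                                std::size_t tile = 16) {
    assert(tile > 0);
    assert(src_stride >= cols && dst_stride >= rows);
    for (std::size_t r0 = 0; r0 < rows; r0 += tile) {
        // Clamp the tile height so the last (tail) row block stays in bounds.
        const std::size_t r_end = std::min(r0 + tile, rows);
        for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
            // Clamp the tile width the same way for the tail column block.
            const std::size_t c_end = std::min(c0 + tile, cols);
            for (std::size_t r = r0; r < r_end; ++r) {
                for (std::size_t c = c0; c < c_end; ++c) {
                    dst[c * dst_stride + r] = src[r * src_stride + c];
                }
            }
        }
    }
}

// Hypothetical convenience wrapper for densely packed matrices.
template <typename T>
std::vector<T> transposed(const std::vector<T>& src,
                          std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<T> dst(src.size());
    transpose_tiled_with_tails(src.data(), dst.data(), rows, cols, cols, rows);
    return dst;
}
```

With `rows = 20` and `cols = 37`, neither dimension is a multiple of 16, so the last tile in each direction is partial; clamping the tile bounds keeps every read and write inside the matrix.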
Tickets:
Tested on EMR: no regression in performance or accuracy.