openvino Port SDPA to PagedAttention transformation

Details:

Ported the SDPA-to-PagedAttention transformation from Python to C++.

Related PRs: https://github.com/openvinotoolkit/openvino/pull/24127 https://github.com/openvinotoolkit/openvino/pull/24177

Tested model scope:

[x] "hf-internal-testing/tiny-random-BloomForCausalLM",
[x] "hf-internal-testing/tiny-random-FalconForCausalLM",
[x] "hf-internal-testing/tiny-random-Starcoder2ForCausalLM",
[x] "hf-internal-testing/tiny-random-GPTJForCausalLM",
[x] "hf-internal-testing/tiny-random-StableLmForCausalLM",
[x] "hf-internal-testing/tiny-random-LlamaForCausalLM",
[x] "hf-internal-testing/tiny-random-MistralForCausalLM",
[x] "hf-internal-testing/tiny-random-OPTForCausalLM",
[x] "hf-internal-testing/tiny-random-PhiForCausalLM",
[x] "hf-internal-testing/tiny-random-StableLmForCausalLM",
[x] "facebook/opt-125m",
[x] "llama2",
[x] "bigcode/starcoder2-7b"
[ ] "mosaicml/mpt-7b-chat": fails with both the Python and C++ transformations, which is acceptable for this PR. Error: RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386: Concat Parameter axis 2 out of the tensor rank range [0, 0].
[x] means that the response to the dedicated prompt is identical for the Python and C++ transformations.
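The mpt-7b-chat failure above comes from an axis-range check on a Concat node. The block below is a standalone sketch of that kind of check. It is not OpenVINO's implementation. The names `AxisRange`, `axis_range_for_rank` and `normalize_axis_sketch` are hypothetical. The sketch assumes a common convention: a tensor of rank r accepts axes in [-r, r-1], and a rank-0 tensor accepts only axis 0. Under that assumption, a reported range of [0, 0] would suggest that a Concat input reached validation with rank 0.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not OpenVINO's API): the inclusive range of axis values
// accepted for a tensor of the given rank. Assumes negative axes count from the
// end and that a rank-0 tensor only admits axis 0.
struct AxisRange {
    int64_t min;
    int64_t max;
};

AxisRange axis_range_for_rank(int64_t rank) {
    if (rank == 0) {
        return {0, 0};
    }
    return {-rank, rank - 1};
}

// Hypothetical sketch: validates `axis` against the range for `rank` and maps a
// negative axis to its non-negative equivalent. On failure it throws with a
// message shaped like the one in the mpt-7b-chat log.
int64_t normalize_axis_sketch(const std::string& node, int64_t axis, int64_t rank) {
    const AxisRange r = axis_range_for_rank(rank);
    if (!(r.min <= axis && axis <= r.max)) {
        std::ostringstream msg;
        msg << node << " Parameter axis " << axis
            << " out of the tensor rank range [" << r.min << ", " << r.max << "].";
        throw std::out_of_range(msg.str());
    }
    return axis < 0 ? axis + rank : axis;
}
```

For example, `normalize_axis_sketch("Concat", 2, 4)` returns 2. `normalize_axis_sketch("Concat", 2, 0)` throws with the message `Concat Parameter axis 2 out of the tensor rank range [0, 0].`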

Tickets:

CVS-138664

May 01 '24 20:05 itikhono

@ilya-lavrenov @slyalin please take a look

May 02 '24 08:05 itikhono

@itikhono, have you compared IRs produced by Python and C++ paths for all models from the list?

May 02 '24 08:05 slyalin

"Have you compared IRs produced by Python and C++ paths for all models from the list?"

We agreed to run this model list: "hf-internal-testing/tiny-random-BloomForCausalLM", "hf-internal-testing/tiny-random-FalconForCausalLM", "hf-internal-testing/tiny-random-Starcoder2ForCausalLM", "hf-internal-testing/tiny-random-GPTJForCausalLM", "hf-internal-testing/tiny-random-StableLmForCausalLM", "hf-internal-testing/tiny-random-LlamaForCausalLM", "hf-internal-testing/tiny-random-MistralForCausalLM", "hf-internal-testing/tiny-random-OPTForCausalLM", "hf-internal-testing/tiny-random-PhiForCausalLM", "hf-internal-testing/tiny-random-StableLmForCausalLM", "facebook/opt-125m", "llama2", "bigcode/starcoder2-7b"

We then compared the responses generated for the dedicated prompt and found no differences between the Python and C++ implementations.
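The acceptance criterion can be written as a small checklist builder. A model passes if both paths produce the same response to the dedicated prompt. This is a hypothetical sketch. `RunResult` and `build_checklist` are illustrative names. The thread does not specify how the responses are generated (model export, prompt, generation settings), so that step is left out.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one model run: whether each transformation path
// produced a response, and the text generated for the dedicated prompt.
struct RunResult {
    bool py_ok;
    bool cpp_ok;
    std::string py_response;
    std::string cpp_response;
};

// Builds checklist lines in the PR's format: "[x]" when both paths succeeded
// and their responses are identical, "[ ]" otherwise. Input order is preserved.
std::vector<std::string> build_checklist(
    const std::vector<std::pair<std::string, RunResult>>& runs) {
    std::vector<std::string> lines;
    for (const auto& entry : runs) {
        const RunResult& r = entry.second;
        const bool match = r.py_ok && r.cpp_ok && r.py_response == r.cpp_response;
        lines.push_back(std::string(match ? "[x] " : "[ ] ") + "\"" + entry.first + "\"");
    }
    return lines;
}
```

Passing in one entry per model from the list above would reproduce the [x]/[ ] markers, as long as both paths capture responses the same way. A model that fails on either path, like mpt-7b-chat, gets "[ ]".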

Comparing IRs is a separate task. We can do it, but it will take more time.

May 02 '24 08:05 itikhono

As far as I can see, the Jenkins and ARM jobs also fail in other PRs and in post-commit, so these failures are not related to these changes.

May 02 '24 08:05 itikhono
