openvino
Port SDPA to PagedAttention transformation
Details:
Ported the SDPA-to-PagedAttention transformation from Python to C++.
Related PRs: https://github.com/openvinotoolkit/openvino/pull/24127 https://github.com/openvinotoolkit/openvino/pull/24177
Tested model scope:
- [x] "hf-internal-testing/tiny-random-BloomForCausalLM",
- [x] "hf-internal-testing/tiny-random-FalconForCausalLM",
- [x] "hf-internal-testing/tiny-random-Starcoder2ForCausalLM",
- [x] "hf-internal-testing/tiny-random-GPTJForCausalLM",
- [x] "hf-internal-testing/tiny-random-StableLmForCausalLM",
- [x] "hf-internal-testing/tiny-random-LlamaForCausalLM",
- [x] "hf-internal-testing/tiny-random-MistralForCausalLM",
- [x] "hf-internal-testing/tiny-random-OPTForCausalLM",
- [x] "hf-internal-testing/tiny-random-PhiForCausalLM",
- [x] "facebook/opt-125m",
- [x] "llama2",
- [x] "bigcode/starcoder2-7b"
- [ ] "mosaicml/mpt-7b-chat" (FAILED with both the Python and C++ transformations; acceptable for this PR). Issue: `RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386: Concat Parameter axis 2 out of the tensor rank range [0, 0].`

`[x]` means that the response to the dedicated prompt is identical for the Python and C++ transformations.
Tickets:
- CVS-138664
@ilya-lavrenov @slyalin please take a look
@itikhono, have you compared IRs produced by Python and C++ paths for all models from the list?
> Have you compared IRs produced by Python and C++ paths for all models from the list?
We agreed to run this model list: "hf-internal-testing/tiny-random-BloomForCausalLM", "hf-internal-testing/tiny-random-FalconForCausalLM", "hf-internal-testing/tiny-random-Starcoder2ForCausalLM", "hf-internal-testing/tiny-random-GPTJForCausalLM", "hf-internal-testing/tiny-random-StableLmForCausalLM", "hf-internal-testing/tiny-random-LlamaForCausalLM", "hf-internal-testing/tiny-random-MistralForCausalLM", "hf-internal-testing/tiny-random-OPTForCausalLM", "hf-internal-testing/tiny-random-PhiForCausalLM", "facebook/opt-125m", "llama2", "bigcode/starcoder2-7b"
and compare the responses generated for the dedicated prompt. No differences were found between the responses produced with the Python and C++ implementations.
Comparing IRs is a separate task. We can do it, but it will require more time.
The Jenkins and ARM jobs also fail in other PRs and in post-commit, so those failures are not related to these changes.