Port SDPA to PagedAttention transformation

Open itikhono opened this issue 1 year ago • 4 comments

Details:

Ported the SDPA-to-PagedAttention transformation from Python to C++.

Related PRs: https://github.com/openvinotoolkit/openvino/pull/24127 https://github.com/openvinotoolkit/openvino/pull/24177

Tested model scope:

  • [x] "hf-internal-testing/tiny-random-BloomForCausalLM",
  • [x] "hf-internal-testing/tiny-random-FalconForCausalLM",
  • [x] "hf-internal-testing/tiny-random-Starcoder2ForCausalLM",
  • [x] "hf-internal-testing/tiny-random-GPTJForCausalLM",
  • [x] "hf-internal-testing/tiny-random-StableLmForCausalLM",
  • [x] "hf-internal-testing/tiny-random-LlamaForCausalLM",
  • [x] "hf-internal-testing/tiny-random-MistralForCausalLM",
  • [x] "hf-internal-testing/tiny-random-OPTForCausalLM",
  • [x] "hf-internal-testing/tiny-random-PhiForCausalLM",
  • [x] "facebook/opt-125m",
  • [x] "llama2",
  • [x] "bigcode/starcoder2-7b"
  • [ ] "mosaicml/mpt-7b-chat": fails with both the Python and C++ transformations, which is acceptable for this PR. Error: RuntimeError: Check '(axis_range_min <= axis) && (axis <= axis_range_max)' failed at src/core/src/validation_util.cpp:386: Concat Parameter axis 2 out of the tensor rank range [0, 0].

[x] means that the response to the dedicated prompt is the same for the Python and C++ transformations.
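The `mosaicml/mpt-7b-chat` failure is an axis-range check on a Concat node. As a hedged illustration (not OpenVINO's actual code), the sketch below reproduces that check under an assumed rule: for a tensor of rank r > 0 the valid axis range is [-r, r - 1], and for rank 0 it collapses to [0, 0], which is the range the log reports. Read this way, the message suggests that a Concat input reached validation with rank 0, so axis 2 cannot be valid. `normalize_concat_axis` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the check quoted in the error message:
// an axis is accepted only if axis_range_min <= axis <= axis_range_max.
// Assumption: for rank r > 0 the range is [-r, r - 1]; for rank 0 it
// degenerates to [0, 0], matching "range [0, 0]" in the log.
std::int64_t normalize_concat_axis(std::int64_t axis, std::int64_t rank) {
    const std::int64_t axis_range_min = -rank;
    const std::int64_t axis_range_max = rank > 0 ? rank - 1 : 0;
    if (!((axis_range_min <= axis) && (axis <= axis_range_max))) {
        throw std::out_of_range("Concat Parameter axis " + std::to_string(axis) +
                                " out of the tensor rank range [" +
                                std::to_string(axis_range_min) + ", " +
                                std::to_string(axis_range_max) + "].");
    }
    // Negative axes count from the last dimension.
    return axis < 0 ? axis + rank : axis;
}
```

For a rank-4 input, `normalize_concat_axis(2, 4)` returns 2 and `normalize_concat_axis(-1, 4)` returns 3, while `normalize_concat_axis(2, 0)` throws with the text "Concat Parameter axis 2 out of the tensor rank range [0, 0]." as seen in the log.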

Tickets:

  • CVS-138664

itikhono, May 01 '24 20:05

@ilya-lavrenov @slyalin please take a look

itikhono, May 02 '24 08:05

@itikhono, have you compared IRs produced by Python and C++ paths for all models from the list?

slyalin, May 02 '24 08:05

> Have you compared IRs produced by Python and C++ paths for all models from the list?

We agreed to run the following models and compare the responses generated for the dedicated prompt: "hf-internal-testing/tiny-random-BloomForCausalLM", "hf-internal-testing/tiny-random-FalconForCausalLM", "hf-internal-testing/tiny-random-Starcoder2ForCausalLM", "hf-internal-testing/tiny-random-GPTJForCausalLM", "hf-internal-testing/tiny-random-StableLmForCausalLM", "hf-internal-testing/tiny-random-LlamaForCausalLM", "hf-internal-testing/tiny-random-MistralForCausalLM", "hf-internal-testing/tiny-random-OPTForCausalLM", "hf-internal-testing/tiny-random-PhiForCausalLM", "facebook/opt-125m", "llama2", "bigcode/starcoder2-7b".

No differences were found between the responses produced with the Python and C++ implementations.

Comparing IRs is a separate task. We can do it, but it will require more time.

itikhono, May 02 '24 08:05

As far as I can see, the Jenkins and ARM jobs also fail in other PRs and in post-commit, so these failures are not related to these changes.

itikhono, May 02 '24 08:05