Sergey Shlyapnikov

9 comments by Sergey Shlyapnikov

Hi @liuxingbin, can you share how you are running vLLM? Did you try setting a lower `max_model_len` value? We assume there should be enough GPU memory to run `max_model_len` tokens...

Hi @awayzjj, thank you for checking the issue! Let me add more details. The issue is related to an incorrect performance profiling report for the IF operation. It is...

Hi @JulienMaille, could you please share the installed GPU driver version? Also, could you please check whether the issue can be reproduced using the [benchmark_app](https://docs.openvino.ai/nightly/get-started/learn-openvino/openvino-samples/benchmark-tool.html#examples-of-running-the-tool) tool?

By the way, the current version implements dynamism through kernel recompilation for each new dynamic shape configuration. However, we could support a shape_agnostic kernel version that can be compiled once...

@xipingyan, can you please check the CI test failures?

```
ov_gpu_func_tests-0 INFO: FAILED TESTS (1/39269):
ov_gpu_func_tests-0 INFO: 2909 ms: ov_gpu_func_tests smoke_CustomOpDynamic.Accuracy
```

@AKochin, @dmitry-gorokhov, could you please review the changes from the Transformations and CPU sides?

Hi @WoosukKwon, could you please take a look at these changes?

@mgoin, thank you for your comments! I [applied them](https://github.com/vllm-project/vllm/pull/8192/commits/1723d77e7352d7138b14d1427cc16f1987ef5761) and rebased the branch on top of the latest main; please take a look.

@Kotomi-Du, how about the following implementation?
1) Keep the existing order of allocations and memory reuse for the sum post-op
2) Move the logic related to onednn impls node memory...