Li, Jiang
It seems ```seq_lens``` in ```torch_sdpa.py``` should be replaced with ```seqlens```. I have verified the CPU backend with the model test, and the change worked well.
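For context, here is a rough sketch of how the per-sequence lengths (under whichever attribute name ends up being used) might drive prefill attention in a torch SDPA backend. The function name and tensor layout below are illustrative assumptions, not the actual ```torch_sdpa.py``` code:

```python
import torch
import torch.nn.functional as F

def sdpa_prefill(query, key, value, seqlens):
    # query/key/value: [num_tokens, num_heads, head_size], packed across sequences.
    # seqlens: per-sequence token counts (the attribute discussed above).
    outputs = []
    start = 0
    for seqlen in seqlens:
        end = start + seqlen
        q = query[start:end].transpose(0, 1)  # [num_heads, seqlen, head_size]
        k = key[start:end].transpose(0, 1)
        v = value[start:end].transpose(0, 1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        outputs.append(out.transpose(0, 1))   # back to [seqlen, num_heads, head_size]
        start = end
    return torch.cat(outputs, dim=0)
```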
@WoosukKwon Sure, please refer to #3654
@WoosukKwon Agreed, I think this might be a good direction to try. For these element-wise operations and normalization operations, using ```torch.compile``` would unify the front-end to Python code and use...
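As a minimal sketch of the idea (the op below is an RMSNorm-style normalization written for illustration, not vLLM's actual kernel), such operations could be expressed in plain PyTorch and left to ```torch.compile``` to fuse, instead of maintaining separate hand-written kernels per backend:

```python
import torch

# Hypothetical example: an RMSNorm-style op written in plain PyTorch and fused
# by torch.compile, rather than implemented as a backend-specific kernel.
@torch.compile
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    x = x * torch.rsqrt(variance + eps)
    return x * weight

x = torch.randn(4, 4096, dtype=torch.bfloat16)
w = torch.ones(4096, dtype=torch.bfloat16)
out = rms_norm(x, w)
```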
@WoosukKwon Thanks for your comments! I have fixed most of them. For ```CPUModelRunner```, yes, you are right, isolating it from ```ModelRunner``` will avoid potential code breaks completely. We can do...
Hi @WoosukKwon Thanks for your further comments. I have fixed them all; please check. Thanks.
Hi @WoosukKwon Thanks for your effort in reviewing this large PR! I have added a CI script for the CPU backend, covering building and offline inference. It was deployed on vLLM...
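For reference, a minimal offline-inference smoke test of the kind such a CI job might run after building the CPU backend; the model and sampling options here are illustrative, not the actual CI contents:

```python
from vllm import LLM, SamplingParams

# Small model and short outputs keep the CI run cheap (values are examples).
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=16)

llm = LLM(model="facebook/opt-125m", dtype="bfloat16")
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```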
Hi @markluofd The online inference of the CPU backend is still under tuning; we will enable it when it is ready.
@markluofd Yes, the performance may have some regression, because the CPU inference thread pool (OpenMP), the HTTP service thread pool, and the tokenizer threads will compete for CPU cores. We plan to isolate the...
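One possible mitigation, sketched here for illustration rather than as the planned vLLM change, is to constrain the OpenMP inference pool before the engine starts so it does not spread over all cores:

```python
import os

# Must be set before the compute library initializes its thread pool.
os.environ["OMP_NUM_THREADS"] = "28"   # size of the inference thread pool (example value)
os.environ["OMP_PROC_BIND"] = "close"  # keep the OpenMP threads on adjacent cores
os.environ["OMP_PLACES"] = "cores"     # one place per physical core
```

The remaining cores can then be left for the HTTP service and tokenizer threads, e.g. by launching the server under ```taskset``` or ```numactl```.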
@markluofd FP16 will be cast to BF16 right now. BF16 is always supported even if there is no avx512_bf16 ISA. Pure FP16 support will be added soon; it might be at...
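In other words, the current behavior amounts to a dtype conversion on load; a trivial illustration (the tensor here is just an example weight):

```python
import torch

# FP16 weights are converted to BF16, which PyTorch emulates on any CPU,
# so no avx512_bf16 instructions are required for correctness.
fp16_weight = torch.randn(4096, 4096, dtype=torch.float16)
bf16_weight = fp16_weight.to(torch.bfloat16)
```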
Hi @ProExpertProg It is feasible to load different backend dylibs at runtime. vLLM has multiple backends with different dependencies and configurations, so it might be a lot of work to...
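A rough sketch of what runtime backend selection could look like; the library names and paths below are hypothetical, not vLLM's actual packaging:

```python
import torch

def load_backend_ops(device: str) -> None:
    # Pick which compiled ops library to load based on the target device,
    # instead of linking a single backend at build time.
    if device == "cpu":
        torch.ops.load_library("/path/to/libvllm_cpu_ops.so")
    elif device == "cuda":
        torch.ops.load_library("/path/to/libvllm_cuda_ops.so")
    else:
        raise ValueError(f"Unsupported device: {device}")
```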