Liangfu Chen
Thanks @miladm for bringing torch-xla into the discussion. > Does this implementation work on neuron backend or any torch_xla backend? I was trying to reduce torch-xla code, since tracing two graphs for...
The initial support has been merged. Closing this issue.
> It seems initial support only allows a max input sequence length of 128 tokens because it has to match the block size - is my understanding correct? The plan is...
Because disparity and distance from the cameras are inversely related, the distance ground truth is generated from the disparity map by computing `D_gt = b * f / d`, where D...
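A minimal sketch of that conversion, assuming `b` is the stereo baseline in meters and `f` the focal length in pixels (the values below are illustrative, not from any particular dataset):

```python
import numpy as np

def disparity_to_depth(disparity, baseline_m, focal_px, eps=1e-6):
    """Depth ground truth D_gt = b * f / d; clamp disparity to avoid
    division by zero for invalid (zero-disparity) pixels."""
    disparity = np.asarray(disparity, dtype=np.float64)
    return baseline_m * focal_px / np.maximum(disparity, eps)

# Hypothetical rig: 0.54 m baseline, 720 px focal length.
d = np.array([[72.0, 36.0], [18.0, 9.0]])
depth = disparity_to_depth(d, baseline_m=0.54, focal_px=720.0)
# Larger disparity maps to smaller depth, reflecting the inverse relation.
```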
I think the motivation for the proposed change is that, in the scheduler, 1/ we pad `block_tables` with `0`, and 2/ we **recompute** when we run out of KV cache blocks....
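A sketch of point 1/ for concreteness; the function and variable names here are illustrative, not the actual vLLM scheduler code:

```python
def pad_block_tables(block_tables, pad_id=0):
    """Pad per-sequence block tables with `pad_id` so they form a
    rectangular tensor for the batched attention kernel."""
    max_len = max(len(t) for t in block_tables)
    return [t + [pad_id] * (max_len - len(t)) for t in block_tables]

# Three sequences holding 3, 1, and 2 KV cache blocks respectively.
tables = [[3, 7, 9], [5], [2, 4]]
padded = pad_block_tables(tables)
# padded == [[3, 7, 9], [5, 0, 0], [2, 4, 0]]
# Note the caveat: pad id 0 aliases real KV cache block 0, so the
# kernel must mask padded slots rather than read them.
```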
Closing this PR, since we are prioritizing vLLM V1 support for neuron backend.
Thanks for the proposal @WoosukKwon. I'm interested in learning a few more details: 1/ What is the proposed KV cache layout? 2/ How are we going to use...