Liangfu Chen
Thanks @miladm for bringing torch-xla into the discussion. > Does this implementation work on neuron backend or any torch_xla backend? I was trying to reduce torch-xla code, since tracing two graphs for...
The initial support has been merged. Closing this issue.
> It seems initial support only allows a max input sequence length of 128 tokens because it has to match the block size - is my understanding correct? The plan is...
Because disparity and distance from the cameras are inversely related, the distance ground truth is generated from the disparity map by computing `D_gt = b * f / d`, where D...
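A minimal sketch of that conversion, assuming `b` is the stereo baseline in meters and `f` the focal length in pixels (the values below are illustrative, not from any particular dataset):

```python
import numpy as np

def disparity_to_depth(disparity, baseline_m, focal_px, eps=1e-6):
    """Depth ground truth D_gt = b * f / d; clamp disparity to avoid
    division by zero for invalid (zero-disparity) pixels."""
    disparity = np.asarray(disparity, dtype=np.float64)
    return baseline_m * focal_px / np.maximum(disparity, eps)

# Hypothetical rig: 0.54 m baseline, 720 px focal length.
d = np.array([[72.0, 36.0], [18.0, 9.0]])
depth = disparity_to_depth(d, baseline_m=0.54, focal_px=720.0)
# Larger disparity maps to smaller depth, reflecting the inverse relation.
```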
I think the motivation for the proposed change is that, in the scheduler, 1/ we pad `block_tables` with `0`, and 2/ we **recompute** when we run out of KV cache blocks....
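A sketch of point 1/ for concreteness; the function and variable names here are illustrative, not the actual vLLM scheduler code:

```python
def pad_block_tables(block_tables, pad_id=0):
    """Pad per-sequence block tables with `pad_id` so they form a
    rectangular tensor for the batched attention kernel."""
    max_len = max(len(t) for t in block_tables)
    return [t + [pad_id] * (max_len - len(t)) for t in block_tables]

# Three sequences holding 3, 1, and 2 KV cache blocks respectively.
tables = [[3, 7, 9], [5], [2, 4]]
padded = pad_block_tables(tables)
# padded == [[3, 7, 9], [5, 0, 0], [2, 4, 0]]
# Note the caveat: pad id 0 aliases real KV cache block 0, so the
# kernel must mask padded slots rather than read them.
```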
Closing this PR, since we are prioritizing vLLM V1 support for neuron backend.
Thanks for the proposal @WoosukKwon. I'm interested in learning a few more details: 1/ What is the proposed KV cache layout? 2/ How are we going to use...