Liangfu Chen

27 comments by Liangfu Chen

Thanks @miladm for bringing torch-xla into the discussion. >Does this implementation work on neuron backend or any torch_xla backend? I was trying to reduce torch-xla code, since tracing two graphs for...

The initial support has been merged. Closing this issue.

>it seems initial support only allows for a max input sequence length of 128 tokens because it has to match block size - is my understanding correct? The plan is...

Because disparity and distance from the cameras are inversely related, the distance ground-truth is generated from the disparity map by computing D_gt = b * f / d where D...
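To illustrate the relation D_gt = b * f / d quoted above, here is a minimal sketch. The function name `disparity_to_distance`, its parameter names, and the `eps` threshold for invalid disparities are assumptions for illustration and are not taken from the original comment.

```python
import math

def disparity_to_distance(disparity, baseline, focal_length, eps=1e-6):
    """Convert disparities to distances using D = b * f / d.

    disparity:    iterable of disparity values (pixels)
    baseline:     distance b between the two cameras (e.g. meters)
    focal_length: focal length f (pixels)
    eps:          hypothetical threshold; disparities at or below it are
                  treated as invalid and mapped to infinity
    """
    return [
        baseline * focal_length / d if d > eps else math.inf
        for d in disparity
    ]

distances = disparity_to_distance([10.0, 0.0], baseline=0.5, focal_length=1000.0)
```

With the baseline in meters and both the focal length and the disparity in pixels, the result is in meters. A zero disparity corresponds to a point at infinity, which is why the sketch maps it to `math.inf` instead of dividing by zero.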

I think the motivation for the proposed change is that in the scheduler, 1/ we pad with `0` in block_tables, and 2/ **recompute** when we run out of KV cache blocks....
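As a rough illustration of point 1/, the sketch below pads per-sequence block tables to a common length with `0`. It is not the scheduler's actual code: `pad_block_tables` and its arguments are hypothetical names chosen for this example.

```python
def pad_block_tables(block_tables, pad_value=0):
    """Pad each per-sequence list of block ids to the longest length.

    block_tables: list of lists of integer block ids (hypothetical layout)
    pad_value:    filler value; 0 mirrors the padding described above
    """
    max_len = max((len(table) for table in block_tables), default=0)
    return [
        table + [pad_value] * (max_len - len(table))
        for table in block_tables
    ]

padded = pad_block_tables([[3, 7], [5]])
```

After padding, every row has the same length, so the tables can be stacked into a rectangular tensor.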

Closing this PR, since we are prioritizing vLLM V1 support for neuron backend.

Thanks for the proposal @WoosukKwon. I'm interested to learn a few more details: 1/ What is the proposed KV cache layout? 2/ How are we going to use...