Cody Yu
> @comaniac, would you mind finding someone to review this PR? This is an important model. If the reviewer feels that reviewing the kernel takes too long, maybe request more numerical tests and...
I understand the approach but am a bit confused about the code changes. Specifically, how is "disabling multi-step when n>1" handled in this PR? It seems like this PR only raises...
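For context, a minimal sketch of what such a guard might look like. The attribute names follow vLLM conventions (`num_scheduler_steps`, `SamplingParams.n`), but this exact check is illustrative and is not necessarily the code in this PR:

```python
# Hypothetical sketch: reject (or fall back from) multi-step scheduling
# when a request samples more than one sequence (n > 1).
def validate_multi_step(scheduler_config, sampling_params) -> None:
    # num_scheduler_steps > 1 means multi-step scheduling is enabled.
    if scheduler_config.num_scheduler_steps > 1 and sampling_params.n > 1:
        # Raising is one option; an alternative would be to silently
        # fall back to single-step scheduling for this request.
        raise ValueError(
            "Multi-step scheduling does not support sampling with n > 1; "
            "disable multi-step or set n=1."
        )
```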
Just to confirm: is this ready for review?
sg. Please let me know when it's ready for review and I'll prioritize it.
Sounds good to me. It would be even better if we could allow `flashinfer` to install `flashinfer_aot`; otherwise most users would probably suffer from long compile times (and require...
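As a rough illustration of the idea, assuming a standard setuptools layout (the package mapping below is a sketch, not vLLM's actual packaging):

```python
# Hypothetical setup.py sketch: map the optional "flashinfer" extra to the
# ahead-of-time compiled package, so users get prebuilt kernels instead of
# a long JIT compile at install time.
from setuptools import setup

setup(
    name="example-project",
    extras_require={
        # `pip install example-project[flashinfer]` would then pull the
        # AOT build rather than a source build that compiles kernels.
        "flashinfer": ["flashinfer_aot"],  # illustrative requirement
    },
)
```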
Thanks for pointing this out. I just changed all the required places. Meanwhile, yeah, we do need xFormers and vllm-flash-attn...
Pending an xformers release built against torch 2.3.1. Tracking issue: https://github.com/facebookresearch/xformers/issues/1052 Closing #4509
CI passed, but we need to manually double-check whether FlashInfer supports torch 2.3.1.
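One quick way to do such a manual check (a minimal sketch; it only verifies that the import succeeds against the installed torch build, while a thorough check would also exercise an actual kernel):

```python
# Minimal manual compatibility check: verify FlashInfer imports cleanly
# against the installed PyTorch. An ABI mismatch typically surfaces as an
# ImportError here.
import torch
import flashinfer

print(f"torch version: {torch.__version__}")   # expect 2.3.1
print(f"flashinfer imported OK: {flashinfer.__name__}")
```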
- Checked that FlashInfer works with PyTorch 2.3.1.
- Performance benchmark (https://buildkite.com/vllm/performance-benchmark/builds/4569#) shows no regression compared to https://simon-mo-workspace.observablehq.cloud/vllm-dashboard-v0/perf

This PR should be good to go. cc @Yard1 @robertgshaw2-neuralmagic @WoosukKwon @simon-mo...
cc @ruisearch42 @richardliaw