afeldman-nm
FYI to reviewer - my PR is failing the buildkite/ci/pr/amd-distributed-tests test, with what appears to be a HuggingFace issue:

=========================== short test summary info ============================
FAILED distributed/test_chunked_prefill_distributed.py::test_models[16-5-half-meta-llama/Llama-2-7b-hf] - OSError: You...
Thanks @js8544 ! Taking a look
@js8544 please review this PR against your feature branch (https://github.com/js8544/vllm/pull/1): it adds a T5 encoder/decoder example file and also finishes merging upstream main into your PR.
FYI, I think this PR has some conflicts with recent changes to the main branch; I am looking at resolving them. This PR was previously passing all of the tests...
Hello @Abineshik, yes, things are moving apace. Thanks for checking in. I determined it is probably best for encoder/decoder models to have separate block tables for self- and...
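To illustrate why separate block tables make sense here, below is a minimal, hypothetical sketch (not vLLM's actual implementation; all names are made up for illustration): in an encoder/decoder model the cross-attention KV cache is sized once by the encoder output length, while the self-attention KV cache grows with each decoded token, so the two are naturally managed by independent tables.

```python
# Hypothetical sketch of per-sequence block tables for an
# encoder/decoder model. Assumption: fixed-size KV-cache blocks,
# as in paged-attention-style memory management.

BLOCK_SIZE = 16


def blocks_needed(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size cache blocks needed to hold num_tokens."""
    return -(-num_tokens // block_size)  # ceiling division


class EncDecBlockTables:
    """Toy container holding two independent block tables."""

    def __init__(self, encoder_len: int):
        # Cross-attention table: allocated once, sized by encoder length,
        # never grows during decoding.
        self.cross_blocks = list(range(blocks_needed(encoder_len)))
        # Self-attention table: starts empty and grows as tokens are decoded.
        self.self_blocks: list = []
        self.decoded_tokens = 0
        self._next_free_block = len(self.cross_blocks)

    def append_decoder_token(self) -> None:
        """Account for one newly decoded token, growing the self table."""
        self.decoded_tokens += 1
        while len(self.self_blocks) < blocks_needed(self.decoded_tokens):
            self.self_blocks.append(self._next_free_block)
            self._next_free_block += 1


tables = EncDecBlockTables(encoder_len=40)  # 40 encoder tokens -> 3 cross blocks
for _ in range(17):                         # 17 decoded tokens -> 2 self blocks
    tables.append_decoder_token()
print(len(tables.cross_blocks), len(tables.self_blocks))  # 3 2
```

The point of the split is that the cross-attention table can be frozen after the prefill of the encoder input, while only the self-attention table participates in per-step allocation during decoding.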
> @afeldman-nm how is the change you are working on going?

Work is still ongoing, but I hope to finish soon!
@zhuohan123 I am working on Whisper support.
@dbogunowicz thanks for your work on Whisper! Since there is clearly interest in this feature and its completion timeline, I want to add the context that Whisper support takes a...
See the encoder/decoder support issue (https://github.com/vllm-project/vllm/issues/187) and new PR (https://github.com/vllm-project/vllm/pull/4289) for a status update on encoder/decoder support, which is a prereq for Whisper support.
> Hi, any update on serving faster-whisper via VLLM?

Hi @twicer-is-coder, Whisper (or any variant thereof) is high on the list of models to add once infrastructure support is...