afeldman-nm

Results 20 issues of afeldman-nm

FILL IN THE PR DESCRIPTION HERE FIX #xxxx (*link existing issues this PR will resolve*) **BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE** --- PR...

vLLM currently supports decoder-only models. This PR:
- Adds support for T5 (encoder/decoder model) using the latest vLLM Attention wrapper
- Augments SequenceGroups with optional encoder sequence representations
- Modifies...

GOALS
• Whisper support
• Exemplifies encoder/decoder (E/D) support
• E/D K/V caching
• E/D parallelism

TESTING
• HuggingFace whisper model
• Replicate public English Speech Recognition (SR) test using...

This PR is a step towards encoder/decoder model support. This PR (1) allows a SequenceGroup to be associated with 0 or 1 encoder sequences, and (2) causes an encoder/decoder model...

This PR is a step towards encoder/decoder model support. This PR creates a specialized ModelRunner subclass for encoder/decoder models; it differs from the base ModelRunner class primarily in that it...

This PR makes QKV computation more efficient in the case of cross-attention layers. `QKVParallelLinear` only performs self-attention QKV computation against a single hidden-state input from the previous decoder layer. Cross-attention...

This issue is not in response to a performance regression. The method of performing cross-attention QKV computations introduced in #4942 could be improved. Because this issue relates to cross-attention, it...

misc

## Motivation

There is significant interest in vLLM supporting encoder/decoder models. Issues #187 and #180, for example, request encoder/decoder model support. As a result, encoder/decoder support was recently...

RFC