[Feature]: Custom attention masks
Inspired by this paper, we're exploring ways to bootstrap a bidirectional-context LLM from a decoder-only causal LLM (e.g. Llama-3). This is straightforward in Hugging Face transformers by passing a custom attention mask.
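For context, here is a minimal sketch of the transformers-side trick being referred to: build an additive 4D attention mask that is bidirectional over the prompt and causal elsewhere, then pass it to a decoder-only model to get bidirectional prompt hidden states. This assumes a recent transformers version that accepts a custom 4D `attention_mask` for Llama-style models; the model id and helper name are illustrative, and none of this is vLLM code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def prompt_bidirectional_mask(prompt_len: int, total_len: int, dtype: torch.dtype) -> torch.Tensor:
    """Additive mask: bidirectional over the first `prompt_len` tokens, causal after."""
    allowed = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    allowed[:prompt_len, :prompt_len] = True  # prompt tokens attend to the whole prompt
    mask = torch.zeros(total_len, total_len, dtype=dtype)
    mask.masked_fill_(~allowed, torch.finfo(dtype).min)  # large negative = masked out
    return mask[None, None]  # shape (1, 1, total_len, total_len)

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative; any Llama-style causal LM
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

ids = tok("A prompt to encode with bidirectional context", return_tensors="pt").input_ids
# Here the whole input is the prompt, so the mask is fully bidirectional;
# tokens appended later (decoding) would fall into the causal region.
mask = prompt_bidirectional_mask(ids.shape[1], ids.shape[1], model.dtype)

with torch.no_grad():
    out = model(input_ids=ids, attention_mask=mask, output_hidden_states=True)
prompt_hidden = out.hidden_states[-1]  # bidirectionally contextualized prompt states
```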
Looking for guidance on how to make this happen in vLLM. TL;DR:
- Compute bidirectional hidden states from prompt.
- Use causal attention for decoding.

Help appreciated!