
[Feature]: Custom attention masks

Open ojus1 opened this issue 8 months ago • 3 comments

Inspired by this paper, we're exploring ways to bootstrap a bidirectional-context LLM from a decoder-only causal LLM (e.g. Llama-3). This is very easy to do in Hugging Face Transformers by passing a custom attention mask.
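For context, here's a minimal sketch of what I mean by "passing a custom attention mask" in Transformers. It assumes a recent transformers release that accepts 4D additive attention masks (support varies by model and version), and the model id is just a placeholder:

```python
# Minimal sketch: run the prompt through a causal LM with a fully
# bidirectional attention mask to get bidirectional hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# Additive 4D mask: 0.0 = attend, large negative = masked.
# All zeros => every prompt token attends to every other prompt token.
bidirectional_mask = torch.zeros((1, 1, seq_len, seq_len), dtype=model.dtype)

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        attention_mask=bidirectional_mask,
        output_hidden_states=True,
    )
hidden_states = outputs.hidden_states[-1]  # bidirectional prompt representations
```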

Looking for guidance on how to make this happen in vLLM. TL;DR:

  1. Compute bidirectional hidden states from the prompt.
  2. Use causal attention for decoding (a sketch of the corresponding mask follows this list).

Help appreciated!
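For reference, the mask shape these two steps imply can be sketched as below. This is plain PyTorch, not a vLLM API (vLLM does not currently expose a hook for this, which is what this issue is asking about), and the function name and arguments are just illustrative:

```python
# Sketch of the "prefix-LM"-style mask implied by the two steps above:
# prompt tokens attend bidirectionally to each other, generated tokens
# attend causally to everything before them.
import torch

def build_prefix_lm_mask(prompt_len: int, total_len: int, dtype=torch.float32):
    """Return an additive (0 / -inf) mask of shape (1, 1, total_len, total_len)."""
    neg_inf = torch.finfo(dtype).min
    # Start from a standard causal (lower-triangular) visibility pattern.
    allowed = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    # Un-mask the prompt block so prompt tokens see each other in both directions.
    allowed[:prompt_len, :prompt_len] = True
    # Convert to an additive mask: 0.0 where attention is allowed, -inf elsewhere.
    mask = torch.zeros(total_len, total_len, dtype=dtype)
    mask.masked_fill_(~allowed, neg_inf)
    return mask.unsqueeze(0).unsqueeze(0)

# Example: 5 prompt tokens followed by 3 generated tokens.
print(build_prefix_lm_mask(prompt_len=5, total_len=8)[0, 0])
```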

ojus1 · Jun 03 '24 19:06