felixzhu555


### Overview

This PR adds experimental support for attention sinks (#1304), based on this [paper](https://arxiv.org/abs/2309.17453) and [repo](https://github.com/mit-han-lab/streaming-llm). Support is currently limited to RoPE and ALiBi models (e.g. Llama, Mistral/Mixtral, Falcon,...
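For readers unfamiliar with attention sinks, here is a minimal sketch of the KV cache policy described in the linked StreamingLLM paper. It is not the code in this PR. It assumes the paper's policy: keep the first few tokens as "sinks", keep a rolling window of the most recent tokens, and assign positions by slot in the cache rather than by index in the original text. The helper names `streaming_cache_indices`, `cache_positions`, and `relative_distances` and the parameters `num_sink` and `window` are hypothetical.

```python
from typing import List


def streaming_cache_indices(seq_len: int, num_sink: int, window: int) -> List[int]:
    """Hypothetical helper: original token indices kept in the KV cache.

    Keeps the first `num_sink` tokens (the attention sinks) plus the
    `window` most recent tokens. Tokens in between are evicted.
    """
    if seq_len <= num_sink + window:
        # Nothing to evict yet: the whole sequence fits in the cache.
        return list(range(seq_len))
    sinks = list(range(num_sink))
    recent = list(range(seq_len - window, seq_len))
    return sinks + recent


def cache_positions(kept: List[int]) -> List[int]:
    """Assign positions by slot in the cache, not by original text index."""
    return list(range(len(kept)))


def relative_distances(kept: List[int]) -> List[int]:
    """Distance from the newest token (the query) to each cached key,
    measured in cache positions, as a RoPE or ALiBi model would see it."""
    pos = cache_positions(kept)
    query_pos = pos[-1]
    return [query_pos - p for p in pos]


if __name__ == "__main__":
    kept = streaming_cache_indices(seq_len=10, num_sink=2, window=3)
    print(kept)                      # [0, 1, 7, 8, 9]
    print(cache_positions(kept))     # [0, 1, 2, 3, 4]
    print(relative_distances(kept))  # [4, 3, 2, 1, 0]
```

Because positions are re-indexed within the cache, the largest relative distance the model sees stays bounded by the cache size (`num_sink + window - 1`), no matter how long the stream grows.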