FastChat
Does vicuna-13b/vicuna-7b use a flash-attention implementation during inference with this repo? If yes, where is the implementation?
Not an issue exactly, but I couldn't figure this out. In the training script, I believe all the multi-head attention is replaced with flash attention, but I'm not sure what happens during inference. Any help or explanation is highly appreciated.
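For context, here is roughly what I understand the training path to do (a minimal sketch; the module and function names are what I believe live under `fastchat/train/`, so please correct me if I'm misreading the code). My question is whether anything equivalent is applied on the inference/serving side:

```python
# Sketch of the training-side flash-attention setup, as I understand it.
# Assumes FastChat, transformers, and flash-attn are installed and a CUDA GPU is available.
from fastchat.train.llama_flash_attn_monkey_patch import (
    replace_llama_attn_with_flash_attn,
)

# Monkey-patches the HuggingFace LLaMA attention forward to use flash attention.
# This is called before the model is created, so all attention layers pick it up.
replace_llama_attn_with_flash_attn()

from transformers import AutoModelForCausalLM

# Hypothetical checkpoint path for illustration only.
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

# If the inference/serving code never calls the patch function, the model would
# presumably run with the stock HuggingFace attention instead of flash attention.
```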