FastChat
Does vicuna-13b/vicuna-7b use a flash-attention implementation during inference with this repo? If yes, where is the implementation?
Not an issue exactly, but I couldn't figure this out. In the training script, I believe all the multi-head attention is replaced with flash attention, but I'm not sure what happens during inference. Any help or explanation is highly appreciated.
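For context, here is roughly what I understand the training path to do (a minimal sketch; the module and function names are what I believe live under `fastchat/train/`, so please correct me if I'm misreading the code). My question is whether anything equivalent is applied on the inference/serving side:

```python
# Sketch of the training-side flash-attention setup, as I understand it.
# Assumes FastChat, transformers, and flash-attn are installed and a CUDA GPU is available.
from fastchat.train.llama_flash_attn_monkey_patch import (
    replace_llama_attn_with_flash_attn,
)

# Monkey-patches the HuggingFace LLaMA attention forward to use flash attention.
# This is called before the model is created, so all attention layers pick it up.
replace_llama_attn_with_flash_attn()

from transformers import AutoModelForCausalLM

# Hypothetical checkpoint path for illustration only.
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")

# If the inference/serving code never calls the patch function, the model would
# presumably run with the stock HuggingFace attention instead of flash attention.
```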