verlocks
Hi @Guangxuan-Xiao, thanks for your explanation! However, it seems you didn't mention SWA in the Mistral model. Mistral uses [Sliding Window Attention](https://github.com/mistralai/mistral-src#sliding-window-to-speed-up-inference-and-reduce-memory-pressure) (SWA) during inference, and I believe it...
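To make the SWA point concrete, here is a minimal sketch of the attention mask a sliding window implies. It is not Mistral's actual code: the function name `sliding_window_mask` is hypothetical, and it assumes each token attends to itself plus the `window - 1` tokens before it, which may differ by one from the exact boundary convention in the reference implementation.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption (illustrative only): a token sees itself and the
    `window - 1` tokens immediately before it, never a future token.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        # causal (j <= i) and within the window (i - j < window)
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print one row per query position: "x" = attended, "." = masked out.
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With `seq_len=5` and `window=3`, the last row is `..xxx`: the final token no longer sees the first two tokens directly, so any information from them has to arrive through earlier layers.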