
Sparse-activation inference speedup for non-ReLU LLMs

Open pkumc opened this issue 2 months ago • 2 comments

@YixinSong-e @ZeyuMi Excellent work! By the way, have you compared the inference speedup on non-ReLU LLMs, such as the original Mistral-7B or LLaMA-7B? If non-ReLU LLMs are also sparse to some degree (Figure 1 in the ReLU² Wins paper), maybe we could accelerate inference directly via sparse activation, without the extra, expensive ReLUfication step. Have you run any experiments on non-ReLU LLMs? If the results were not positive, which of these do you think is the key factor?

  1. The non-ReLU LLM is not sparse enough to yield a significant speedup.
  2. It is harder to predict the activation sparsity of a non-ReLU LLM.

Or is it something else? Thank you! (For reference, a rough sketch of how gate-activation sparsity could be measured on a non-ReLU model is included below.)
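To make "sparse to some degree" concrete, here is a minimal measurement sketch, not from PowerInfer or the ReLU² Wins paper: it hooks the SiLU gate of each MLP layer in a Llama/Mistral-style HuggingFace model and reports the fraction of activations whose magnitude falls below a small threshold. The model name, threshold value, and prompt are illustrative assumptions only.

```python
# Minimal sketch (assumption: a HuggingFace Llama/Mistral-style checkpoint;
# model name and threshold are illustrative, not values used by the authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"   # any Llama-style model should work
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",   # requires the accelerate package
)
model.eval()

threshold = 0.05   # |activation| below this is counted as "inactive"
stats = []         # (layer_idx, fraction_inactive)

def make_hook(idx):
    def hook(module, inputs, output):
        # output of act_fn(gate_proj(x)): shape (batch, seq, intermediate_size)
        stats.append((idx, (output.abs() < threshold).float().mean().item()))
    return hook

handles = [
    layer.mlp.act_fn.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

with torch.no_grad():
    ids = tok("Sparse activation may speed up LLM inference.",
              return_tensors="pt").to(model.device)
    model(**ids)

for h in handles:
    h.remove()

for idx, frac in stats:
    print(f"layer {idx:2d}: {frac:.1%} of gate activations below {threshold}")
```

Unlike ReLU, SiLU/GELU outputs are only approximately zero, so the measured sparsity depends on the chosen threshold, which is part of why both the achievable speedup and the sparsity predictor are harder for non-ReLU models.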

pkumc · Apr 07 '24 12:04