Bamboo
Non-Relu LLM inference sparse activation speedup
@YixinSong-e @ZeyuMi Excellent work! By the way, have you compared the inference speedup on non-ReLU LLMs, such as the original Mistral-7B/Llama-7B? If non-ReLU LLMs are also sparse to some degree (Figure 1 in the ReLU2 Wins paper), maybe we could accelerate inference directly via sparse activation, without the extra, expensive ReLUfication step. Have you run any non-ReLU LLM experiments? If the results were not positive, which of these might be the key factor?
- Non-ReLU LLMs are not sparse enough to yield a significant speedup.
- It is harder to predict the activation sparsity of non-ReLU LLMs.

Or is it something else? Thank you!
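To make the question concrete, here is a minimal sketch of what I mean by "sparse to some degree" for a non-ReLU activation. ReLU outputs are exactly zero, while SiLU (used in Llama/Mistral FFN gates) only produces values *near* zero, so sparsity has to be defined with a magnitude threshold. The synthetic standard-normal pre-activations and the threshold 0.1 below are illustrative assumptions, not measurements from any real model:

```python
import numpy as np

def silu(x):
    # SiLU / Swish: x * sigmoid(x), the gate activation in Llama/Mistral FFNs
    return x / (1.0 + np.exp(-x))

def near_zero_fraction(acts, eps=0.1):
    # Fraction of activations with magnitude below eps.
    # For ReLU this is (almost) exact sparsity; for SiLU it is only
    # approximate sparsity, since values are near zero, not zero.
    return float(np.mean(np.abs(acts) < eps))

rng = np.random.default_rng(0)
# Synthetic stand-in for FFN gate pre-activations; real LLM
# pre-activation distributions will differ.
pre_acts = rng.standard_normal(100_000)

relu_s = near_zero_fraction(np.maximum(pre_acts, 0.0))
silu_s = near_zero_fraction(silu(pre_acts))
print(f"ReLU near-zero fraction: {relu_s:.2f}")
print(f"SiLU near-zero fraction: {silu_s:.2f}")
```

On this toy input, SiLU's near-zero fraction is clearly lower than ReLU's, which is roughly my first hypothesis above; whether real Mistral/Llama activations are sparse enough in this sense is exactly what I am asking about.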