lmdeploy
Optimize w8a8 kernel
The kernel no longer recompiles when M changes.
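To illustrate why a changing M can trigger recompilation, here is a toy model of a JIT compile cache in plain Python. It is a sketch under assumptions, not Triton's real cache and not the code in this PR: `ToyJitCache`, `count_compiles`, and the key layout are hypothetical names made up for the example. The specialization rule it mimics (integer arguments distinguished by "equals 1" and "divisible by 16") is how Triton treats integer arguments by default.

```python
from typing import Dict, Iterable, Tuple


class ToyJitCache:
    """Toy stand-in for a JIT compile cache (hypothetical, for illustration only)."""

    def __init__(self, specialize_on_m: bool) -> None:
        self.specialize_on_m = specialize_on_m
        self.compiled: Dict[Tuple, str] = {}
        self.compile_count = 0

    def _key(self, m: int, n: int, k: int) -> Tuple:
        if self.specialize_on_m:
            # M contributes two properties to the cache key,
            # so different M values can map to different binaries.
            return (m == 1, m % 16 == 0, n, k)
        # M is treated as a plain runtime value and is left out of the key.
        return (n, k)

    def launch(self, m: int, n: int, k: int) -> str:
        key = self._key(m, n, k)
        if key not in self.compiled:
            # Cache miss: this is where an expensive compile would happen.
            self.compile_count += 1
            self.compiled[key] = f"kernel{key}"
        return self.compiled[key]


def count_compiles(specialize_on_m: bool, ms: Iterable[int],
                   n: int = 4096, k: int = 4096) -> int:
    """Launch once per M and report how many compiles were needed."""
    cache = ToyJitCache(specialize_on_m)
    for m in ms:
        cache.launch(m, n, k)
    return cache.compile_count


if __name__ == "__main__":
    batch_sizes = [1, 7, 16, 33, 64]  # M varies with batch size and context length
    print("specialized on M:", count_compiles(True, batch_sizes))
    print("not specialized on M:", count_compiles(False, batch_sizes))
```

In Triton itself, one way to opt an argument out of specialization is the `do_not_specialize` parameter of `triton.jit`. Whether this PR relies on that or on a different mechanism is not stated here.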
Performance at small batch sizes and short context lengths is still not fast enough, because the Triton kernel launch takes too much time.