Why is model inference slower when KIVI is applied to Mistral-7B-Instruct-v0.2?

Open lichongod opened this issue 9 months ago • 6 comments

[Screenshot 2024-05-13 11 33 08] As you can see, the top result is with KIVI 2-bit applied, and the bottom is the 16-bit result. With KIVI, token generation speed is reduced by a quarter.

lichongod • May 13 '24 03:05