Jiyuan Qian

22 comments by Jiyuan Qian

> Wait for this to land: #438 so you can use a better latency kernel (GPTQ)

Hi @Narsil this is really exciting! Do you have any early numbers to share...

I see. Previously I tried quantization on falcon-7b, and got 58ms per token with bitsandbytes, while without quantization it was 31ms per token. If GPTQ can be as fast as...