gpt-fast
Add Mixtral support
Some preliminary perf numbers:
TP=8, fp16, 163.69 tok/s
@Chillee Hi there, did you get the fp8 version somewhere, or are you currently working on the fp8 quant in this PR?