Qwen1.5

Which performs better: 72b-text-v1.5-q6_K or 72b-text-v1.5-fp16?

stan1233 opened this issue • 2 comments

I'd like to ask: between Qwen's 72b-text-v1.5-q6_K and 72b-text-v1.5-fp16, which performs better? Also, what do the "_K" and "q" in the model tags stand for? For example:

72b-chat-v1.5-q3_K_L

stan1233 avatar Apr 09 '24 12:04 stan1233

It is about quantization: you can regard q6 as roughly 6-bit quantization and q2 as roughly 2-bit quantization. For sure, fp16 / bf16 should perform the best. Check Maxime's article to learn more about it: https://towardsdatascience.com/quantize-llama-models-with-ggml-and-llama-cpp-3612dfbcc172

JustinLin610 avatar Apr 09 '24 15:04 JustinLin610
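The main practical trade-off is model size versus quality. As a back-of-the-envelope illustration (not an official sizing tool; the bits-per-weight figures below are approximate, since K-quants like q6_K store extra scale metadata beyond their nominal bit width), the bit widths translate into file sizes roughly like this:

```python
# Rough file-size estimate for a 72B-parameter model at different
# quantization levels. Bits-per-weight values are approximate.
PARAMS = 72e9  # 72 billion parameters

def size_gb(bits_per_weight: float) -> float:
    """Approximate model size in gigabytes: params * bits / 8 bits-per-byte."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight for common llama.cpp quant types.
for name, bpw in [("fp16", 16.0), ("q6_K", 6.56), ("q3_K_L", 3.35), ("q2_K", 2.63)]:
    print(f"{name:8s} ~{size_gb(bpw):6.1f} GB")
```

So fp16 needs on the order of 144 GB just for the weights, while q6_K fits in roughly 60 GB at a small quality cost, which is why the quantized variants exist at all.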


Thank you for your response, I've learned a lot!

stan1233 avatar Apr 10 '24 01:04 stan1233