Qwen1.5

Which performs better: 72b-text-v1.5-q6_K or 72b-text-v1.5-fp16?

stan1233 opened this issue • 2 comments

I'd like to ask: between Qwen's 72b-text-v1.5-q6_K and 72b-text-v1.5-fp16, which performs better? Also, what do the "_K" and "q" in the model tags stand for? For example:

72b-chat-v1.5-q3_K_L

stan1233 avatar Apr 09 '24 12:04 stan1233

It is about quantization: you can regard q6 as roughly 6-bit quantization and q2 as roughly 2-bit quantization. For sure, fp16 / bf16 should perform the best. Check Maxime's article to learn more about it: https://towardsdatascience.com/quantize-llama-models-with-ggml-and-llama-cpp-3612dfbcc172

JustinLin610 avatar Apr 09 '24 15:04 JustinLin610
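The main practical trade-off is model size versus quality. As a back-of-the-envelope illustration (not an official sizing tool; the bits-per-weight figures below are approximate, since K-quants like q6_K store extra scale metadata beyond their nominal bit width), the bit widths translate into file sizes roughly like this:

```python
# Rough file-size estimate for a 72B-parameter model at different
# quantization levels. Bits-per-weight values are approximate.
PARAMS = 72e9  # 72 billion parameters

def size_gb(bits_per_weight: float) -> float:
    """Approximate model size in gigabytes: params * bits / 8 bits-per-byte."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight for common llama.cpp quant types.
for name, bpw in [("fp16", 16.0), ("q6_K", 6.56), ("q3_K_L", 3.35), ("q2_K", 2.63)]:
    print(f"{name:8s} ~{size_gb(bpw):6.1f} GB")
```

So fp16 needs on the order of 144 GB just for the weights, while q6_K fits in roughly 60 GB at a small quality cost, which is why the quantized variants exist at all.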


Thank you for your response, I've learned a lot!

stan1233 avatar Apr 10 '24 01:04 stan1233