Could you evaluate the performance of Llama 3.1 across its different quantized versions and model sizes?
405B FP8 vs. 405B INT4 vs. 70B vs. 70B FP8 vs. 70B INT4
I think many people would be interested in knowing which of these is the most cost-effective deployment option.
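As a starting point for the cost comparison, the raw weight-memory footprint of each configuration can be estimated from parameter count times bytes per parameter. This is only a rough sketch: it assumes the unqualified "70B" means the default BF16 weights, and it ignores KV cache, activations, and framework overhead, which also drive hardware requirements.

```python
# Approximate weight memory per configuration (illustrative only).
# Assumption: plain "70B" refers to the default BF16 weights.
# Real serving memory is higher (KV cache, activations, overhead).

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]

configs = [
    (405, "fp8"), (405, "int4"),
    (70, "bf16"), (70, "fp8"), (70, "int4"),
]

for n, dtype in configs:
    print(f"Llama 3.1 {n}B {dtype}: ~{weight_gb(n, dtype):.1f} GB weights")
```

By this estimate, 405B INT4 (~200 GB of weights) needs roughly the same memory as 70B BF16 plus 70B FP8 combined, which is why the quality-per-dollar tradeoff between a heavily quantized large model and a lightly quantized smaller one is worth benchmarking directly.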