Could you evaluate the performance of Llama 3.1 across its different quantized versions and model sizes?
405B FP8 vs. 405B INT4 vs. 70B vs. 70B FP8 vs. 70B INT4
I think many people would be interested in knowing which of these is the most cost-effective deployment option.
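As a starting point for the cost comparison, the raw weight-memory footprint of each configuration can be estimated from parameter count times bytes per parameter. This is only a rough sketch: it assumes the unqualified "70B" means the default BF16 weights, and it ignores KV cache, activations, and framework overhead, which also drive hardware requirements.

```python
# Approximate weight memory per configuration (illustrative only).
# Assumption: plain "70B" refers to the default BF16 weights.
# Real serving memory is higher (KV cache, activations, overhead).

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]

configs = [
    (405, "fp8"), (405, "int4"),
    (70, "bf16"), (70, "fp8"), (70, "int4"),
]

for n, dtype in configs:
    print(f"Llama 3.1 {n}B {dtype}: ~{weight_gb(n, dtype):.1f} GB weights")
```

By this estimate, 405B INT4 (~200 GB of weights) needs roughly the same memory as 70B BF16 plus 70B FP8 combined, which is why the quality-per-dollar tradeoff between a heavily quantized large model and a lightly quantized smaller one is worth benchmarking directly.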