Thireus ☠
@ubergarm - Thanks, I went with option 1. (Get the imatrix from unsloth and produce my own quants for ik_llama). I've adapted the quants they use in their model to...
Early observations using PPL: applying unsloth's imatrix to IQ1_S quants leads to slightly degraded results: `PPL = 4.9200 +/- 0.02917`. Unless I'm missing something, there are no mind-blowing results when...
I need some help to understand quant performance - how can I know which quant performs better than others? Are there metrics somewhere that I've missed? For example, when using...
Thank you for the tips! > I pick a small model of similar architecture and make a bunch of quants. Then test them with llama-sweep-bench to empirically discover which ones...
Thank you @ubergarm and @ikawrakow - I'll switch to DeepSeek-V2-Lite so it can be a better representation of R1-0528. The measurements I took were with partial offloading and latest ik_llama...
Thank you for all the feedback. I am making small progress and I'm working towards a combination of quants that brings high speed (both prompt eval and new tokens) as...
Just wanted to share that I haven't given up. In fact, I made my first breakthrough today after a week of brute-forcing and auto-analysis to find the optimum quant...
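
For anyone reading along who wonders how to interpret a figure like `PPL = 4.9200 +/- 0.02917` when comparing quants, here is a rough Python sketch. It is not taken from ik_llama or llama.cpp: `perplexity` and `sigma_gap` are hypothetical helper names, the error bar is a simple delta-method estimate, and the comparison treats the two runs as independent, which is only an approximation when both quants are evaluated on the same text.

```python
import math


def perplexity(token_logprobs):
    """Return (PPL, approximate standard error) from natural-log token probabilities.

    PPL is exp(mean negative log-likelihood). The error bar is a
    delta-method estimate and is an assumption of this sketch, not
    necessarily what any particular tool reports.
    """
    n = len(token_logprobs)
    nll = [-lp for lp in token_logprobs]
    mean_nll = sum(nll) / n
    # Sample variance of the per-token negative log-likelihood.
    var = sum((x - mean_nll) ** 2 for x in nll) / (n - 1) if n > 1 else 0.0
    ppl = math.exp(mean_nll)
    stderr = ppl * math.sqrt(var / n)
    return ppl, stderr


def sigma_gap(ppl_a, err_a, ppl_b, err_b):
    """Gap between two PPL values in units of their combined uncertainty.

    Assumes independent measurements (hypothetical simplification).
    """
    return abs(ppl_a - ppl_b) / math.sqrt(err_a ** 2 + err_b ** 2)
```

Lower PPL is better. Under these assumptions, a gap well below roughly 2 combined sigma between two quants is hard to distinguish from noise, so a "slightly degraded" PPL may not be a meaningful difference on its own.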