Thireus ☠ comments

Results: 57 comments by Thireus ☠

Check if ffn_up and ffn_gate are of the same type before using fmoe

@ubergarm - Thanks, I went with option 1 (get the imatrix from unsloth and produce my own quants for ik_llama). I've adapted the quants they use in their model to...
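The issue title describes a guard: only take the fused MoE (fmoe) path when a layer's ffn_up and ffn_gate expert tensors share the same quant type. A minimal sketch of that check follows. It assumes a hypothetical representation of the model as a dict from tensor name to type string, and GGUF-style names such as `blk.{layer}.ffn_up_exps.weight`. It is not ik_llama.cpp's actual implementation.

```python
from typing import Mapping


def can_use_fmoe(tensor_types: Mapping[str, str], layer: int) -> bool:
    """Return True only if ffn_up and ffn_gate experts in `layer` share one type.

    Hypothetical sketch: `tensor_types` maps tensor names to quant type
    strings, and the names below are assumed GGUF-style expert tensor names.
    """
    up = tensor_types.get(f"blk.{layer}.ffn_up_exps.weight")
    gate = tensor_types.get(f"blk.{layer}.ffn_gate_exps.weight")
    # Use the fused path only when both tensors exist and their types match;
    # otherwise the caller should fall back to the unfused computation.
    return up is not None and up == gate
```

With a mixed recipe (for example IQ1_S for ffn_up and a different type for ffn_gate in the same layer), the check returns False and the unfused path would be used.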

Check if ffn_up and ffn_gate are of the same type before using fmoe

Early observations using PPL: using unsloth's imatrix for IQ1_S quants leads to slightly degraded results (`PPL = 4.9200 +/- 0.02917`). Unless I'm missing something, there are no mind-blowing results when...
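For context on what a figure like `PPL = 4.9200 +/- 0.02917` means: perplexity is the exponential of the mean per-token negative log-likelihood. The `+/-` term can be estimated from the standard error of that mean. The sketch below uses one common approach (the delta method, since d exp(m) = exp(m) dm). It is an assumption, not necessarily the exact formula llama-perplexity uses.

```python
import math
from typing import Sequence, Tuple


def perplexity_with_error(token_logprobs: Sequence[float]) -> Tuple[float, float]:
    """Return (perplexity, approximate standard error) from natural-log token probabilities."""
    n = len(token_logprobs)
    if n < 2:
        raise ValueError("need at least two tokens to estimate an error")
    nll = [-lp for lp in token_logprobs]           # per-token negative log-likelihood
    mean = sum(nll) / n
    var = sum((x - mean) ** 2 for x in nll) / (n - 1)  # sample variance
    stderr = math.sqrt(var / n)                    # standard error of the mean NLL
    ppl = math.exp(mean)
    # Delta method: propagate the NLL error through exp()
    return ppl, ppl * stderr
```

Because the error shrinks with the square root of the token count, small PPL differences between quants only become meaningful on sufficiently long evaluation texts.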

Check if ffn_up and ffn_gate are of the same type before using fmoe

I need some help understanding quant performance: how can I tell which quant performs better than others? Are there metrics somewhere that I've missed? For example, when using...
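One rough heuristic for comparing two quants by PPL: treat the results as distinguishable only when their gap exceeds a few combined standard errors. This is a minimal sketch under a simplifying assumption. It treats the two estimates as independent, but runs over the same evaluation text are positively correlated, so this test is conservative. Per-token comparisons such as KL divergence against the full-precision model are more sensitive. The function name and the `z` threshold are hypothetical.

```python
import math


def ppl_gap_is_significant(ppl_a: float, err_a: float,
                           ppl_b: float, err_b: float,
                           z: float = 2.0) -> bool:
    """Hypothetical check: is |ppl_a - ppl_b| larger than z combined standard errors?

    Assumes independent errors (a simplification; same-text runs are correlated).
    """
    combined = math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)
    return abs(ppl_a - ppl_b) > z * combined
```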

Check if ffn_up and ffn_gate are of the same type before using fmoe

Thank you for the tips! > I pick a small model of similar architecture and make a bunch of quants. Then test them with llama-sweep-bench to empirically discover which ones...

Check if ffn_up and ffn_gate are of the same type before using fmoe

Thank you @ubergarm and @ikawrakow - I'll switch to DeepSeek-V2-Lite so it better represents R1-0528. The measurements I took were with partial offloading and the latest ik_llama...

Check if ffn_up and ffn_gate are of the same type before using fmoe

Thank you for all the feedback. I am making some progress and I'm working towards a combination of quants that delivers high speed (both prompt eval and new tokens) as...

Check if ffn_up and ffn_gate are of the same type before using fmoe

Just wanted to share that I haven't given up. In fact, I made my first breakthrough today after a week of brute-forcing and auto-analysis to find the optimum quant...