Kawrakow

90 comments of Kawrakow

@ggerganov Are these results with or without the changes you made to `Q4_3` after I opened this PR (and reported the results)?

@ggerganov Rebased this branch on latest master, re-quantized, re-ran the perplexity. Now I get the lower result as well with OpenBLAS (`5.2961`, so actually `0.0001` lower than cuBLAS). So, something...
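For context, a backend comparison like this boils down to building the perplexity tool against each BLAS backend and running it on the same quantized model and evaluation text. A rough sketch of that sequence; the build options shown were the Makefile switches of that period, and the model path and WikiText-2 file name are placeholders of mine, not the exact setup used here:

```sh
# Build and run the perplexity tool with OpenBLAS (paths are placeholders)
make clean && LLAMA_OPENBLAS=1 make perplexity
./perplexity -m models/7B/ggml-model-q4_3.bin -f wikitext-2-raw/wiki.test.raw

# Rebuild with cuBLAS and evaluate the same model on the same text
make clean && LLAMA_CUBLAS=1 make perplexity
./perplexity -m models/7B/ggml-model-q4_3.bin -f wikitext-2-raw/wiki.test.raw
```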

@ggerganov I propose we close this PR. Although there is some benefit from rmse minimization for `QX_1` and `QX_3` quantization of the 7B model, the benefit mostly goes away for...
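For readers unfamiliar with what "rmse minimization" refers to here: the `_1`-style block formats reconstruct each weight as a per-block scale times an integer level plus a per-block offset, and instead of deriving scale and offset from the block's min/max one can fit them by least squares. A minimal sketch of that fitting step (textbook least squares for fixed integer levels, not necessarily the exact procedure in this PR): with weights $x_i$ and levels $l_i$, $i=1,\dots,n$, minimize

$$
E(d,m)=\sum_{i=1}^{n}\bigl(x_i-(d\,l_i+m)\bigr)^2,\qquad
d=\frac{n\sum_i l_i x_i-\sum_i l_i\sum_i x_i}{n\sum_i l_i^2-\bigl(\sum_i l_i\bigr)^2},\quad
m=\frac{\sum_i x_i-d\sum_i l_i}{n},
$$

and then re-assign the levels $l_i$ under the new $(d,m)$ and repeat until the error stops decreasing.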

> @ikawrakow Any idea what could cause this? Have you done any tests so far in regards to imatrix and IQ quants for Llama 3? @Dampfinchen I have moved on...

@sw Thank you for the measurement. Yes, of course, I can move to `examples`. My thinking was that this is a POC, so it is better to have a folder...

@CISC I'm unable to test this model. I cloned the model from `git@hf.co:gorilla-llm/gorilla-openfunctions-v2`. My attempt to convert with the `convert.py` script was greeted with this message: ``` Traceback (most recent...

@CISC
* Downloaded the `fp16` GGUF from the link you provided
* Ran `./bin/imatrix -m ../models/gorilla/ggml-model-f16.gguf -t 1 -ngl 100 -f ../tests/wiki.train.raw --chunks 100 -o gorilla_imatrix.dat`
* Ran `./bin/quantize --imatrix...
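Put together, the steps above look roughly like the following; the output file name, the quantization type (`IQ1_S`, as discussed below), and the final sanity-check prompt are placeholders I chose, not the exact commands used:

```sh
# 1. Collect importance-matrix statistics on a calibration text
./bin/imatrix -m ../models/gorilla/ggml-model-f16.gguf -t 1 -ngl 100 \
    -f ../tests/wiki.train.raw --chunks 100 -o gorilla_imatrix.dat

# 2. Quantize the fp16 model using the collected imatrix
./bin/quantize --imatrix gorilla_imatrix.dat \
    ../models/gorilla/ggml-model-f16.gguf ../models/gorilla/ggml-model-iq1_s.gguf IQ1_S

# 3. Quick generation test with the quantized model
./bin/main -m ../models/gorilla/ggml-model-iq1_s.gguf -ngl 100 -p "Hello" -n 64
```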

I'm getting gibberish with your imatrix too (`5023tessa-147890tessa-147890tessa-147890` is the response in my case). I also get gibberish from another WikiText2 imatrix that uses 1000 chunks. `IQ1_S` is not really...

@CISC I specifically made the code to be the way it is because it does give a lower PPL for the 9 models I'm testing. I'm traveling for a few...

@CISC Here is a table that compares PPL between master and your proposed changes. To keep things simple, values are computed with the default `rms_norm_epsilon`.

| Model | PPL (master)...