Gary Linscott
Sure enough, my model was busted! Ok, I see consistent results now :).
```
params.prompt.size() = 1290589, tokens.size() = 332762, params.n_ctx = 512, seq_count = 649
perplexity: 16.0483 [16/649]
22507...
```
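For reference, those numbers line up: the input tokens are split into `n_ctx`-sized chunks (332762 / 512 ≈ 649), and the reported value is the running perplexity, i.e. exp of the mean negative log-likelihood over all tokens scored so far. A minimal sketch of that bookkeeping, where `eval_logprobs` is a hypothetical stand-in for whatever actually scores a chunk:

```python
import math

def running_perplexity(token_chunks, eval_logprobs):
    """Accumulate perplexity chunk by chunk, mirroring the [i/seq_count] output above.

    eval_logprobs(chunk) is a placeholder: it should return the per-token
    log-probabilities the model assigns to the tokens in `chunk`.
    """
    total_nll = 0.0
    total_tokens = 0
    for i, chunk in enumerate(token_chunks, start=1):
        logprobs = eval_logprobs(chunk)           # hypothetical model call
        total_nll += -sum(logprobs)
        total_tokens += len(logprobs)
        ppl = math.exp(total_nll / total_tokens)  # running perplexity so far
        print(f"perplexity: {ppl:.4f} [{i}/{len(token_chunks)}]")
```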
I'll do a run with:
```
$ ./main --perplexity -m models/7B/ggml-model-q4_0.bin -f wiki.test.raw
```
And log all the perplexities along the way. Once it starts to converge, it's probably a...
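A quick sketch of one way to watch that convergence from a saved log rather than the terminal: it just parses the `perplexity: X [i/N]` lines shown above and prints how much the running estimate moves at each chunk (the script and its filename argument are made up, not part of the repo).

```python
import re
import sys

# Parse lines like "perplexity: 16.0483 [16/649]" from a saved run log.
PATTERN = re.compile(r"perplexity:\s*([0-9.]+)\s*\[(\d+)/(\d+)\]")

def deltas(path):
    prev = None
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if not m:
                continue
            ppl, i, n = float(m.group(1)), int(m.group(2)), int(m.group(3))
            delta = None if prev is None else ppl - prev
            yield i, n, ppl, delta
            prev = ppl

if __name__ == "__main__":
    for i, n, ppl, delta in deltas(sys.argv[1]):
        change = "" if delta is None else f" (delta {delta:+.4f})"
        print(f"[{i}/{n}] {ppl:.4f}{change}")
```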
I asked GPT4 for some stats advice, and it recommended:
> If you only care about accuracy down to two decimal digits, then you can stop sampling when your confidence...
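A rough sketch of what that stopping rule could look like, assuming per-chunk mean negative log-likelihoods are kept around (all names here are illustrative, not from the actual code): put a confidence interval on the mean NLL, exponentiate its endpoints, and stop once the resulting perplexity interval is tighter than the precision we care about.

```python
import math
import statistics

def perplexity_ci(chunk_nlls, z=1.96):
    """95% confidence interval for perplexity from per-chunk mean NLLs.

    chunk_nlls: list of mean negative log-likelihoods, one per 512-token chunk.
    Exponentiating the CI endpoints of the mean NLL gives an interval
    for the perplexity itself.
    """
    n = len(chunk_nlls)
    mean = statistics.fmean(chunk_nlls)
    se = statistics.stdev(chunk_nlls) / math.sqrt(n)
    return math.exp(mean), math.exp(mean - z * se), math.exp(mean + z * se)

def can_stop(chunk_nlls, tol=0.005):
    """Stop sampling once the half-width is below tol (two decimal digits)."""
    if len(chunk_nlls) < 10:
        return False
    ppl, lo, hi = perplexity_ci(chunk_nlls)
    return (hi - lo) / 2.0 < tol
```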
@bakkot - wow, that is an incredible delta. Interesting, so the tokens must be subtly off with the existing tokenizer?
Ok, well, for the 7B model at 4-bit quantization, the perplexity appears to be 12.2-12.9 or so. Doing a little bit of a random walk. Going to stop at...
Results for 4-bit quantization are looking great so far with #252 merged in as well!
```
perplexity: 6.5217 [62/655]
perplexity: 6.5569 [63/655]
perplexity: 6.5744 [64/655]
perplexity: 6.6235 [65/655]
perplexity: 6.6335...
```
Well, good news, the 4-bit quantization looks pretty good (although definitely not matching the f16 results)! I got this result last night from the 4-bit quantized weights, after merging in...
> Determining whether you can get away with fewer chunks will depend on the size of the effect you're looking at - e.g. the fixed tokenizer is obviously better after...
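One way to make that concrete, assuming we saved per-chunk mean NLLs from the two runs being compared (same chunks, same order, so a paired comparison applies; the helper below is purely illustrative): if the mean per-chunk difference is already several standard errors away from zero after k chunks, the effect is distinguishable and the rest of the file mostly just tightens the estimate.

```python
import math
import statistics

def paired_effect(nlls_a, nlls_b):
    """Paired comparison of two runs over the same chunks.

    nlls_a / nlls_b: per-chunk mean NLLs for e.g. old vs. fixed tokenizer.
    Returns (mean difference, standard error, z-score). A |z| well above ~2
    means the runs are already distinguishable with this many chunks.
    """
    diffs = [a - b for a, b in zip(nlls_a, nlls_b)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, se, mean / se
```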
@Green-Sky cool, will run that. Okay, here are the latest results (and btw, I had copied the wrong perplexity in my above comment - it should be `6.5949 [655/655]` for 4bit...
Started a `$ ./main --perplexity -m models/7B/ggml-model-q4_0.bin -f wiki.test.raw --memory_f16` run now. It already shows a small additional perplexity hit vs. the baseline 4-bit run, but will let it...