Gary Mulder

Results: 154 comments by Gary Mulder

122GB. What would be interesting is to benchmark quality versus memory size, i.e. does, say, an fp16 13B model generate better output than an int4 60GB model? @apollotsantos are you...
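As a rough back-of-the-envelope comparison behind that question (weight storage only, ignoring quantization scales, the KV cache and activations, and treating the 7B/13B/30B/65B directory names as approximate parameter counts), the memory arithmetic looks like this:

```
#include <cstdio>

int main() {
    // Approximate parameter counts, using the 7B/13B/30B/65B directory names.
    const double params_b[] = {7, 13, 30, 65};
    const double bits[]     = {16, 8, 4};      // fp16, int8, int4
    const char*  names[]    = {"fp16", "int8", "int4"};

    for (double p : params_b) {
        std::printf("%3.0fB params:", p);
        for (int i = 0; i < 3; ++i) {
            // bytes = params * bits / 8, then convert to GiB
            double gib = p * 1e9 * bits[i] / 8.0 / (1024.0 * 1024.0 * 1024.0);
            std::printf("  %s %6.1f GiB", names[i], gib);
        }
        std::printf("\n");
    }
}
```

By this estimate an fp16 13B model (~24 GiB of weights) is smaller than a 4-bit 65B model (~30 GiB), which is what makes the quality-per-byte question interesting.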

This issue is perhaps misnamed now, as 8-bit will likely improve _quality_ over 4-bit but not _performance_. In summary:

- Inference performance: 4-bit > 8-bit > fp16 (as the code...
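To show where the quality difference comes from, here is a simplified sketch of symmetric 4-bit block quantization, in the spirit of ggml's 4-bit formats but not the exact on-disk layout; the weights are made up:

```
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // One made-up block of weights; ggml groups weights into fixed-size blocks.
    std::vector<float> w = {0.12f, -0.80f, 0.33f, 0.05f, -0.41f, 0.76f, -0.02f, 0.59f};

    // One shared scale per block, chosen so the largest magnitude maps to +/-7.
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    const float scale = amax / 7.0f;

    double err = 0.0;
    for (float x : w) {
        const int   q  = (int)std::lround(x / scale);   // signed 4-bit code
        const float xq = q * scale;                      // dequantized value
        err += (x - xq) * (x - xq);
        std::printf("%+.3f -> q=%+2d -> %+.3f\n", x, q, xq);
    }
    std::printf("RMS quantization error: %.4f\n", std::sqrt(err / w.size()));
}
```

An 8-bit code has 16x more quantization levels per block than a 4-bit code, so it loses less information, while still moving half the bytes per weight of fp16.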

> So far LLAMA version is quite bad at code generation , otherwise quite good . You might want to read the original paper [LLaMA: Open and Efficient Foundation Language...

That isn't surprising, as each thread may be getting its own [random seed](https://en.wikipedia.org/wiki/Random_seed). Changing the number of threads would then change the random seed initialisation, thus generating different output.
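A minimal sketch of the effect being described, assuming each thread seeds its own RNG from a per-thread offset; this is an illustration, not llama.cpp's actual sampling code, and the thread partitioning is simulated sequentially:

```
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Simulate n_threads workers, each seeding its own RNG and filling its slice.
std::vector<int> run(int n_tokens, int n_threads, unsigned base_seed) {
    std::vector<int> out(n_tokens);
    const int per_thread = (n_tokens + n_threads - 1) / n_threads;
    for (int t = 0; t < n_threads; ++t) {
        std::mt19937 rng(base_seed + t);                 // per-thread seed
        std::uniform_int_distribution<int> dist(0, 99);
        const int end = std::min(n_tokens, (t + 1) * per_thread);
        for (int i = t * per_thread; i < end; ++i)
            out[i] = dist(rng);
    }
    return out;
}

int main() {
    // The same token position lands in a different slice (and RNG stream)
    // depending on the thread count, so the values change.
    for (int n_threads : {1, 2, 4}) {
        std::printf("n_threads = %d:", n_threads);
        for (int v : run(8, n_threads, 42)) std::printf(" %2d", v);
        std::printf("\n");
    }
}
```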

fp16 and 4-bit quantized working for me for 30B and 65B models. I haven't run the smaller models:

```
$ uname -a
Linux asushimu 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20...
```

I just pulled the latest code and will regression check the output with all 4-bit models:

```
$ ls -s ./*/ggml* | sort -k 2,2
15886376 ./30B/ggml-model-f16.bin
15886368 ./30B/ggml-model-f16.bin.1
15886392...
```

Note that as per @ggerganov's correction to my observation in issue #95, the number of threads and other subtleties such as different floating point implementations may prevent us from reproducing...
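One of those subtleties is that floating point addition is not associative, so a reduction split into a different number of partial sums (as a different thread count would do) can give slightly different results. A self-contained illustration, not taken from the llama.cpp code:

```
#include <cstdio>
#include <vector>

// Sum `v` as `n_chunks` interleaved partial sums, then combine --
// mimicking how a threaded reduction changes the order of additions.
float chunked_sum(const std::vector<float>& v, int n_chunks) {
    std::vector<float> partial(n_chunks, 0.0f);
    for (std::size_t i = 0; i < v.size(); ++i)
        partial[i % n_chunks] += v[i];
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}

int main() {
    std::vector<float> v;
    for (int i = 0; i < 100000; ++i)
        v.push_back(1.0f + 1e-4f * ((i % 7) - 3));   // values near 1 with small jitter

    // Different chunk counts reorder the additions and can change the last digits.
    for (int n : {1, 2, 4, 8})
        std::printf("%d chunk(s): %.9g\n", n, chunked_sum(v, n));
}
```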

The conversion and quantization should be deterministic, so if the bin files don't match, the pth files won't match either:

```
$ md5sum */*pth
0804c42ca65584f50234a86d71e6916a  13B/consolidated.00.pth
016017be6040da87604f77703b92f2bc  13B/consolidated.01.pth
f856e9d99c30855d6ead4d00cc3a5573  30B/consolidated.00.pth
d9dbfbea61309dc1e087f5081e98331a...
```

0.3 to 0.5 looks to be better, especially for the smaller models. The "10 simple steps" looks to be a useful prompt to test each model's ability to count...
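For context, those values presumably refer to the sampling temperature (`--temp`). A minimal sketch of the standard softmax-with-temperature formulation (made-up logits, not llama.cpp's exact sampling code) shows why lower values concentrate probability on the most likely tokens:

```
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Softmax with temperature: logits are divided by T before normalising.
// T < 1 sharpens the distribution, T > 1 flattens it.
std::vector<double> softmax_with_temp(std::vector<double> logits, double temp) {
    const double max_l = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (double& l : logits) {
        l = std::exp((l - max_l) / temp);   // subtract max for numerical stability
        sum += l;
    }
    for (double& l : logits) l /= sum;
    return logits;
}

int main() {
    const std::vector<double> logits = {2.0, 1.0, 0.5, -1.0};   // made-up logits
    for (double temp : {0.3, 0.5, 1.0}) {
        std::printf("temp = %.1f:", temp);
        for (double p : softmax_with_temp(logits, temp)) std::printf(" %.3f", p);
        std::printf("\n");
    }
}
```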

I also explored `--top_k` but suspect it is currently broken. See issue #56.
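For reference, `--top_k` is meant to restrict sampling to the k most likely tokens. A minimal sketch of that filtering step, as a generic illustration independent of whatever is broken in issue #56:

```
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

// Keep only the k highest logits, softmax over the survivors, sample one token id.
int sample_top_k(const std::vector<float>& logits, int k, std::mt19937& rng) {
    std::vector<std::pair<float, int>> cand;
    for (int i = 0; i < (int)logits.size(); ++i) cand.push_back({logits[i], i});

    const int keep = std::min(k, (int)cand.size());
    std::partial_sort(cand.begin(), cand.begin() + keep, cand.end(),
                      [](const auto& a, const auto& b) { return a.first > b.first; });
    cand.resize(keep);

    std::vector<double> probs;
    double sum = 0.0;
    for (const auto& c : cand) { probs.push_back(std::exp(c.first)); sum += probs.back(); }
    for (double& p : probs) p /= sum;

    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return cand[dist(rng)].second;
}

int main() {
    std::mt19937 rng(42);
    const std::vector<float> logits = {0.1f, 2.5f, 1.7f, -0.3f, 0.9f};   // made-up logits
    for (int i = 0; i < 5; ++i)
        std::printf("sampled token id: %d\n", sample_top_k(logits, 2, rng));
}
```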