Andrej
appreciate it, will leave as is due to video
Yes it uses a custom much smaller vocab. Here are some docs that might help: https://github.com/karpathy/llama2.c/blob/master/doc/stories260K.md
!!! On quick skim - amazing, I love it. I'll take a close look and think through how this should interact with the CPU version.
Same error, tried to re-download a few times but can't seem to get Llama 2 70B working on my Mac. But Llama 2 7B worked earlier.
huh. i'm only doing a quick skim atm. did i mess up the sizing of this? oops
I see, thanks for raising. Thinking...
Probably the legacy export script still works, I'm guessing? https://github.com/karpathy/llama2.c/blob/de005474d37d0cde1356739b8c79ebe7b42b5973/export_meta_llama_bin.py As a temporary patch... sigh
Nice, this will be a helpful reference. This is the Q8_1 scheme. A few things on my mind for quantization:
- I think I will change the python script...
@byte-6174 not to my knowledge? it's possible to do quantization-aware finetuning to improve a model for quantization, but you can quantize it anyway.
@kroggen Normally you wouldn't even quantize the rmsnorm params. There are very few of them. You only quantize matmuls and those are symmetric. @byte-6174 thanks for the link to the...
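For reference, symmetric quantization of the matmul weights, as described above, can be sketched roughly like this (a minimal numpy sketch, not the repo's actual code; the per-group layout and `group_size=64` are illustrative assumptions):

```python
import numpy as np

def quantize_q8_symmetric(w, group_size=64):
    """Symmetric int8 quantization: one scale per group, no zero-point.
    Values map into [-127, 127]; dequantization is just q * scale."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero groups
    q = np.round(w / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    # reconstruct approximate float weights from int8 values + scales
    return q.astype(np.float32) * scale

# quick sanity check on random weights
w = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_q8_symmetric(w)
err = np.abs(dequantize(q, s).reshape(w.shape) - w).max()
```

Because the scheme is symmetric, there is no offset to store per group, which is why it suits the matmul weights; the rmsnorm params are simply left in float since there are so few of them.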