Andrej
appreciate it, will leave as is due to video
Yes it uses a custom much smaller vocab. Here are some docs that might help: https://github.com/karpathy/llama2.c/blob/master/doc/stories260K.md
!!! On quick skim - amazing, I love it. I'll take a close look and think through how this should interact with the CPU version.
Same error, tried to re-download a few times but can't seem to get Llama 2 70B working on my Mac. But Llama 2 7B worked earlier.
huh. i'm only doing a quick skim atm. did i mess up the sizing of this? oops
I see, thanks for raising. Thinking...
Probably the legacy export script still works, I'm guessing? https://github.com/karpathy/llama2.c/blob/de005474d37d0cde1356739b8c79ebe7b42b5973/export_meta_llama_bin.py As a temporary patch... sigh
Nice, this will be a helpful reference. This is the Q8_1 scheme. A few things on my mind for quantization:
- I think I will change the python script...
@byte-6174 not to my knowledge? it's possible to do quantization-aware finetuning to improve a model for quantization, but you can quantize it anyway.
@kroggen Normally you wouldn't even quantize the rmsnorm params. There are very few of them. You only quantize matmuls and those are symmetric. @byte-6174 thanks for the link to the...
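For reference, symmetric quantization of the matmul weights, as described above, can be sketched roughly like this (a minimal numpy sketch, not the repo's actual code; the per-group layout and `group_size=64` are illustrative assumptions):

```python
import numpy as np

def quantize_q8_symmetric(w, group_size=64):
    """Symmetric int8 quantization: one scale per group, no zero-point.
    Values map into [-127, 127]; dequantization is just q * scale."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero groups
    q = np.round(w / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    # reconstruct approximate float weights from int8 values + scales
    return q.astype(np.float32) * scale

# quick sanity check on random weights
w = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_q8_symmetric(w)
err = np.abs(dequantize(q, s).reshape(w.shape) - w).max()
```

Because the scheme is symmetric, there is no offset to store per group, which is why it suits the matmul weights; the rmsnorm params are simply left in float since there are so few of them.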