llama2.c
Inference Llama 2 in one file of pure C
Clone of llama2.c but updated to work with Llama 3.2 1B/3B base and instruct
The weights are natively bfloat16. Rather than convert them into float, you could just keep them as bfloat16 and convert between float and bfloat16 on the fly using a union...
Why is the termination condition of the `generate` function `next = 1` (BOS) instead of `next = 2` (EOS)?
Hi, I believe the bias is not removed in the quantize() function. That would be necessary for a symmetric Q8_0 quantization of the activations. Is it not needed? ```...
Is export.py only intended for the model format used by run.c? I use it to export an HF model to a model.bin, but it doesn't work when I use it in train.py,...