llama2.c
Inference Llama 2 in one file of pure C
This fixes two bugs that cause unexpected behavior when the hidden dim isn't evenly divisible by the quantization group size, as in Stories42M, which has hidden dim 1376 and group...
runq.c requires hidden_dim to be evenly divisible by the quantization group size. This change enforces that condition during model export. This can also be fixed [by changing runq.c](https://github.com/karpathy/llama2.c/pull/532).
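A minimal sketch of what such an export-time check might look like, assuming a `hidden_dim` taken from the model config and a `group_size` export parameter (the fallback-by-halving behavior is an illustration, not necessarily what export.py actually does):

```python
def validate_group_size(hidden_dim: int, group_size: int) -> int:
    # Quantization operates on contiguous groups of weights, so every
    # quantized dimension must contain a whole number of groups.
    # Example: Stories42M has hidden_dim 1376; with a group size of 64,
    # 1376 % 64 == 32, so the loop below would kick in.
    while hidden_dim % group_size != 0:
        # Fall back to a smaller group size that divides evenly
        # (halving preserves power-of-two sizes, terminating at 1).
        group_size //= 2
        print(f"warning: reducing group size to {group_size} "
              f"to fit hidden_dim {hidden_dim}")
    return group_size

assert validate_group_size(1376, 64) == 32  # 1376 == 43 * 32
```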
I tried to run with this: `python3 -m train.py --compile=False --eval_iters=10 --batch_size=8` but got this error; I think it is related to my Mac, CUDA, and torch compiled mode? File...
These changes add support for training with tinyshakespeare (a change from llama2.py) and with simple blank-line-separated text.
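As a rough illustration of the blank-line-separated format, a loader might look like this (the function and its behavior are hypothetical, not taken from the PR):

```python
def load_documents(path: str) -> list[str]:
    # Treat each blank-line-separated block of text as one training document.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return [doc.strip() for doc in text.split("\n\n") if doc.strip()]
```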
Thank you for this nice repo! We at Intel have created a SYCL version of it and would like to contribute it here. The SYCL code inside `/sycl` was tested...
I have tried to convert a Llama 2 model from .gguf to .bin:

```
~/llm_inferences/llama.cpp/models/meta$ ls
llama-2-7b.Q4_K_M.gguf
$ python3 export.py llama2_7b.bin --meta-llama /home/####/llm_inferences/llama.cpp/models
Traceback (most recent call last):
  File "/home/aadithya.bhat/llm_inferences/llama2.c/export.py", line 559,...
```
Is it possible to implement weight sharing between the input and output embeddings? It would save a lot of parameters for a small model!
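A common way to do this in PyTorch is to tie the two weight tensors so the output projection reuses the embedding matrix; a minimal sketch, with module names that are illustrative rather than taken from this repo's model.py:

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)
        # Weight tying: both tensors have shape (vocab_size, dim), so the
        # output head can alias the embedding matrix, saving
        # vocab_size * dim parameters.
        self.output.weight = self.tok_embeddings.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.tok_embeddings(tokens)  # a real model runs transformer blocks here
        return self.output(h)
```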
Also, add a `.clang-format` file to set a formatting specification that requires minimal changes to the existing code.
First pull request ever! Please be kind :) I propose an implementation of the LoRA finetuning algorithm. I'm a basic user of PyTorch and a total newbie about more advanced...
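For context, the core idea of LoRA is to freeze a pretrained weight W and learn only a low-rank update BA added on top of it. A minimal, self-contained sketch (the class name, rank, and alpha are illustrative, not the PR's actual code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        # Low-rank factors: delta_W = B @ A. A starts small and B at zero,
        # so training begins exactly at the pretrained model's output.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap an existing projection, then optimize only A and B.
layer = LoRALinear(nn.Linear(288, 288))
```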