Add llama3.2.c port to README.md
A clone of llama2.c, updated to work with the Llama 3.2 1B/3B base and instruct models.
Hey Dylan, I have a question; any assistance will be highly appreciated. I want to convert DeepSeek-R1-Llama-8B into .bin format. Can I use the same export.py for this?
@Uzair-90 Maybe? I only ever tested with meta-llama/Llama-3.2-1B. For export.py to work, the model needs to be loadable with transformers and share all the same parameter names as Llama. I don't know the specifics of their distillation process, so I'm not sure if it will work.
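A quick way to sanity check is to load it with transformers and eyeball the parameter names (untested sketch, just using the standard transformers API):

```python
# Untested sketch: load the checkpoint and print a few parameter names to
# see if they match the Llama layout export.py expects
# (e.g. model.layers.0.self_attn.q_proj.weight).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
)
for name, _ in list(model.named_parameters())[:8]:
    print(name)
```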
You can try:
```
python3 export.py DeepSeek-R1-Distill-Llama-8B.bin --hf deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```
If all you want to do is run a model locally, check out LM Studio or Ollama; these are more general, established projects that let you run basically any model, whereas this one is hard-coded for Llama.
I already tried this and it works: you can make a .bin file from DeepSeek-Distill-Llama-8B. But the provided tokenizer.bin file is not compatible, so I guess I need to figure out what format I need for my tokenizer.bin.
@Uzair-90 Yeah, so looking at the 2 tokenizers here and here, they seem to have some small differences, but I think you can get around them if you are determined.
It looks like it's mainly just that the special tokens have different IDs (see the added_tokens key in both files), but the mergeable ranks/BPE part seems to be the same (vocab key).
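If you want to see exactly which ones differ, something like this should do it (rough sketch, the paths are just wherever you saved the two tokenizer.json files):

```python
# Rough sketch: diff the special tokens (the added_tokens key) between the
# two tokenizer.json files. File paths are placeholders.
import json

def special_tokens(path):
    with open(path) as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}

llama = special_tokens("llama-3.2-1b/tokenizer.json")
deepseek = special_tokens("deepseek-r1-distill-llama-8b/tokenizer.json")
for content in sorted(set(llama) | set(deepseek)):
    if llama.get(content) != deepseek.get(content):
        print(content, llama.get(content), "->", deepseek.get(content))
```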
I believe you need to edit tokenizer.py to make tokenizer.bin have the special token IDs/ranks for DeepSeek. If you look at the export method in there, all it's doing is outputting the tokens with their scores to the tokenizer.bin file.
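Roughly, the layout is a max-token-length header followed by one (score, length, raw bytes) record per token; a minimal sketch of that (untested, following the llama2.c-style format, and the function/variable names here are made up):

```python
# Untested sketch of a llama2.c-style tokenizer.bin export: a uint32
# max-token-length header, then (float32 score, uint32 length, raw bytes)
# per token in token-ID order. export_tokenizer/vocab are made-up names.
import struct

def export_tokenizer(vocab, out_path="tokenizer.bin"):
    """vocab: list of (token_bytes, score); list index == token ID."""
    max_token_length = max(len(t) for t, _ in vocab)
    with open(out_path, "wb") as f:
        f.write(struct.pack("I", max_token_length))
        for token_bytes, score in vocab:
            f.write(struct.pack("fI", score, len(token_bytes)))
            f.write(token_bytes)
```

So in principle you would just swap in the DeepSeek special tokens at their IDs when building that vocab list.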
Thank you @Dylan-Harden3, really appreciate it. I will look into it.
@Dylan-Harden3 can you help me with this problem:
```
size mismatch for model.layers.31.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.31.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method
```
The same problem arises for all 32 layers. What specific changes do I need to make?
I am using transformers 4.30.0.
@Uzair-90 Is this when you run export.py? Please kindly open a Q&A discussion in my fork for further questions; I would like Andrej to approve this PR one day and don't want a long unrelated thread to get in the way.