Add llama3.2.c port to README.md
A clone of llama2.c, updated to work with the Llama 3.2 1B/3B base and instruct models.
Hey Dylan, I have a question; any assistance will be highly appreciated. I want to convert DeepSeek-R1-Llama-8B into .bin format. Can I use the same export.py for this?
@Uzair-90 Maybe? I only ever tested with meta-llama/Llama-3.2-1B. For export.py to work, the model needs to be loadable with transformers and share all the same parameter names as Llama. I don't know the specifics of their distillation process, so I'm not sure if it will work.
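A quick way to sanity check is to load it with transformers and eyeball the parameter names (untested sketch, just using the standard transformers API):

```python
# Untested sketch: load the checkpoint and print a few parameter names to
# see if they match the Llama layout export.py expects
# (e.g. model.layers.0.self_attn.q_proj.weight).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
)
for name, _ in list(model.named_parameters())[:8]:
    print(name)
```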
You can try:
```
python3 export.py DeepSeek-R1-Distill-Llama-8B.bin --hf deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```
If all you want to do is run a model locally, check out LM Studio or Ollama; these are more general, established projects that let you run basically any model, whereas this one is hard-coded for Llama.
I already tried this and it works: you can make a .bin file from DeepSeek-Distill-Llama-8B. But the provided tokenizer.bin file is not compatible, so I guess I need to figure out what format I need for my tokenizer.bin.
@Uzair-90 Yeah, so looking at the 2 tokenizers here and here, they seem to have some small differences, but I think you can get around them if you are determined.
It looks like it's mainly just that the special tokens have different IDs (see the added_tokens key in both files), but the mergeable ranks/BPE part seems to be the same (vocab key).
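If you want to see exactly which ones differ, something like this should do it (rough sketch, the paths are just wherever you saved the two tokenizer.json files):

```python
# Rough sketch: diff the special tokens (the added_tokens key) between the
# two tokenizer.json files. File paths are placeholders.
import json

def special_tokens(path):
    with open(path) as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}

llama = special_tokens("llama-3.2-1b/tokenizer.json")
deepseek = special_tokens("deepseek-r1-distill-llama-8b/tokenizer.json")
for content in sorted(set(llama) | set(deepseek)):
    if llama.get(content) != deepseek.get(content):
        print(content, llama.get(content), "->", deepseek.get(content))
```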
I believe you need to edit tokenizer.py to make tokenizer.bin have the special token IDs/ranks for DeepSeek. If you look at the export method in there, all it's doing is outputting the tokens with their scores to the tokenizer.bin file.
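Roughly, the layout is a max-token-length header followed by one (score, length, raw bytes) record per token; a minimal sketch of that (untested, following the llama2.c-style format, and the function/variable names here are made up):

```python
# Untested sketch of a llama2.c-style tokenizer.bin export: a uint32
# max-token-length header, then (float32 score, uint32 length, raw bytes)
# per token in token-ID order. export_tokenizer/vocab are made-up names.
import struct

def export_tokenizer(vocab, out_path="tokenizer.bin"):
    """vocab: list of (token_bytes, score); list index == token ID."""
    max_token_length = max(len(t) for t, _ in vocab)
    with open(out_path, "wb") as f:
        f.write(struct.pack("I", max_token_length))
        for token_bytes, score in vocab:
            f.write(struct.pack("fI", score, len(token_bytes)))
            f.write(token_bytes)
```

So in principle you would just swap in the DeepSeek special tokens at their IDs when building that vocab list.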
Thank you @Dylan-Harden3, really appreciate it. I will look into it.
@Dylan-Harden3 can you help me with this problem:
```
size mismatch for model.layers.31.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.31.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method
```
The same problem arises for all 32 layers. What specific changes do I need to make?
I am using transformers 4.30.0.
@Uzair-90 Is this when you run export.py? Please kindly open a Q&A discussion in my fork for further questions; I would like Andrej to approve this PR one day and don't want a long unrelated thread to get in the way.