codellama
Is download.sh providing the correct tokenizer.model files?
When I try to run a model...
torchrun example_js.py \
--ckpt_dir CodeLlama-13b-Instruct \
--tokenizer_path CodeLlama-13b-Instruct/tokenizer.model \
--max_seq_len 1024 --max_batch_size 4 --nproc_per_node 2
example_js.py is the same as the provided example_completion.py, but with different prompts
... I get this error:
RuntimeError: Error(s) in loading state_dict for Transformer:
size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([32016, 2560]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
size mismatch for layers.0.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
...
This code works perfectly fine if I use the 7b model and tokenizer
Investigating a bit further, I noticed this:
md5sum CodeLlama-7b-Instruct/tokenizer.model
9e597e72392fd4005529a33f2bf708ba CodeLlama-7b-Instruct/tokenizer.model
md5sum CodeLlama-13b-Instruct/tokenizer.model
9e597e72392fd4005529a33f2bf708ba CodeLlama-13b-Instruct/tokenizer.model
md5sum CodeLlama-34b-Instruct/tokenizer.model
eeec4125e9c7560836b4873b6f8e3025 CodeLlama-34b-Instruct/tokenizer.model
The tokenizers for 7b and 13b are identical? That seems unlikely.
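To double-check beyond md5, here is a small sketch (assuming the sentencepiece package is installed) that prints the vocab size each downloaded tokenizer.model reports:

from sentencepiece import SentencePieceProcessor

for path in [
    "CodeLlama-7b-Instruct/tokenizer.model",
    "CodeLlama-13b-Instruct/tokenizer.model",
    "CodeLlama-34b-Instruct/tokenizer.model",
]:
    # Load each tokenizer file and report how many tokens it defines
    sp = SentencePieceProcessor(model_file=path)
    print(path, sp.vocab_size())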
I also attempted these variants of torchrun just to see what happens:
torchrun --ckpt_dir CodeLlama-13b-Instruct --tokenizer_path CodeLlama-34b-Instruct/tokenizer.model
torchrun --ckpt_dir CodeLlama-34b-Instruct --tokenizer_path CodeLlama-34b-Instruct/tokenizer.model --nproc_per_node 4
These produced the same errors, just with different shapes in the size-mismatch messages.
On another note, the --nproc_per_node value is provided to the commands just in case (the docs say it's needed), but in practice I find it has no effect. I was forced to modify the code that builds the model like so:
generator = Llama.build(
    ckpt_dir=ckpt_dir,
    tokenizer_path=tokenizer_path,
    max_seq_len=max_seq_len,
    max_batch_size=max_batch_size,
    # Added this; value is 2 for 13b and 4 for 34b
    model_parallel_size=2,
)
I'm on an M1 MacBook Pro with 64 GB of RAM.
I encountered the same problem: CodeLlama-7b-Instruct works, but CodeLlama-13b-Instruct and CodeLlama-34b-Instruct fail. I manually set model_parallel_size=3 for 13b and 4 for 34b, but I still get the size mismatch error.
Sorry for replying so late, but just to clarify, the 34b model uses a different tokenizer as it was not trained with fill-in-the-middle capabilities.
For the commands you provided, the --nproc_per_node flag needs to be passed to torchrun itself; by appending it to the rest of the command, it gets passed to example_js.py instead. The current version of the code will warn you about any model-parallel mismatches at runtime. This command works for me:
torchrun --nproc_per_node=2 example_instructions.py \
--ckpt_dir CodeLlama-13b-Instruct \
--tokenizer_path CodeLlama-13b-Instruct/tokenizer.model
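The analogous invocation for 34b would presumably be the following (untested sketch, following the same pattern and using the 34b tokenizer):
torchrun --nproc_per_node=4 example_instructions.py \
--ckpt_dir CodeLlama-34b-Instruct \
--tokenizer_path CodeLlama-34b-Instruct/tokenizer.model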