AssertionError: Loading a checkpoint for MP=8 but world size is 2
Hi guys, I got an error while trying to deploy llama-2-70b-chat.
Command: torchrun --nproc_per_node 8 example_chat_completion.py --ckpt_dir llama-2-70b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6
Error:
initializing model parallel with size 8
initializing ddp with size 1
initializing pipeline with size 1
Traceback (most recent call last):
  File "/olddata/llama/example_chat_completion.py", line 104, in <module>
    fire.Fire(main)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/olddata/llama/example_chat_completion.py", line 35, in main
    generator = Llama.build(
  File "/olddata/llama/llama/generation.py", line 103, in build
    assert model_parallel_size == len(
AssertionError: Loading a checkpoint for MP=2 but world size is 8
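For context, the assertion at llama/generation.py line 103 is raised inside Llama.build, which counts the *.pth shard files in ckpt_dir and requires that count to equal the torchrun world size. Paraphrased as a standalone sketch (the helper name is illustrative, not a verbatim copy of the repo code):

from pathlib import Path

def check_shards(ckpt_dir: str, world_size: int) -> None:
    # One consolidated.*.pth file exists per model-parallel (MP) rank.
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    assert world_size == len(checkpoints), (
        f"Loading a checkpoint for MP={len(checkpoints)} but world size is {world_size}"
    )

So the error above means torchrun launched 8 processes but only 2 shard files were found in the checkpoint directory.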
I have cloned the llama2 GitHub repo, downloaded the model with download.sh, and am using the example_chat_completion.py file. I am running on an AWS EC2 instance with 8 GPUs.
The default MP sharding for llama-2-70b-chat is 8, so you shouldn't be facing this error. llama-2-13b has MP=2... is it possible that you are accidentally using that model instead?
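One quick way to verify this is to count the consolidated shard files in the checkpoint directory and launch torchrun with a matching --nproc_per_node. A small sketch along those lines (the directory name is just taken from the command above):

from pathlib import Path
import sys

# Count model-parallel shards in a llama checkpoint directory (layout produced by download.sh).
ckpt_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "llama-2-70b-chat")
shards = sorted(ckpt_dir.glob("consolidated.*.pth"))
print(f"{ckpt_dir}: {len(shards)} shard(s), so MP={len(shards)}")
print(
    f"torchrun --nproc_per_node {len(shards)} example_chat_completion.py "
    f"--ckpt_dir {ckpt_dir}/ --tokenizer_path tokenizer.model "
    f"--max_seq_len 512 --max_batch_size 6"
)

For llama-2-70b-chat this should report 8 shards; if it reports 2, the directory most likely contains the 13B weights.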
As a reference, you can also use this resharding script if you want to run models with a WORLD_SIZE different from their MP.
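For the record, the idea behind resharding is to load every shard, concatenate each tensor along the dimension it was split on (norms and rope frequencies are replicated rather than split), and then re-split into the desired number of shards. Below is a rough sketch that only merges shards into a single MP=1 checkpoint, assuming the conventional fairscale split dimensions for llama-2 weights; the resharding script mentioned above is the safer choice for real use:

from pathlib import Path
import torch

def split_dim(name: str) -> int | None:
    # Assumed split dims: norms and rope.freqs are replicated; row-parallel weights
    # (wo, w2) and the token embedding are split along dim 1; the rest (wq/wk/wv,
    # w1/w3, output) are column-parallel and split along dim 0.
    if "norm" in name or name == "rope.freqs":
        return None
    if "wo.weight" in name or "w2.weight" in name or name == "tok_embeddings.weight":
        return 1
    return 0

def merge_shards(ckpt_dir: str, out_path: str) -> None:
    shard_files = sorted(Path(ckpt_dir).glob("consolidated.*.pth"))
    shards = [torch.load(f, map_location="cpu") for f in shard_files]
    merged = {}
    for name in shards[0]:
        dim = split_dim(name)
        merged[name] = shards[0][name] if dim is None else torch.cat([s[name] for s in shards], dim=dim)
    torch.save(merged, out_path)

Splitting back out to a different MP is the reverse operation (torch.chunk along the same dimensions), which is what the resharding script automates.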
Closing this issue. @bhargavanubavam, feel free to reopen when you have more information.