
AssertionError: Loading a checkpoint for MP=2 but world size is 8

bhargavanubavam opened this issue · 1 comment

Hi guys, I got an error while trying to deploy llama-2-70b-chat

Command:

```
torchrun --nproc_per_node 8 example_chat_completion.py --ckpt_dir llama-2-70b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6
```

Error:

```
initializing model parallel with size 8
initializing ddp with size 1
initializing pipeline with size 1
Traceback (most recent call last):
  File "/olddata/llama/example_chat_completion.py", line 104, in <module>
    fire.Fire(main)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/olddata/llama/example_chat_completion.py", line 35, in main
    generator = Llama.build(
  File "/olddata/llama/llama/generation.py", line 103, in build
    assert model_parallel_size == len(
AssertionError: Loading a checkpoint for MP=2 but world size is 8
```

I cloned the llama2 GitHub repo, downloaded the model with download.sh, and am using the example_chat_completion.py file. I am running on an AWS EC2 instance with 8 GPUs.

bhargavanubavam · Jan 30, 2024
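For context, Llama.build infers the checkpoint's model-parallel (MP) degree from the number of *.pth shard files in --ckpt_dir and asserts that it matches the torchrun world size. A minimal sketch of the check around llama/generation.py line 103; the real code gets the MP size via fairscale rather than the raw WORLD_SIZE env var, so treat the names here as approximations:

```python
import os
from pathlib import Path

ckpt_dir = "llama-2-70b-chat/"  # value passed to --ckpt_dir
checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))  # one shard per MP rank
# torchrun sets WORLD_SIZE to --nproc_per_node; build() uses it as the MP size
model_parallel_size = int(os.environ.get("WORLD_SIZE", 1))
assert model_parallel_size == len(checkpoints), (
    f"Loading a checkpoint for MP={len(checkpoints)} "
    f"but world size is {model_parallel_size}"
)
```

So "MP=2 but world size is 8" means the directory contains 2 shard files while torchrun launched 8 processes.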

The default MP sharding for llama-2-70b-chat is 8, so you shouldn't be hitting this error with that checkpoint. llama-2-13b has MP=2; is it possible that --ckpt_dir is accidentally pointing at that model instead?
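One quick way to confirm which checkpoint is actually in the directory is to count the shard files, since the shard count is the checkpoint's MP degree. A small check, assuming the standard consolidated.NN.pth naming used by the official downloads:

```python
from pathlib import Path

ckpt_dir = Path("llama-2-70b-chat/")  # same path passed to --ckpt_dir
shards = sorted(ckpt_dir.glob("*.pth"))
print(f"{len(shards)} shard(s):", [p.name for p in shards])
# llama-2-70b-chat should show 8 shards (consolidated.00.pth .. consolidated.07.pth);
# seeing only 2 shards means a 13b-sized checkpoint ended up in this directory.
```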

As a reference, you can also use this resharding script if you want to run models with a WORLD_SIZE different from their MP (see the sketch below for the idea).

subramen · Jan 31, 2024
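The core idea of resharding is to reassemble each sharded weight into its full tensor and re-split it for the new world size. Below is a simplified, hypothetical sketch for a column-parallel weight only; a real resharding script must pick the split dimension per parameter (row-parallel weights shard along dim 1, and norms are replicated), which this omits:

```python
import torch

def reshard_column_parallel(shards: list[torch.Tensor], new_mp: int) -> list[torch.Tensor]:
    """Merge MP shards of a column-parallel weight and re-split for a new MP degree.

    Column-parallel layers shard along dim 0, so merging is a plain
    concatenation and re-splitting is a contiguous chunk: the round trip
    is lossless.
    """
    full = torch.cat(shards, dim=0)  # reassemble the unsharded weight
    assert full.shape[0] % new_mp == 0, "output dim must divide evenly by new MP"
    return list(torch.chunk(full, new_mp, dim=0))  # one shard per new rank

# Example: two MP=2 shards of an (8, 4) weight resharded to MP=4
shards = [torch.randn(4, 4), torch.randn(4, 4)]
new_shards = reshard_column_parallel(shards, new_mp=4)
print([tuple(s.shape) for s in new_shards])  # 4 tensors of shape (2, 4)
```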

Closing this issue. @bhargavanubavam, feel free to reopen when you have more information.

subramen · Mar 27, 2024