
Error: model.embed_tokens.weight

Open JerryYao80 opened this issue 2 years ago • 2 comments

I downloaded the latest source code and ran:

```
python -m fastchat.serve.cli --model-path "lmsys/vicuna-13b-delta-v1.1" --load-8bit
```

This error occurs:

```
init_kwargs {'torch_dtype': torch.float16}
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/cetc52/anaconda3/envs/pytorch/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None, "__main__", mod_spec)
  File "/home/cetc52/anaconda3/envs/pytorch/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cetc52/Downloads/FastChat/fastchat/serve/cli.py", line 151, in <module>
    main(args)
  File "/home/cetc52/Downloads/FastChat/fastchat/serve/cli.py", line 117, in main
    chat_loop(
  File "/home/cetc52/Downloads/FastChat/fastchat/serve/inference.py", line 293, in chat_loop
    model, tokenizer = load_model(
  File "/home/cetc52/Downloads/FastChat/fastchat/serve/inference.py", line 121, in load_model
    return load_compress_model(model_path=model_path,
  File "/home/cetc52/Downloads/FastChat/fastchat/serve/compression.py", line 117, in load_compress_model
    set_module_tensor_to_device(model, name, device, value=compressed_state_dict[name])
KeyError: 'model.embed_tokens.weight'
```

I just repeated the steps in the README.md.

What should I do?

JerryYao80 avatar May 05 '23 10:05 JerryYao80

Same problem with macbook and --device mps

ugm2 avatar May 05 '23 18:05 ugm2

That's odd. The error occurs almost at the very beginning of the demo. Are we really the only two hitting it?

JerryYao80 avatar May 06 '23 00:05 JerryYao80

The delta weights cannot be used directly. Please apply the delta on top of a base LLaMA model before using them. See help
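For reference, the delta-application step looks roughly like this (the `/path/to/...` directories are placeholders for your local copies, and flag names may differ between FastChat versions; check `python3 -m fastchat.model.apply_delta --help`):

```shell
# Apply the Vicuna delta on top of a local HF-format LLaMA checkpoint.
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-13b-hf \
    --target-model-path /path/to/vicuna-13b \
    --delta-path lmsys/vicuna-13b-delta-v1.1

# Then point the CLI at the merged weights, not at the delta repo:
python3 -m fastchat.serve.cli --model-path /path/to/vicuna-13b --load-8bit
```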

merrymercy avatar May 08 '23 08:05 merrymercy

@merrymercy Getting the same error. We should use the folder given to the convert command's `--target-model-path`, correct? The help is not specific about that.

@JerryYao80 Did you find a solution?

... from conversion:

```
Saving the target model to ./vicuna-13b
```

```
user@x ~/P/llm> python3 -m fastchat.serve.cli --model-path ./vicuna-13b
```

Phlogi avatar May 10 '23 10:05 Phlogi

I still haven't gotten this running, but I notice that a non-delta model only shows this error when I include `--load-8bit`.

(running on mac)

using: https://huggingface.co/eachadea/vicuna-7b-1.1

```
# fails immediately...
python -m fastchat.serve.cli --model-path eachadea/vicuna-7b-1.1 --device mps
...
KeyError: 'model.embed_tokens.weight'

# downloads model, then another error
python -m fastchat.serve.cli --model-path eachadea/vicuna-7b-1.1 --device mps --load-8bit
...
RuntimeError: PyTorch is not linked with support for mps devices
```

david-wolgemuth avatar May 12 '23 13:05 david-wolgemuth

Facing the same issue when running on cuda.

This issue needs to be reopened, @merrymercy.

Edit: the issue occurs when using 8-bit quantization, due to the custom implementation done there. It also fails to load gptx tokenizers.
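For context, the failing pattern can be sketched as follows. This is a simplified, hypothetical reproduction, not FastChat's actual code (tensors are replaced by strings): the 8-bit path builds a state dict from the downloaded checkpoint and then looks up every model parameter in it by name, so any key mismatch (e.g. delta-only weights, or a checkpoint layout the compressor doesn't expect) surfaces as this `KeyError` on the first missing key:

```python
# Simplified sketch of the lookup that raises the KeyError.
# The dict below stands in for the state dict built from the checkpoint on disk.
checkpoint_state_dict = {
    "model.layers.0.self_attn.q_proj.weight": "...",  # present on disk
}

# Parameter names the freshly constructed model expects to fill in.
model_param_names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.embed_tokens.weight",  # expected by the model, absent from the checkpoint
]

errors = []
for name in model_param_names:
    try:
        # the real code passes this value on to set_module_tensor_to_device(...)
        _ = checkpoint_state_dict[name]
    except KeyError as err:
        errors.append(str(err))

print(errors)  # ["'model.embed_tokens.weight'"]
```

A defensive fix would be to check `name in checkpoint_state_dict` before the lookup and report all missing keys at once, which would make the root cause (wrong or delta-only weights) obvious to the user.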

PCIHD avatar May 14 '23 06:05 PCIHD

> I still haven't successfully had this running yet.. but I notice that using a non-delta model only shows this error when including `--load-8bit` (running on mac, using https://huggingface.co/eachadea/vicuna-7b-1.1) ...

Have you solved this problem?

murray-z avatar May 24 '23 02:05 murray-z


> Facing the same issue when running on cuda. ... edit: issue comes when using 8 bit quantization, due to the custom implementation done there. it also fails to load gptx tokenizers

Have you solved this problem?

murray-z avatar May 24 '23 02:05 murray-z

You need to first convert the llama model weights to HF format.
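For anyone starting from Meta's original checkpoint files, the conversion can be done with the script shipped in the `transformers` repository (the paths and model size below are placeholders, and the script's location may vary by `transformers` version):

```shell
# Convert the original LLaMA weights to Hugging Face format.
python3 src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama-original \
    --model_size 13B \
    --output_dir /path/to/llama-13b-hf
```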

tengyu-liu avatar May 24 '23 02:05 tengyu-liu

> You need to first convert the llama model weights to HF format.

I fine-tuned based on this model: https://huggingface.co/eachadea/vicuna-13b-1.1

After fine-tuning, this command succeeds: `python3 -m fastchat.serve.cli --model-path ../vicuna_output/checkpoint-1500/`

When I add `--load-8bit`, there is an error:

```
KeyError: 'model.embed_tokens.weight'
```

murray-z avatar May 24 '23 03:05 murray-z

Same here?

lucasjinreal avatar Nov 28 '23 11:11 lucasjinreal

With BAAI/AquilaChat2-34B:

```
python3.10/site-packages/fastchat/model/model_adapter.py:247: UserWarning: 8-bit quantization is not supported for multi-gpu inference.
```

oushu1zhangxiangxuan1 avatar Nov 29 '23 05:11 oushu1zhangxiangxuan1