FastChat
Error : model.embed_tokens.weight
I downloaded the latest source code and ran the command: python -m fastchat.serve.cli --model-path "lmsys/vicuna-13b-delta-v1.1" --load-8bit
This error occurs:
init_kwargs {'torch_dtype': torch.float16}
0it [00:00, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/cetc52/anaconda3/envs/pytorch/lib/python3.9/runpy.py:197 in _run_module_as_main │
│ │
│ 194 │ main_globals = sys.modules["__main__"].__dict__ │
│ 195 │ if alter_argv: │
│ 196 │ │ sys.argv[0] = mod_spec.origin │
│ ❱ 197 │ return _run_code(code, main_globals, None, │
│ 198 │ │ │ │ │ "__main__", mod_spec) │
│ 199 │
│ 200 def run_module(mod_name, init_globals=None, │
│ │
│ /home/cetc52/anaconda3/envs/pytorch/lib/python3.9/runpy.py:87 in _run_code │
│ │
│ 84 │ │ │ │ │ loader = loader, │
│ 85 │ │ │ │ │ package = pkg_name, │
│ 86 │ │ │ │ │ spec = mod_spec) │
│ ❱ 87 │ exec(code, run_globals) │
│ 88 │ return run_globals │
│ 89 │
│ 90 def _run_module_code(code, init_globals=None, │
│ │
│ /home/cetc52/Downloads/FastChat/fastchat/serve/cli.py:151 in <module> │
I just repeated the steps in the README.md.
What should I do?
Same problem on a MacBook with --device mps.
That's odd; the error occurs almost at the very beginning of the demo. Are only the two of us hitting this?
The delta weights cannot be used directly. Please apply the delta on top of a base LLaMA model before using them. See the help.
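For anyone unclear on what "apply the delta" means: FastChat ships a `fastchat.model.apply_delta` module for this step (check the README in your checkout for the exact flags, which have changed between versions). Conceptually it just adds the delta checkpoint's tensors to the base LLaMA tensors, name by name. A toy sketch of that arithmetic, using plain lists in place of torch tensors (the tensor names and values are illustrative only):

```python
# Toy sketch of delta application: target = base + delta, per named tensor.
# Plain nested lists stand in for torch tensors; shapes are tiny on purpose.
base = {"model.embed_tokens.weight": [[1.0, 2.0], [3.0, 4.0]]}
delta = {"model.embed_tokens.weight": [[0.5, 0.5], [0.5, 0.5]]}

# Add element-wise, keyed by the fully qualified parameter name.
target = {
    name: [
        [b + d for b, d in zip(row_b, row_d)]
        for row_b, row_d in zip(base[name], delta[name])
    ]
    for name in delta
}
print(target["model.embed_tokens.weight"][0])  # [1.5, 2.5]
```

The key point is that the delta checkpoint alone is not a usable model; serving it directly is what triggers the errors below.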
@merrymercy Getting the same error. We'd use the folder from the convert command's --target-model-path, correct? The help is not specific about that.
@JerryYao80 Did you find a solution?
... from conversion:

```
Saving the target model to ./vicuna-13b
user@x ~/P/llm> python3 -m fastchat.serve.cli --model-path ./vicuna-13b
```
I still haven't gotten this running, but I notice that with a non-delta model the error only appears when including --load-8bit (running on a Mac).
using: https://huggingface.co/eachadea/vicuna-7b-1.1
```
# fails immediately...
python -m fastchat.serve.cli --model-path eachadea/vicuna-7b-1.1 --device mps
...
KeyError: 'model.embed_tokens.weight'

# downloads model, then another error
python -m fastchat.serve.cli --model-path eachadea/vicuna-7b-1.1 --device mps --load-8bit
...
RuntimeError: PyTorch is not linked with support for mps devices
```
Facing the same issue when running on cuda.
need to reopen this issue @merrymercy
edit: the issue comes up when using 8-bit quantization, due to the custom implementation there. It also fails to load gptx tokenizers.
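For what it's worth, the failure mode is consistent with a name-keyed lookup: if the 8-bit code path indexes the checkpoint's state dict by fully qualified parameter name and the checkpoint doesn't contain that name, Python raises exactly this KeyError. A hypothetical sketch of the mechanism (this is not FastChat's actual code; `fetch` and the dict contents are made up for illustration):

```python
# Hypothetical sketch of the failure: a plain dict lookup by parameter name.
state_dict = {"lm_head.weight": object()}  # checkpoint missing embedding weights


def fetch(name):
    # Raises KeyError if the parameter name is absent from the checkpoint.
    return state_dict[name]


try:
    fetch("model.embed_tokens.weight")
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'model.embed_tokens.weight'
```

So the error message names whichever parameter the loader asked for first, which is why everyone sees `model.embed_tokens.weight` regardless of model size.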
Have you solved this problem?
You need to first convert the llama model weights to HF format.
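To expand on that: raw LLaMA releases (consolidated.*.pth plus params.json) must first be converted with the `convert_llama_weights_to_hf.py` script that ships with transformers before FastChat can load them. One rough way to tell the two formats apart is that an HF-format directory contains a config.json plus .bin or .safetensors weight shards. The helper below is an illustrative heuristic, not part of FastChat or transformers:

```python
import json
import os
import tempfile


def looks_like_hf_checkpoint(path):
    """Rough heuristic: HF format means config.json plus weight shards."""
    files = os.listdir(path)
    return "config.json" in files and any(
        f.endswith((".bin", ".safetensors")) for f in files
    )


# Demo against a faked directory layout resembling a converted model.
d = tempfile.mkdtemp()
with open(os.path.join(d, "config.json"), "w") as f:
    json.dump({"model_type": "llama"}, f)
open(os.path.join(d, "pytorch_model-00001-of-00002.bin"), "w").close()
print(looks_like_hf_checkpoint(d))  # True
```

If your model directory fails a check like this, run the transformers conversion script first, then point --model-path at the converted output.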
I fine-tuned based on this model: https://huggingface.co/eachadea/vicuna-13b-1.1
After fine-tuning, this command succeeds: python3 -m fastchat.serve.cli --model-path ../vicuna_output/checkpoint-1500/
When I add --load-8bit, there is an error:
KeyError: 'model.embed_tokens.weight'
Same here, with BAAI/AquilaChat2-34B:
python3.10/site-packages/fastchat/model/model_adapter.py:247: UserWarning: 8-bit quantization is not supported for multi-gpu inference.