Garbled inference output from the model after LoRA fine-tuning on a Mac M2 chip
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
Fine-tuning arguments and output:
llamafactory-cli train \
--stage sft \
--do_train \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--dataset identity \
--dataset_dir ./data \
--template llama3 \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir ./saves/LLaMA3-8B-KFT2/lora/sft \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 1024 \
--preprocessing_num_workers 16 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--evaluation_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss
05/08/2024 23:53:47 - WARNING - llmtuner.hparams.parser - We recommend enable mixed precision training.
>>>> get_current_device is mps:0 <<<<
05/08/2024 23:53:47 - INFO - llmtuner.hparams.parser - Process rank: 0, device: mps, n_gpu: 1, distributed training: False, compute dtype: None
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-05-08 23:53:47,983 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/08/2024 23:53:47 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
05/08/2024 23:53:47 - INFO - llmtuner.data.template - Add pad token: <|eot_id|>
05/08/2024 23:53:47 - INFO - llmtuner.data.loader - Loading dataset identity.json...
05/08/2024 23:53:47 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at ./data/identity.json.
Converting format of dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 773.83 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|██████████████████████████████████████████████████████████████████████| 91/91 [00:01<00:00, 64.16 examples/s]
input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 35469, 242, 115007, 15836, 11, 459, 15592, 18328, 8040, 555, 103990, 244, 28308, 94, 6708, 242, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! I am 兔爷AI, an AI assistant developed by 咖啡兔. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 35469, 242, 115007, 15836, 11, 459, 15592, 18328, 8040, 555, 103990, 244, 28308, 94, 6708, 242, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am 兔爷AI, an AI assistant developed by 咖啡兔. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:724] 2024-05-08 23:53:50,931 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:789] 2024-05-08 23:53:50,931 >> Model config LlamaConfig {
"_name_or_path": "/Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.1",
"use_cache": true,
"vocab_size": 128256
}
[INFO|modeling_utils.py:3426] 2024-05-08 23:53:50,957 >> loading weights file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1494] 2024-05-08 23:53:50,958 >> Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|configuration_utils.py:928] 2024-05-08 23:53:50,958 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128001
}
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:11<00:00, 2.81s/it]
[INFO|modeling_utils.py:4170] 2024-05-08 23:54:02,425 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4178] 2024-05-08 23:54:02,425 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-05-08 23:54:02,428 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:928] 2024-05-08 23:54:02,428 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
05/08/2024 23:54:02 - INFO - llmtuner.model.utils.checkpointing - Gradient checkpointing enabled.
05/08/2024 23:54:02 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/08/2024 23:54:02 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
05/08/2024 23:54:02 - INFO - llmtuner.model.loader - trainable params: 3407872 || all params: 8033669120 || trainable%: 0.0424
[INFO|trainer.py:441] 2024-05-08 23:54:02,561 >> You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
[INFO|trainer.py:2048] 2024-05-08 23:54:02,646 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-08 23:54:02,646 >> Num examples = 81
[INFO|trainer.py:2050] 2024-05-08 23:54:02,646 >> Num Epochs = 5
[INFO|trainer.py:2051] 2024-05-08 23:54:02,646 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2054] 2024-05-08 23:54:02,647 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2055] 2024-05-08 23:54:02,647 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2056] 2024-05-08 23:54:02,647 >> Total optimization steps = 25
[INFO|trainer.py:2057] 2024-05-08 23:54:02,647 >> Number of trainable parameters = 3,407,872
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:39<00:00, 6.41s/it][INFO|trainer.py:2316] 2024-05-08 23:56:41,929 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 159.2823, 'train_samples_per_second': 2.543, 'train_steps_per_second': 0.157, 'train_loss': 9015320.32, 'epoch': 4.88}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:39<00:00, 6.37s/it]
[INFO|trainer.py:3305] 2024-05-08 23:56:41,931 >> Saving model checkpoint to ./saves/LLaMA3-8B-KFT2/lora/sft
/opt/homebrew/anaconda3/envs/llama_factory/lib/python3.10/site-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct - will assume that the vocabulary was not modified.
warnings.warn(
[INFO|tokenization_utils_base.py:2488] 2024-05-08 23:56:41,972 >> tokenizer config file saved in ./saves/LLaMA3-8B-KFT2/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2497] 2024-05-08 23:56:41,972 >> Special tokens file saved in ./saves/LLaMA3-8B-KFT2/lora/sft/special_tokens_map.json
***** train metrics *****
epoch = 4.878
total_flos = 1219747GF
train_loss = 9015320.32
train_runtime = 0:02:39.28
train_samples_per_second = 2.543
train_steps_per_second = 0.157
05/08/2024 23:56:42 - WARNING - llmtuner.extras.ploting - No metric loss to plot.
05/08/2024 23:56:42 - WARNING - llmtuner.extras.ploting - No metric eval_loss to plot.
[INFO|trainer.py:3614] 2024-05-08 23:56:42,050 >> ***** Running Evaluation *****
[INFO|trainer.py:3616] 2024-05-08 23:56:42,050 >> Num examples = 10
[INFO|trainer.py:3619] 2024-05-08 23:56:42,050 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00, 6.05it/s]
***** eval metrics *****
epoch = 4.878
eval_loss = nan
eval_runtime = 0:00:01.81
eval_samples_per_second = 5.508
eval_steps_per_second = 5.508
[INFO|modelcard.py:450] 2024-05-08 23:56:43,866 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
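Given the abnormally large train_loss and the eval_loss = nan above, the saved LoRA weights may already contain non-finite values. A minimal check (assuming PEFT's default adapter_model.safetensors file name in the output directory):

```python
import torch
from safetensors.torch import load_file

# Load the saved LoRA adapter and report any NaN/inf tensors; the file name
# assumes PEFT's default safetensors serialization.
state = load_file("./saves/LLaMA3-8B-KFT2/lora/sft/adapter_model.safetensors")
for name, tensor in state.items():
    if not torch.isfinite(tensor).all():
        print("non-finite values in", name)
```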
Chatting with the LoRA adapter applied produces garbled output:
llamafactory-cli chat \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--adapter_name_or_path saves/LLaMA3-8B-KFT2/lora/sft \
--template llama3 \
--finetuning_type lora
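For comparison, a minimal sketch that loads the base model plus this adapter through the standard transformers/peft APIs instead of the CLI wrapper (the attention_mask is passed explicitly so nothing has to be inferred on mps):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "/Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct"
adapter = "saves/LLaMA3-8B-KFT2/lora/sft"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float32).to("mps")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style chat prompt and generate a short reply.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to("mps")
output = model.generate(
    input_ids,
    attention_mask=torch.ones_like(input_ids),
    max_new_tokens=64,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If this also produces garbled text, the problem is in the adapter itself rather than in llamafactory-cli chat.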
Running the original Llama 3 directly works fine:
llamafactory-cli chat \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--template llama3
Tried exporting the merged model and then running it:
llamafactory-cli export \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--adapter_name_or_path saves/LLaMA3-8B-KFT/lora/sft \
--template llama3 \
--finetuning_type lora \
--export_dir /Users/henryyan/llm/models/LLaMA3-8B-KFT \
--export_size 2 \
--export_device cpu \
--export_legacy_format False
llamafactory-cli export \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--adapter_name_or_path saves/LLaMA3-8B-KFT2/lora/sft \
--template llama3 \
--finetuning_type lora \
--export_dir /Users/henryyan/llm/models/LLaMA3-8B-KFT2 \
--export_size 2 \
--export_device cpu \
--export_legacy_format False
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-05-09 00:09:24,729 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/09/2024 00:09:24 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
05/09/2024 00:09:24 - INFO - llmtuner.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:724] 2024-05-09 00:09:24,729 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:789] 2024-05-09 00:09:24,730 >> Model config LlamaConfig {
"_name_or_path": "/Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.1",
"use_cache": true,
"vocab_size": 128256
}
05/09/2024 00:09:24 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3426] 2024-05-09 00:09:24,743 >> loading weights file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1494] 2024-05-09 00:09:24,743 >> Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|configuration_utils.py:928] 2024-05-09 00:09:24,743 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128001
}
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00, 1.64s/it]
[INFO|modeling_utils.py:4170] 2024-05-09 00:09:31,610 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4178] 2024-05-09 00:09:31,610 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-05-09 00:09:31,611 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:928] 2024-05-09 00:09:31,612 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
05/09/2024 00:09:31 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/09/2024 00:09:31 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
05/09/2024 00:09:32 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
05/09/2024 00:09:32 - INFO - llmtuner.model.adapter - Loaded adapter(s): saves/LLaMA3-8B-KFT2/lora/sft
05/09/2024 00:09:32 - INFO - llmtuner.model.loader - all params: 8030261248
[INFO|configuration_utils.py:471] 2024-05-09 00:09:38,169 >> Configuration saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/config.json
[INFO|configuration_utils.py:697] 2024-05-09 00:09:38,170 >> Configuration saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/generation_config.json
[INFO|modeling_utils.py:2598] 2024-05-09 00:09:44,352 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 9 checkpoint shards. You can find where each parameters has been saved in the index located at /Users/henryyan/llm/models/LLaMA3-8B-KFT2/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2488] 2024-05-09 00:09:44,353 >> tokenizer config file saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2497] 2024-05-09 00:09:44,354 >> Special tokens file saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/special_tokens_map.json
Still garbled after exporting and running 😭
Expected behavior
No response
System Info
- MacOS 14.4.1
- M2 Ultra 128G
- Python 3.10.14
- pip 23.3.1
pip list
Package Version Editable project location
----------------------------- ----------- ---------------------------------
accelerate 0.29.3
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.6.0
anyio 4.3.0
appdirs 1.4.4
async-timeout 4.0.3
attrs 23.2.0
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
contourpy 1.2.1
cycler 0.12.1
datasets 2.19.0
dill 0.3.8
distro 1.9.0
docstring_parser 0.16
dynaconf 3.2.5
einops 0.7.0
evidently 0.4.19
exceptiongroup 1.2.1
Faker 24.14.0
fastapi 0.110.2
ffmpy 0.3.2
filelock 3.13.4
fire 0.6.0
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.3.1
gradio 4.28.3
gradio_client 0.16.0
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.23.0
idna 3.7
importlib_resources 6.4.0
iterative-telemetry 0.0.8
Jinja2 3.1.3
joblib 1.4.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
litestar 2.8.2
llmtuner 0.7.1.dev0 /Users/henryyan/llm/LLaMA-Factory
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.4
mdurl 0.1.2
mlx 0.12.2
mpmath 1.3.0
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
networkx 3.3
nltk 3.8.1
numpy 1.26.4
orjson 3.10.1
packaging 24.0
pandas 2.2.2
patsy 0.5.6
peft 0.10.0
pillow 10.3.0
pip 23.3.1
plotly 5.21.0
polyfactory 2.15.0
protobuf 5.26.1
psutil 5.9.8
pyarrow 16.0.0
pyarrow-hotfix 0.6
pydantic 2.7.1
pydantic_core 2.18.2
pydub 0.25.1
Pygments 2.17.2
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
referencing 0.35.0
regex 2024.4.16
requests 2.31.0
rich 13.7.1
rich-click 1.7.4
rpds-py 0.18.0
ruff 0.4.2
safetensors 0.4.3
scikit-learn 1.4.2
scipy 1.13.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 68.2.2
shellingham 1.5.4
shtab 1.7.1
six 1.16.0
sniffio 1.3.1
sse-starlette 2.1.0
starlette 0.37.2
statsmodels 0.14.2
sympy 1.12
tenacity 8.2.3
termcolor 2.4.0
threadpoolctl 3.4.0
tiktoken 0.6.0
tokenizers 0.19.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.2.2
torchaudio 2.2.2
torchvision 0.17.2
tqdm 4.66.2
transformers 4.40.1
transformers-stream-generator 0.0.5
trl 0.8.6
typer 0.12.3
typing_extensions 4.11.0
typing-inspect 0.9.0
tyro 0.8.3
tzdata 2024.1
ujson 5.9.0
urllib3 2.2.1
uvicorn 0.29.0
watchdog 4.0.0
websockets 11.0.3
wheel 0.41.2
xxhash 3.4.1
yarl 1.9.4
Others
I have actively tried all kinds of approaches with no success; asking for help.
It looks like something already goes wrong during training; the loss is abnormally large.
You could try another model, e.g. qwen1.5-0.5B, which trains normally on my side. Comparing the two may help narrow down which step is at fault.
I tried Qwen-1.5-0.5B and it works, which is a bit strange. I will try enlarging the dataset and then train the 8B model again.
Mac only supports single-precision (fp32) training and cannot do mixed precision; I am not sure whether a difference in the data formats of the two models is what causes this.
That is possible. I will try renting separate compute and running the same training command there.
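Regarding the single-precision point above, a rough local probe (not an authoritative statement about mixed-precision training support) of which dtypes the MPS backend will even allocate:

```python
import torch

print("MPS available:", torch.backends.mps.is_available())
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    try:
        # Allocation alone does not prove training works, but a failure here
        # rules the dtype out entirely on this machine.
        torch.ones(2, 2, dtype=dtype, device="mps")
        print(dtype, "allocation ok")
    except (RuntimeError, TypeError) as err:
        print(dtype, "allocation failed:", err)
```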
Now I get this: Can't infer missing attention mask on mps device. Please provide an attention_mask or use a different device. Does this mean inference cannot be done on a Mac?
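That error is raised during generation on mps when transformers cannot infer the attention mask; by itself it does not mean inference is impossible on a Mac. A minimal sketch (path assumed from the export step above) that passes the mask the tokenizer already produces:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path assumed from the export step above.
merged = "/Users/henryyan/llm/models/LLaMA3-8B-KFT2"
tokenizer = AutoTokenizer.from_pretrained(merged)
model = AutoModelForCausalLM.from_pretrained(merged, torch_dtype=torch.float32).to("mps")

# tokenizer(...) already returns an attention_mask; passing it via **inputs
# means generate() has nothing to infer, which avoids the error above.
inputs = tokenizer("hi", return_tensors="pt").to("mps")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```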