Garbled inference output from the model after LoRA fine-tuning on a Mac M2 chip
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
Fine-tuning arguments and output:
llamafactory-cli train \
--stage sft \
--do_train \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--dataset identity \
--dataset_dir ./data \
--template llama3 \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir ./saves/LLaMA3-8B-KFT2/lora/sft \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 1024 \
--preprocessing_num_workers 16 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--logging_steps 50 \
--warmup_steps 20 \
--save_steps 100 \
--eval_steps 50 \
--evaluation_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 5.0 \
--max_samples 1000 \
--val_size 0.1 \
--plot_loss
05/08/2024 23:53:47 - WARNING - llmtuner.hparams.parser - We recommend enable mixed precision training.
>>>> get_current_device is mps:0 <<<<
05/08/2024 23:53:47 - INFO - llmtuner.hparams.parser - Process rank: 0, device: mps, n_gpu: 1, distributed training: False, compute dtype: None
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-05-08 23:53:47,844 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-05-08 23:53:47,983 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/08/2024 23:53:47 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
05/08/2024 23:53:47 - INFO - llmtuner.data.template - Add pad token: <|eot_id|>
05/08/2024 23:53:47 - INFO - llmtuner.data.loader - Loading dataset identity.json...
05/08/2024 23:53:47 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at ./data/identity.json.
Converting format of dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 773.83 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|██████████████████████████████████████████████████████████████████████| 91/91 [00:01<00:00, 64.16 examples/s]
input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 35469, 242, 115007, 15836, 11, 459, 15592, 18328, 8040, 555, 103990, 244, 28308, 94, 6708, 242, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! I am 兔爷AI, an AI assistant developed by 咖啡兔. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 35469, 242, 115007, 15836, 11, 459, 15592, 18328, 8040, 555, 103990, 244, 28308, 94, 6708, 242, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am 兔爷AI, an AI assistant developed by 咖啡兔. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:724] 2024-05-08 23:53:50,931 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:789] 2024-05-08 23:53:50,931 >> Model config LlamaConfig {
"_name_or_path": "/Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.1",
"use_cache": true,
"vocab_size": 128256
}
[INFO|modeling_utils.py:3426] 2024-05-08 23:53:50,957 >> loading weights file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1494] 2024-05-08 23:53:50,958 >> Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|configuration_utils.py:928] 2024-05-08 23:53:50,958 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128001
}
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:11<00:00, 2.81s/it]
[INFO|modeling_utils.py:4170] 2024-05-08 23:54:02,425 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4178] 2024-05-08 23:54:02,425 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-05-08 23:54:02,428 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:928] 2024-05-08 23:54:02,428 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
05/08/2024 23:54:02 - INFO - llmtuner.model.utils.checkpointing - Gradient checkpointing enabled.
05/08/2024 23:54:02 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/08/2024 23:54:02 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
05/08/2024 23:54:02 - INFO - llmtuner.model.loader - trainable params: 3407872 || all params: 8033669120 || trainable%: 0.0424
[INFO|trainer.py:441] 2024-05-08 23:54:02,561 >> You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
[INFO|trainer.py:2048] 2024-05-08 23:54:02,646 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-08 23:54:02,646 >> Num examples = 81
[INFO|trainer.py:2050] 2024-05-08 23:54:02,646 >> Num Epochs = 5
[INFO|trainer.py:2051] 2024-05-08 23:54:02,646 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2054] 2024-05-08 23:54:02,647 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2055] 2024-05-08 23:54:02,647 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2056] 2024-05-08 23:54:02,647 >> Total optimization steps = 25
[INFO|trainer.py:2057] 2024-05-08 23:54:02,647 >> Number of trainable parameters = 3,407,872
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:39<00:00, 6.41s/it][INFO|trainer.py:2316] 2024-05-08 23:56:41,929 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 159.2823, 'train_samples_per_second': 2.543, 'train_steps_per_second': 0.157, 'train_loss': 9015320.32, 'epoch': 4.88}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:39<00:00, 6.37s/it]
[INFO|trainer.py:3305] 2024-05-08 23:56:41,931 >> Saving model checkpoint to ./saves/LLaMA3-8B-KFT2/lora/sft
/opt/homebrew/anaconda3/envs/llama_factory/lib/python3.10/site-packages/peft/utils/save_and_load.py:154: UserWarning: Could not find a config file in /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct - will assume that the vocabulary was not modified.
warnings.warn(
[INFO|tokenization_utils_base.py:2488] 2024-05-08 23:56:41,972 >> tokenizer config file saved in ./saves/LLaMA3-8B-KFT2/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2497] 2024-05-08 23:56:41,972 >> Special tokens file saved in ./saves/LLaMA3-8B-KFT2/lora/sft/special_tokens_map.json
***** train metrics *****
epoch = 4.878
total_flos = 1219747GF
train_loss = 9015320.32
train_runtime = 0:02:39.28
train_samples_per_second = 2.543
train_steps_per_second = 0.157
05/08/2024 23:56:42 - WARNING - llmtuner.extras.ploting - No metric loss to plot.
05/08/2024 23:56:42 - WARNING - llmtuner.extras.ploting - No metric eval_loss to plot.
[INFO|trainer.py:3614] 2024-05-08 23:56:42,050 >> ***** Running Evaluation *****
[INFO|trainer.py:3616] 2024-05-08 23:56:42,050 >> Num examples = 10
[INFO|trainer.py:3619] 2024-05-08 23:56:42,050 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00, 6.05it/s]
***** eval metrics *****
epoch = 4.878
eval_loss = nan
eval_runtime = 0:00:01.81
eval_samples_per_second = 5.508
eval_steps_per_second = 5.508
[INFO|modelcard.py:450] 2024-05-08 23:56:43,866 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
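Given the abnormally large train_loss and the eval_loss = nan above, the saved LoRA weights may already contain non-finite values. A minimal check (assuming PEFT's default adapter_model.safetensors file name in the output directory):

```python
import torch
from safetensors.torch import load_file

# Load the saved LoRA adapter and report any NaN/inf tensors; the file name
# assumes PEFT's default safetensors serialization.
state = load_file("./saves/LLaMA3-8B-KFT2/lora/sft/adapter_model.safetensors")
for name, tensor in state.items():
    if not torch.isfinite(tensor).all():
        print("non-finite values in", name)
```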
Chatting with the LoRA adapter applied produces garbled output:
llamafactory-cli chat \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--adapter_name_or_path saves/LLaMA3-8B-KFT2/lora/sft \
--template llama3 \
--finetuning_type lora
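For comparison, a minimal sketch that loads the base model plus this adapter through the standard transformers/peft APIs instead of the CLI wrapper (the attention_mask is passed explicitly so nothing has to be inferred on mps):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "/Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct"
adapter = "saves/LLaMA3-8B-KFT2/lora/sft"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float32).to("mps")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style chat prompt and generate a short reply.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to("mps")
output = model.generate(
    input_ids,
    attention_mask=torch.ones_like(input_ids),
    max_new_tokens=64,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If this also produces garbled text, the problem is in the adapter itself rather than in llamafactory-cli chat.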
Running the original Llama 3 directly works fine:
llamafactory-cli chat \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--template llama3
Tried exporting the merged model and then running it:
llamafactory-cli export \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--adapter_name_or_path saves/LLaMA3-8B-KFT/lora/sft \
--template llama3 \
--finetuning_type lora \
--export_dir /Users/henryyan/llm/models/LLaMA3-8B-KFT \
--export_size 2 \
--export_device cpu \
--export_legacy_format False
llamafactory-cli export \
--model_name_or_path /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct \
--adapter_name_or_path saves/LLaMA3-8B-KFT2/lora/sft \
--template llama3 \
--finetuning_type lora \
--export_dir /Users/henryyan/llm/models/LLaMA3-8B-KFT2 \
--export_size 2 \
--export_device cpu \
--export_legacy_format False
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-05-09 00:09:24,591 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-05-09 00:09:24,729 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/09/2024 00:09:24 - INFO - llmtuner.data.template - Replace eos token: <|eot_id|>
05/09/2024 00:09:24 - INFO - llmtuner.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:724] 2024-05-09 00:09:24,729 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:789] 2024-05-09 00:09:24,730 >> Model config LlamaConfig {
"_name_or_path": "/Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.1",
"use_cache": true,
"vocab_size": 128256
}
05/09/2024 00:09:24 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3426] 2024-05-09 00:09:24,743 >> loading weights file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1494] 2024-05-09 00:09:24,743 >> Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|configuration_utils.py:928] 2024-05-09 00:09:24,743 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128001
}
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00, 1.64s/it]
[INFO|modeling_utils.py:4170] 2024-05-09 00:09:31,610 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4178] 2024-05-09 00:09:31,610 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-05-09 00:09:31,611 >> loading configuration file /Users/henryyan/llm/models/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:928] 2024-05-09 00:09:31,612 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
05/09/2024 00:09:31 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/09/2024 00:09:31 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
05/09/2024 00:09:32 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
05/09/2024 00:09:32 - INFO - llmtuner.model.adapter - Loaded adapter(s): saves/LLaMA3-8B-KFT2/lora/sft
05/09/2024 00:09:32 - INFO - llmtuner.model.loader - all params: 8030261248
[INFO|configuration_utils.py:471] 2024-05-09 00:09:38,169 >> Configuration saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/config.json
[INFO|configuration_utils.py:697] 2024-05-09 00:09:38,170 >> Configuration saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/generation_config.json
[INFO|modeling_utils.py:2598] 2024-05-09 00:09:44,352 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 9 checkpoint shards. You can find where each parameters has been saved in the index located at /Users/henryyan/llm/models/LLaMA3-8B-KFT2/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2488] 2024-05-09 00:09:44,353 >> tokenizer config file saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2497] 2024-05-09 00:09:44,354 >> Special tokens file saved in /Users/henryyan/llm/models/LLaMA3-8B-KFT2/special_tokens_map.json
Still garbled after exporting and running 😭
Expected behavior
No response
System Info
- MacOS 14.4.1
- M2 Ultra 128G
- Python 3.10.14
- pip 23.3.1
pip list
Package Version Editable project location
----------------------------- ----------- ---------------------------------
accelerate 0.29.3
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.6.0
anyio 4.3.0
appdirs 1.4.4
async-timeout 4.0.3
attrs 23.2.0
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
contourpy 1.2.1
cycler 0.12.1
datasets 2.19.0
dill 0.3.8
distro 1.9.0
docstring_parser 0.16
dynaconf 3.2.5
einops 0.7.0
evidently 0.4.19
exceptiongroup 1.2.1
Faker 24.14.0
fastapi 0.110.2
ffmpy 0.3.2
filelock 3.13.4
fire 0.6.0
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.3.1
gradio 4.28.3
gradio_client 0.16.0
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.23.0
idna 3.7
importlib_resources 6.4.0
iterative-telemetry 0.0.8
Jinja2 3.1.3
joblib 1.4.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
litestar 2.8.2
llmtuner 0.7.1.dev0 /Users/henryyan/llm/LLaMA-Factory
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.4
mdurl 0.1.2
mlx 0.12.2
mpmath 1.3.0
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
networkx 3.3
nltk 3.8.1
numpy 1.26.4
orjson 3.10.1
packaging 24.0
pandas 2.2.2
patsy 0.5.6
peft 0.10.0
pillow 10.3.0
pip 23.3.1
plotly 5.21.0
polyfactory 2.15.0
protobuf 5.26.1
psutil 5.9.8
pyarrow 16.0.0
pyarrow-hotfix 0.6
pydantic 2.7.1
pydantic_core 2.18.2
pydub 0.25.1
Pygments 2.17.2
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
referencing 0.35.0
regex 2024.4.16
requests 2.31.0
rich 13.7.1
rich-click 1.7.4
rpds-py 0.18.0
ruff 0.4.2
safetensors 0.4.3
scikit-learn 1.4.2
scipy 1.13.0
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 68.2.2
shellingham 1.5.4
shtab 1.7.1
six 1.16.0
sniffio 1.3.1
sse-starlette 2.1.0
starlette 0.37.2
statsmodels 0.14.2
sympy 1.12
tenacity 8.2.3
termcolor 2.4.0
threadpoolctl 3.4.0
tiktoken 0.6.0
tokenizers 0.19.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.2.2
torchaudio 2.2.2
torchvision 0.17.2
tqdm 4.66.2
transformers 4.40.1
transformers-stream-generator 0.0.5
trl 0.8.6
typer 0.12.3
typing_extensions 4.11.0
typing-inspect 0.9.0
tyro 0.8.3
tzdata 2024.1
ujson 5.9.0
urllib3 2.2.1
uvicorn 0.29.0
watchdog 4.0.0
websockets 11.0.3
wheel 0.41.2
xxhash 3.4.1
yarl 1.9.4
Others
I have actively tried all kinds of approaches with no success; asking for help.
It looks like something already goes wrong during training; the loss is abnormally large.
You could try another model, e.g. qwen1.5-0.5B, which trains normally on my side. Comparing the two may help narrow down which step is at fault.
I tried Qwen-1.5-0.5B and it works, which is a bit strange. I will try enlarging the dataset and then train the 8B model again.
Mac only supports single-precision (fp32) training and cannot do mixed precision; I am not sure whether a difference in the data formats of the two models is what causes this.
That is possible. I will try renting separate compute and running the same training command there.
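Regarding the single-precision point above, a rough local probe (not an authoritative statement about mixed-precision training support) of which dtypes the MPS backend will even allocate:

```python
import torch

print("MPS available:", torch.backends.mps.is_available())
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    try:
        # Allocation alone does not prove training works, but a failure here
        # rules the dtype out entirely on this machine.
        torch.ones(2, 2, dtype=dtype, device="mps")
        print(dtype, "allocation ok")
    except (RuntimeError, TypeError) as err:
        print(dtype, "allocation failed:", err)
```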
Now I get this: Can't infer missing attention mask on mps device. Please provide an attention_mask or use a different device. Does this mean inference cannot be done on a Mac?
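That error is raised during generation on mps when transformers cannot infer the attention mask; by itself it does not mean inference is impossible on a Mac. A minimal sketch (path assumed from the export step above) that passes the mask the tokenizer already produces:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path assumed from the export step above.
merged = "/Users/henryyan/llm/models/LLaMA3-8B-KFT2"
tokenizer = AutoTokenizer.from_pretrained(merged)
model = AutoModelForCausalLM.from_pretrained(merged, torch_dtype=torch.float32).to("mps")

# tokenizer(...) already returns an attention_mask; passing it via **inputs
# means generate() has nothing to infer, which avoids the error above.
inputs = tokenizer("hi", return_tensors="pt").to("mps")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```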