LLaMA-Factory
Bug when running the Google Colab notebook code
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
Fine-tuning the model via the command line in the official Google Colab notebook. I did not change anything; I ran it as-is. The full console output is below.
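For context, the failing notebook cell boils down to a `llamafactory-cli train` call roughly like the sketch below. This is an approximation reconstructed from the log that follows, not the exact notebook cell: the model, datasets, Unsloth usage, fp16 compute, and the 500-sample cap are inferred from the log, while `--output_dir llama3_lora` is a placeholder.

```bash
# Approximate reconstruction of the Colab training command (not the exact notebook cell)
llamafactory-cli train \
  --stage sft \
  --do_train \
  --model_name_or_path unsloth/llama-3-8b-Instruct-bnb-4bit \
  --dataset identity,alpaca_gpt4_en \
  --template llama3 \
  --finetuning_type lora \
  --use_unsloth \
  --fp16 \
  --max_samples 500 \
  --output_dir llama3_lora  # placeholder output path
```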
```
/content/LLaMA-Factory
2024-05-18 12:35:28.953301: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-18 12:35:28.953353: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-18 12:35:28.954649: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-18 12:35:30.175448: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
05/18/2024 12:35:33 - WARNING - llamafactory.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
05/18/2024 12:35:33 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
tokenizer_config.json: 100% 51.1k/51.1k [00:00<00:00, 66.3MB/s]
tokenizer.json: 100% 9.09M/9.09M [00:01<00:00, 6.23MB/s]
special_tokens_map.json: 100% 459/459 [00:00<00:00, 2.69MB/s]
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/tokenizer.json
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/special_tokens_map.json
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/tokenizer_config.json
[WARNING|logging.py:314] 2024-05-18 12:35:37,601 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/18/2024 12:35:37 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
05/18/2024 12:35:37 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Generating train split: 91 examples [00:00, 5377.99 examples/s]
Converting format of dataset: 100% 91/91 [00:00<00:00, 7138.91 examples/s]
05/18/2024 12:35:38 - INFO - llamafactory.data.loader - Loading dataset llamafactory/alpaca_gpt4_en...
Downloading readme: 100% 373/373 [00:00<00:00, 2.43MB/s]
Downloading data: 100% 43.3M/43.3M [00:00<00:00, 51.9MB/s]
Generating train split: 51983 examples [00:01, 32182.19 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 42058.28 examples/s]
Running tokenizer on dataset: 100% 591/591 [00:00<00:00, 1637.02 examples/s]
input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 445, 81101, 12, 18, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! I am Llama-3, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 445, 81101, 12, 18, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am Llama-3, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
config.json: 100% 1.15k/1.15k [00:00<00:00, 6.70MB/s]
[INFO|configuration_utils.py:726] 2024-05-18 12:35:49,628 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/config.json
[INFO|configuration_utils.py:789] 2024-05-18 12:35:49,629 >> Model config LlamaConfig {
"_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"quantization_config": {
"_load_in_4bit": true,
"_load_in_8bit": false,
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
},
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.2",
"use_cache": true,
"vocab_size": 128256
}
05/18/2024 12:35:49 - INFO - llamafactory.model.utils.quantization - Loading ?-bit BITSANDBYTES-quantized model.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
File "/usr/local/bin/llamafactory-cli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/llamafactory/cli.py", line 65, in main
run_exp()
File "/usr/local/lib/python3.10/dist-packages/llamafactory/train/tuner.py", line 34, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/usr/local/lib/python3.10/dist-packages/llamafactory/train/sft/workflow.py", line 34, in run_sft
model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
File "/usr/local/lib/python3.10/dist-packages/llamafactory/model/loader.py", line 124, in load_model
model = load_unsloth_pretrained_model(config, model_args)
File "/usr/local/lib/python3.10/dist-packages/llamafactory/model/utils/unsloth.py", line 39, in load_unsloth_pretrained_model
from unsloth import FastLanguageModel
File "/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py", line 113, in <module>
from .models import *
File "/usr/local/lib/python3.10/dist-packages/unsloth/models/__init__.py", line 15, in <module>
from .loader import FastLanguageModel
File "/usr/local/lib/python3.10/dist-packages/unsloth/models/loader.py", line 15, in <module>
from .llama import FastLlamaModel, logger
File "/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py", line 27, in <module>
from ._utils import *
File "/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py", line 60, in <module>
import xformers.ops.fmha as xformers
ModuleNotFoundError: No module named 'xformers'
```
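The traceback shows that `unsloth` imports `xformers.ops.fmha` unconditionally, but the Colab runtime has no `xformers` package installed. A minimal workaround sketch, assuming a prebuilt `xformers` wheel matching the runtime's torch/CUDA build is available on PyPI:

```bash
# Workaround sketch: install the missing dependency before launching training
# (assumption: PyPI has an xformers wheel compatible with the Colab torch/CUDA build).
pip install xformers

# Alternative: avoid the Unsloth code path entirely and let LLaMA-Factory fall back
# to the standard Transformers loader by not enabling --use_unsloth.
```

Either installing `xformers` or dropping `--use_unsloth` should let `load_model` proceed past the import in `unsloth/models/_utils.py`.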
Expected behavior
No response
System Info
No response
Others
No response