EasyLM
Why does the 'LLaMATokenizer' object have no attribute 'sp_model'?
When I run a command like this:
python -m EasyLM.models.llama.llama_train \
--total_steps=10 \
--save_model_freq=10 \
--optimizer.adamw_optimizer.lr_warmup_steps=1 \
--train_dataset.json_dataset.path='/home/ec2-user/workplace/EasyLM/dataset/' \
--train_dataset.json_dataset.seq_length=1024 \
--load_checkpoint='params::/home/ec2-user/workplace/EasyLM/open_llama_7b_v2_easylm' \
--tokenizer.vocab_file='/home/ec2-user/workplace/EasyLM/open_llama_7b_v2_easylm/tokenizer.model' \
--logger.output_dir=checkpoint/ \
--mesh_dim='1,4,2' \
--load_llama_config='7b' \
--train_dataset.type='json' \
--train_dataset.text_processor.fields='text' \
--optimizer.type='adamw' \
--optimizer.accumulate_gradient_steps=1 \
--optimizer.adamw_optimizer.lr=0.002 \
--optimizer.adamw_optimizer.end_lr=0.002 \
--optimizer.adamw_optimizer.lr_decay_steps=100000000 \
--optimizer.adamw_optimizer.weight_decay=0.001 \
--optimizer.adamw_optimizer.multiply_by_parameter_scale=True \
--optimizer.adamw_optimizer.bf16_momentum=True
The logs are as follows:
wandb: Tracking run with wandb version 0.15.12
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2023-10-24 13:53:33.481541: W external/xla/xla/service/gpu/nvptx_compiler.cc:673] The NVIDIA driver's CUDA version is 12.0 which is older than the ptxas CUDA version (12.3.52). Because the driver is older than the ptxas version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ec2-user/workplace/EasyLM/EasyLM/models/llama/llama_train.py", line 267, in <module>
mlxu.run(main)
File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
^^^^^^^^^^
File "/home/ec2-user/workplace/EasyLM/EasyLM/models/llama/llama_train.py", line 64, in main
tokenizer = LLaMAConfig.get_tokenizer(FLAGS.tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/workplace/EasyLM/EasyLM/models/llama/llama_model.py", line 293, in get_tokenizer
tokenizer = LLaMATokenizer(
^^^^^^^^^^^^^^^
File "/home/ec2-user/workplace/EasyLM/EasyLM/models/llama/llama_model.py", line 1140, in __init__
super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, **kwargs)
File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 366, in __init__
self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 462, in _add_tokens
current_vocab = self.get_vocab().copy()
^^^^^^^^^^^^^^^^
File "/home/ec2-user/workplace/EasyLM/EasyLM/models/llama/llama_model.py", line 1175, in get_vocab
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
^^^^^^^^^^^^^^^
File "/home/ec2-user/workplace/EasyLM/EasyLM/models/llama/llama_model.py", line 1163, in vocab_size
return self.sp_model.get_piece_size()
^^^^^^^^^^^^^
AttributeError: 'LLaMATokenizer' object has no attribute 'sp_model'
wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync checkpoint/27fc482119cd4211965c651f185f0aa6/wandb/offline-run-20231024_135326-27fc482119cd4211965c651f185f0aa6
wandb: Find logs at: checkpoint/27fc482119cd4211965c651f185f0aa6/wandb/offline-run-20231024_135326-27fc482119cd4211965c651f185f0aa6/logs
It seems that tokenizer.vocab_file is incorrect, but I don't know which file should be used.
Can anyone answer?
I fixed the problem by downgrading transformers to 4.33.0 (pip install -U transformers==4.33.0).
It worked for me! Thank you @juliensalinas
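For anyone wondering why the version matters: the traceback shows that the base class constructor (super().__init__ in transformers/tokenization_utils.py) calls _add_tokens, which calls get_vocab, which reads self.sp_model. If the subclass only assigns sp_model after super().__init__() returns (an assumption about llama_model.py, which is not shown in this thread), the attribute does not exist yet at that point. Below is a minimal sketch of that ordering problem. All class names are hypothetical stand-ins, not the transformers or EasyLM API, and a plain list stands in for the SentencePiece model.

```python
class BaseTokenizer:
    """Hypothetical stand-in for the library base class in the traceback."""

    def __init__(self):
        # Like the base constructor in the traceback, this asks the
        # subclass for its vocabulary during construction.
        self.initial_vocab = self.get_vocab().copy()

    def get_vocab(self):
        raise NotImplementedError


class SketchTokenizer(BaseTokenizer):
    """Shared vocabulary logic that depends on self.sp_model."""

    @property
    def vocab_size(self):
        return len(self.sp_model)

    def get_vocab(self):
        return {self.sp_model[i]: i for i in range(self.vocab_size)}


class BrokenTokenizer(SketchTokenizer):
    def __init__(self, pieces):
        super().__init__()            # get_vocab() runs in here ...
        self.sp_model = list(pieces)  # ... before sp_model is assigned


class FixedTokenizer(SketchTokenizer):
    def __init__(self, pieces):
        self.sp_model = list(pieces)  # assign first
        super().__init__()            # now get_vocab() can read it


try:
    BrokenTokenizer(["<s>", "</s>"])
except AttributeError as err:
    print("broken:", err)

print("fixed:", FixedTokenizer(["<s>", "</s>"]).initial_vocab)
```

If the ordering assumption holds, an alternative to pinning transformers would be to load the SentencePiece model before calling super().__init__() in the tokenizer class. I have not tested that against EasyLM, so treat it as a suggestion rather than a confirmed fix.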