
Asking to pad but the tokenizer does not have a padding token

Open · yuanlisky opened this issue · 0 comments

Second finetuning stage

CUDA_VISIBLE_DEVICES=2 python3 train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml

The command fails with the following error:

Traceback (most recent call last):
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/train.py", line 104, in <module>
    main()
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/train.py", line 100, in main
    runner.train()
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/minigpt4/runners/runner_base.py", line 378, in train
    train_stats = self.train_epoch(cur_epoch)
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/minigpt4/runners/runner_base.py", line 438, in train_epoch
    return self.task.train_epoch(
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/minigpt4/tasks/base_task.py", line 114, in train_epoch
    return self._train_inner_loop(
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/minigpt4/tasks/base_task.py", line 219, in _train_inner_loop
    loss = self.train_step(model=model, samples=samples)
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/minigpt4/tasks/base_task.py", line 68, in train_step
    loss = model(samples)["loss"]
  File "/home/ocr/anaconda3/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ocr/projects/llm/MiniGPT-4/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 181, in forward
    to_regress_tokens = self.llama_tokenizer(
  File "/home/ocr/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2538, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/home/ocr/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2624, in _call_one
    return self.batch_encode_plus(
  File "/home/ocr/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2806, in batch_encode_plus
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
  File "/home/ocr/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2443, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
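A minimal sketch of the workaround the ValueError itself suggests: assign a pad token to the tokenizer before any call that uses `padding=True`. In MiniGPT-4 this would go where the LLaMA tokenizer is created (around `minigpt4/models/mini_gpt4.py`); the `StubTokenizer` class and `ensure_pad_token` helper below are stand-ins for illustration only, not part of the repository.

```python
class StubTokenizer:
    """Stand-in for a LLaMA tokenizer, which ships without a pad token."""
    def __init__(self):
        self.eos_token = "</s>"
        self.pad_token = None  # LLaMA tokenizers define no pad token by default


def ensure_pad_token(tokenizer):
    # Option 1 from the error message: reuse the EOS token for padding.
    # (Option 2 would be tokenizer.add_special_tokens({'pad_token': '[PAD]'}),
    # which also requires resizing the model's embeddings.)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


tokenizer = ensure_pad_token(StubTokenizer())
print(tokenizer.pad_token)  # </s>
```

With a real `transformers` tokenizer the same one-line assignment (`tokenizer.pad_token = tokenizer.eos_token`) after loading should clear this error; reusing EOS avoids growing the vocabulary, which is why it is the common fix for LLaMA-family models.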

NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7
GPU A100 80G

yuanlisky · May 19 '23