
assert len(input_id) == len(target) AssertionError

CarlChang39 opened this issue 9 months ago · 9 comments

Hello, I hit this problem while reproducing the QLoRA fine-tuning. The error message is:

```
Traceback (most recent call last):
  File "../finetune_llama3.py", line 452, in <module>
    train()
  File "../finetune_llama3.py", line 445, in train
    trainer.train()
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/transformers/trainer.py", line 1928, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/accelerate/data_loader.py", line 452, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "../finetune_llama3.py", line 255, in __getitem__
    ret = preprocess([self.raw_data[i]["conversations"]], self.tokenizer, self.max_len)
  File "../finetune_llama3.py", line 192, in preprocess
    assert len(input_id) == len(target)
AssertionError
```

In practice, len(input_id) is always exactly 1 greater than len(target). I am running on six 1080 Ti GPUs; the shell script is as follows:

```bash
NCCL_P2P_DISABLE=1 \
NCCL_IB_DISABLE=1 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
torchrun \
    --nproc_per_node 6 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr localhost \
    --master_port 6601 \
    ../finetune_llama3.py \
    --model_name_or_path "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/" \
    --data_path "../data/Belle_sampled_qwen.json" \
    --fp16 True \
    --output_dir "../output/llama3_8B_qlora" \
    --num_train_epochs 100 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 1 \
    --learning_rate 1e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --deepspeed "../config/ds_config_zero2.json" \
    --use_lora \
    --load_in_4bit \
    --q_lora
```

I only changed CUDA_VISIBLE_DEVICES and nproc_per_node, and switched bf16 to fp16 (the 1080 Ti has no bf16 support).
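For anyone debugging this, here is a minimal sketch of one plausible cause, not a statement about the repo's actual code: Llama 3's tokenizer prepends `<|begin_of_text|>` when `add_special_tokens` is left at its default, so a sequence encoded with special tokens is exactly one token longer than the same text encoded without them. If `preprocess()` in finetune_llama3.py builds `input_id` with the BOS token but builds `target` without it (both names taken from the traceback; the alignment logic below is an assumption), that would produce precisely the observed off-by-one. The `IGNORE_TOKEN_ID = -100` constant is the label value HuggingFace losses skip.

```python
# Hypothetical reproduction of the off-by-one, NOT the repo's exact code.
# Assumption: input_id is encoded with special tokens, target without.
from transformers import AutoTokenizer

IGNORE_TOKEN_ID = -100  # label value ignored by HF cross-entropy loss

tok = AutoTokenizer.from_pretrained(
    "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/"  # local path from the script above
)

text = "你好"
with_bos = tok.encode(text)                               # <|begin_of_text|> + tokens
without_bos = tok.encode(text, add_special_tokens=False)  # tokens only
print(len(with_bos), len(without_bos))                    # differs by exactly 1

# If input_id carries BOS but target does not, masking one extra leading
# position restores the invariant the assert checks:
input_id, target = with_bos, [IGNORE_TOKEN_ID] * len(without_bos)
if len(input_id) == len(target) + 1:
    target = [IGNORE_TOKEN_ID] + target  # mask the position under BOS
assert len(input_id) == len(target)
```

Whether BOS is the actual culprit depends on how the repo's `preprocess()` is written, so it is worth printing both lists right before the failing assert to confirm before patching anything.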

CarlChang39 · May 15 '24 11:05