Error when fine-tuning MiniCPM-Llama3-V-2_5 on custom data

Open zhudongwork opened this issue 8 months ago • 4 comments

Describe the bug
What the bug is, and how to reproduce, better with screenshots

Train: 0%| | 0/6 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/utils/run_utils.py", line 27, in x_main
    result = llm_x(args, **kwargs)
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/sft.py", line 298, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/trainers/trainers.py", line 50, in train
    res = super().train(*args, **kwargs)
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/accelerate/data_loader.py", line 464, in __iter__
    next_batch = next(dataloader_iter)
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/utils/template.py", line 408, in data_collator
    labels = [torch.tensor(b['labels']) for b in batch]
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/utils/template.py", line 408, in <listcomp>
    labels = [torch.tensor(b['labels']) for b in batch]
RuntimeError: Could not infer dtype of NoneType
Train: 0%| | 0/6 [00:01<?, ?it/s]
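The last frame shows `torch.tensor(b['labels'])` being called on a sample whose `labels` is `None`, which is what produces "Could not infer dtype of NoneType". A minimal sketch for locating such samples before they reach the collator might look like the following. It assumes each batch item is a dict with a `labels` key, as the traceback suggests; the helper name `find_missing_labels` and the toy batch are hypothetical and not part of swift.

```python
from typing import Any, Dict, List


def find_missing_labels(batch: List[Dict[str, Any]]) -> List[int]:
    """Return indices of samples whose 'labels' entry is absent or None.

    Hypothetical debugging helper, not a swift API.
    """
    return [i for i, sample in enumerate(batch) if sample.get("labels") is None]


# Toy batch for illustration only; real samples would come from the encoded dataset.
batch = [
    {"input_ids": [1, 2, 3], "labels": [-100, 2, 3]},
    {"input_ids": [4, 5], "labels": None},
]

bad_indices = find_missing_labels(batch)
print(bad_indices)
```

Running a check like this over the encoded custom dataset could show whether some rows were encoded without labels, for example because a response field is empty or missing.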


Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here

Additional context
Add any other context about the problem here

zhudongwork, May 30 '24 13:05