llama3-llava-next-8b full-parameter fine-tuning error
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
llamafactory has already been updated to the latest version.
- llamafactory version: 0.9.2.dev0
- Platform: Linux-5.10.0-1.0.0.32-x86_64-with-glibc2.17
- Python version: 3.9.18
- PyTorch version: 2.1.0+cu118 (GPU)
- Transformers version: 4.45.0
- Datasets version: 2.19.2
- Accelerate version: 0.34.2
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA A100-SXM4-40GB
- GPU number: 8
- GPU memory: 39.39GB
- DeepSpeed version: 0.15.4
Reproduction
Full-parameter fine-tuning fails with two errors.
The first is at `image_sizes = iter(mm_inputs["image_sizes"].tolist())`: `mm_inputs` contains no `image_sizes` key, yet the iterator is consumed later by `orig_height, orig_width = next(image_sizes)`. The second is at `height, width = get_image_size(to_numpy_array(mm_inputs["pixel_values"][0][0]))`: shouldn't `mm_inputs["pixel_values"][0][0]` be `mm_inputs["pixel_values"][0]`?
Traceback (most recent call last):
File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
launch()
File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
run_exp()
File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
_training_function(config={"args": args, "callbacks": callbacks})
File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 67, in _training_function
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/users/code/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 325, in get_dataset
dataset = _get_preprocessed_dataset(
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 258, in _get_preprocessed_dataset
dataset = dataset.map(
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3156, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3547, in _map_single
batch = apply_function_on_filtered_inputs(
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
input_ids, labels = self._encode_data_example(
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 397, in process_messages
height, width = get_image_size(to_numpy_array(mm_inputs["pixel_values"][0][0]))
File "/home/users/miniconda3/lib/python3.9/site-packages/transformers/image_utils.py", line 295, in get_image_size
channel_dim = infer_channel_dimension_format(image)
File "/home/users/miniconda3/lib/python3.9/site-packages/transformers/image_utils.py", line 243, in infer_channel_dimension_format
raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
ValueError: Unsupported number of image dimensions: 2
[2025-02-27 19:49:42,031] torch.distributed.elastic.mu
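The `ValueError: Unsupported number of image dimensions: 2` above is consistent with indexing one level too deep. A minimal NumPy sketch (the shapes below are illustrative assumptions, not taken from the actual batch):

```python
import numpy as np

# Assumed shapes for illustration only:
# llava-next normally stacks image patches into a 5-D tensor, while a
# processor output without the patch axis would be 4-D.
pixel_values_5d = np.zeros((1, 5, 3, 336, 336))  # (batch, patches, C, H, W)
pixel_values_4d = np.zeros((1, 3, 336, 336))     # (batch, C, H, W) -- no patch axis

def ndim_of_first_first(pixel_values):
    """Mimic mm_plugin.py: index pixel_values[0][0] and report the rank
    that transformers' infer_channel_dimension_format would see."""
    return np.asarray(pixel_values[0][0]).ndim

print(ndim_of_first_first(pixel_values_5d))  # 3 -> get_image_size succeeds
print(ndim_of_first_first(pixel_values_4d))  # 2 -> "Unsupported number of image dimensions: 2"
```

So whether `[0][0]` or `[0]` is correct depends on whether the patch axis is present in `pixel_values`; with a 4-D tensor, `[0][0]` yields a 2-D array and reproduces the traceback above.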
The corresponding code:
class LlavaNextPlugin(BasePlugin):
    @override
    def process_messages(
        self,
        messages: Sequence[Dict[str, str]],
        images: Sequence["ImageInput"],
        videos: Sequence["VideoInput"],
        audios: Sequence["AudioInput"],
        processor: Optional["ProcessorMixin"],
    ) -> List[Dict[str, str]]:
        self._validate_input(processor, images, videos, audios)
        num_image_tokens = 0
        messages = deepcopy(messages)
        mm_inputs = self._get_mm_inputs(images, videos, audios, processor)
        if "pixel_values" in mm_inputs:
            image_sizes = iter(mm_inputs["image_sizes"].tolist())  # <-- "image_sizes" is missing from mm_inputs, so this line raises
            height, width = get_image_size(to_numpy_array(mm_inputs["pixel_values"][0][0]))  # <-- shouldn't this be mm_inputs["pixel_values"][0] rather than [0][0]? Otherwise it raises ValueError: Unsupported number of image dimensions: 2
### Others
_No response_
The key should exist: https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/models/llava_next/processing_llava_next.py#L152-L153
The error is not raised in transformers/models/llava_next/processing_llava_next.py but in LLaMA-Factory's own code, at src/llamafactory/data/mm_plugin.py, line 396:
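One defensive workaround for the missing key could look like the sketch below. This is only a sketch, not the upstream fix; `safe_image_sizes` and the fallback to raw image dimensions are assumptions, not LLaMA-Factory API:

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

def safe_image_sizes(mm_inputs: Dict[str, Any],
                     raw_sizes: Sequence[Tuple[int, int]]) -> Iterator[List[int]]:
    """Yield (height, width) pairs, preferring the processor's image_sizes
    but falling back to the raw image dimensions when the key is absent."""
    if "image_sizes" in mm_inputs:
        sizes = mm_inputs["image_sizes"]
        # image_sizes may be a tensor; tolist() normalizes it to nested lists
        return iter(sizes.tolist() if hasattr(sizes, "tolist") else [list(hw) for hw in sizes])
    return iter([list(hw) for hw in raw_sizes])

# Usage in place of the failing line (raw sizes are hypothetical here):
mm_inputs = {}  # simulating a processor output without "image_sizes"
image_sizes = safe_image_sizes(mm_inputs, [(336, 336)])
orig_height, orig_width = next(image_sizes)
```

This only papers over the symptom; the real question is why `_get_mm_inputs` returned `pixel_values` without `image_sizes` in the first place.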
Traceback (most recent call last):
File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
launch()
File "/home/users/code/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
run_exp()
File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
_training_function(config={"args": args, "callbacks": callbacks})
File "/home/users/code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 67, in _training_function
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/users/code/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 325, in get_dataset
dataset = _get_preprocessed_dataset(
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/loader.py", line 258, in _get_preprocessed_dataset
dataset = dataset.map(
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3156, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3547, in _map_single
batch = apply_function_on_filtered_inputs(
File "/home/users/miniconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
input_ids, labels = self._encode_data_example(
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
File "/home/users/code/LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 396, in process_messages
image_sizes = iter(mm_inputs["image_sizes"].tolist())
KeyError: 'image_sizes'
https://github.com/huggingface/transformers/blob/51083d1bac7905aa8316b75f7897bdd4e5302044/src/transformers/models/llava_next/image_processing_llava_next.py#L726C9-L728C10
return BatchFeature(
    data={"pixel_values": processed_images, "image_sizes": image_sizes}, tensor_type=return_tensors
)
After the images go through llava-next's image_processor, the image_sizes key should be present in the output. Is your image input correct?
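The contract implied by the linked transformers code can be checked in isolation. A hedged mock (all names and shapes below are illustrative stand-ins, not the real processor) shows that `image_sizes` is recorded per input image, so its absence suggests the images never reached the processor:

```python
import numpy as np

def mock_llava_next_preprocess(images):
    """Hypothetical stand-in for LlavaNextImageProcessor.preprocess:
    record each original (height, width) and return it alongside
    pixel_values, mirroring the BatchFeature shown above."""
    image_sizes = [img.shape[:2] for img in images]  # (H, W) per image
    # the real processor would tile each image into patches here;
    # we keep a single patch axis for simplicity: (patches=1, C, H, W)
    pixel_values = [np.moveaxis(img, -1, 0)[None] for img in images]
    return {"pixel_values": pixel_values, "image_sizes": image_sizes}

out = mock_llava_next_preprocess([np.zeros((336, 336, 3))])
print("image_sizes" in out)  # True for any non-empty image list
```

If the real run still lacks the key, it is worth printing `mm_inputs.keys()` inside `_get_mm_inputs` to see what the processor actually returned for this dataset.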
I'm facing the same issue. How did you solve it? My training data runs fine when using llama-factory to train qwen2.5vl, and I'm not sure where exactly the problem lies.