
Negative padding

Open pavelgur opened this issue 1 year ago • 2 comments

I'm hitting this error during training. Reducing the batch size to 1 works around it, but I wonder what the correct way to resolve it would be?

INFO:main:Training model
{'Epoch': 0, 'Step': 0, 'Loss': '10.9777'}
{'Epoch': 0, 'Step': 10, 'Loss': '5.8438'}
  0%| | 15/8317 [00:22<3:31:28, 1.53s/it, Epoch=0, Step=14, Loss=4.8478]
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "mlx_vlm/lora.py", line 178, in <module>
    main(args)
  File "mlx_vlm/lora.py", line 98, in main
    loss = trainer.train_step(
           ^^^^^^^^^^^^^^^^^^^
  File "mlx/nn/utils.py", line 35, in wrapped_value_grad_fn
    value, grad = value_grad_fn(model.trainable_parameters(), *args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mlx/nn/utils.py", line 29, in inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "mlx_vlm/trainer/trainer.py", line 230, in loss_fn
    outputs = model(input_ids, pixel_values, attention_mask, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mlx_vlm/models/qwen2_vl/qwen2_vl.py", line 116, in __call__
    input_embddings = self.get_input_embeddings(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mlx_vlm/models/qwen2_vl/qwen2_vl.py", line 78, in get_input_embeddings
    final_inputs_embeds = self._merge_input_ids_with_image_features(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mlx_vlm/models/qwen2_vl/qwen2_vl.py", line 96, in _merge_input_ids_with_image_features
    image_features = mx.pad(image_features, ((0, 0), (0, pad_size), (0, 0)))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Invalid high padding size (-60) passed to pad for axis 1. Padding sizes must be non-negative

(Note: `trainer.train_step` at `mlx_vlm/trainer/trainer.py:265` calls `loss_and_grad_fn(self.model, batch)` before the mlx frames above.)
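The error says the computed `pad_size` came out negative, i.e. the target length the merge code pads toward is 60 elements *shorter* than the actual feature sequence for some sample in the batch. A minimal sketch of the always-safe direction, padding every sample up to the batch maximum, is below (numpy stands in for mlx; `pad_image_features` is a hypothetical helper, not mlx-vlm's actual `_merge_input_ids_with_image_features`):

```python
import numpy as np

def pad_image_features(features_list):
    """Pad each (1, seq_len, dim) sample up to the longest seq_len in the
    batch, so every pad size along axis 1 is >= 0.

    Hypothetical illustration: in the failing code the target length is
    derived elsewhere (from the image-token slots in input_ids), which can
    be smaller than seq_len and yields the negative pad size seen above.
    """
    max_len = max(f.shape[1] for f in features_list)
    padded = [
        np.pad(f, ((0, 0), (0, max_len - f.shape[1]), (0, 0)))
        for f in features_list
    ]
    return np.concatenate(padded, axis=0)
```

With batch size 1 there is only one sample, so the length mismatch across samples never arises, which matches the workaround above.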

pavelgur avatar Feb 23 '25 17:02 pavelgur

Hey @pavelgur

Could you share a reproducible script, dataset and model?

The trainer is definitely due for an overhaul. The initial version has some limitations around batch size and multi-image inputs for certain models.

Blaizzy avatar Feb 24 '25 11:02 Blaizzy

I changed the merging logic to a non-padding approach in #227. Please check it out and let me know if it works for you.
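For readers following along, the general shape of a non-padding merge is to scatter each sample's image features directly into that sample's image-token positions, so no cross-sample padding of `image_features` is ever computed. A rough sketch under assumed names (numpy instead of mlx; `IMAGE_TOKEN_ID` and `merge_without_padding` are illustrative, not the actual #227 implementation):

```python
import numpy as np

# Hypothetical placeholder for the model's image-placeholder token id.
IMAGE_TOKEN_ID = 151655

def merge_without_padding(input_ids, text_embeds, image_features):
    """Write each sample's image features into the image-token positions
    of its own text embeddings, one sample at a time.

    input_ids:      (batch, seq) int array
    text_embeds:    (batch, seq, dim) float array
    image_features: list of (n_i, dim) arrays, one per sample
    """
    merged = text_embeds.copy()
    for i in range(input_ids.shape[0]):
        positions = np.where(input_ids[i] == IMAGE_TOKEN_ID)[0]
        # Guard against slot/feature count mismatch instead of padding.
        n = min(len(positions), image_features[i].shape[0])
        merged[i, positions[:n]] = image_features[i][:n]
    return merged
```

Because each sample is handled independently, samples with different numbers of image tokens can share a batch without any `mx.pad` call on the feature axis.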

Blaizzy avatar Mar 05 '25 21:03 Blaizzy