mlx-vlm
Error when fine-tuning deepseek-vl-7b-chat-8bit
This is the command I'm using:
python -m mlx_vlm.lora --dataset ~/Datasets/BusinessVQA/fintabnet/val/vqa_dataset.hf --model-path ~/.cache/lm-studio/models/mlx-community/deepseek-vl-7b-chat-8bit --epochs 2 --batch-size 4 --learning-rate 5e-5
Here is the console output:
INFO:__main__:Loading model from /Users/sachinraja/.cache/lm-studio/models/mlx-community/deepseek-vl-7b-chat-8bit
INFO:__main__:Loading dataset from /Users/sachinraja/Datasets/BusinessVQA/fintabnet/val/vqa_dataset.hf
INFO:__main__:Applying chat template to the dataset
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 240574/240574 [00:07<00:00, 31075.60 examples/s]
INFO:__main__:Setting up LoRA
#trainable params: 23.424 M || all params: 6910.365696 M || trainable%: 0.339%
INFO:__main__:Setting up optimizer
INFO:__main__:Setting up trainer
INFO:__main__:Training model
0%| | 0/60143 [00:10<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/lora.py", line 177, in <module>
main(args)
File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/lora.py", line 97, in main
loss = trainer.train_step(
^^^^^^^^^^^^^^^^^^^
File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/trainer/trainer.py", line 265, in train_step
loss, grads = loss_and_grad_fn(self.model, batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlx/lib/python3.11/site-packages/mlx/nn/utils.py", line 35, in wrapped_value_grad_fn
value, grad = value_grad_fn(model.trainable_parameters(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlx/lib/python3.11/site-packages/mlx/nn/utils.py", line 29, in inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/trainer/trainer.py", line 251, in loss_fn
nn.losses.cross_entropy(
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlx/lib/python3.11/site-packages/mlx/nn/losses.py", line 81, in cross_entropy
raise ValueError(
ValueError: Targets shape (4, 78) does not match logits shape (1, 78, 102400).
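For context, the ValueError comes from the shape check in mlx's cross-entropy loss: the targets keep the requested batch size of 4, while the model's forward pass returns logits with a batch dimension of 1, so the two no longer line up. A minimal sketch of the failing check, using numpy arrays as stand-ins for mlx arrays (the shapes are taken from the traceback):

```python
import numpy as np

# Shapes from the traceback: batch 4, sequence length 78, vocab size 102400
targets = np.zeros((4, 78), dtype=np.int64)  # labels keep the batch size of 4
logits = np.zeros((1, 78, 102400))           # model output collapsed to batch 1

# cross_entropy requires logits.shape minus the vocab axis to equal targets.shape
shapes_match = logits.shape[:-1] == targets.shape
print(shapes_match)  # False -> this is the condition that raises the ValueError
```

With a batch size of 1 the targets would have shape (1, 78), matching the collapsed logits, which is consistent with the workaround suggested below.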
@Blaizzy: I would greatly appreciate your help here, please.
Hey @sachinraja13
Please set the batch size to 1.
There is a bug with batch sizes larger than one for some models.
Thank you!
This is being fixed in #499.