Fine-tuning and the --evaluation_strategy argument
I'm trying to get fine-tuning working through the 3_sft.sh script but am encountering an error:
Traceback (most recent call last):
File "/root/VILA/llava/train/train_mem.py", line 36, in <module>
train()
File "/root/VILA/llava/train/train.py", line 436, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2738, in training_step
loss = self.compute_loss(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2761, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1735, in forward
loss = self.module(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/root/VILA/llava/model/language_model/llava_llama.py", line 133, in forward
outputs = self.llm.forward(
TypeError: LlamaForCausalLM.forward() got an unexpected keyword argument 'seqlens_in_batch'
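Commenting out the seqlens_in_batch argument where self.llm.forward() is called gets past this. Roughly, in llava/model/language_model/llava_llama.py around line 133 (the surrounding keyword arguments here just mirror the stock LlamaForCausalLM.forward signature, so the exact call site may differ a little):

    # llava/model/language_model/llava_llama.py, around line 133 -- rough sketch of the edit
    outputs = self.llm.forward(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        labels=labels,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
        # seqlens_in_batch=seqlens_in_batch,  # commented out: the unpatched
        # LlamaForCausalLM.forward does not accept this keyword
    )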
With that workaround in place the script runs. However, when I set --evaluation_strategy to anything other than "no" to get validation scores, I hit a string of errors related to the dataloader and the 'inputs' the trainer passes to the model:
Traceback (most recent call last):
File "/root/VILA/llava/train/train_mem.py", line 36, in <module>
train()
File "/root/VILA/llava/train/train.py", line 436, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1929, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2262, in _maybe_log_save_evaluate
dataset_metrics = self.evaluate(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3022, in evaluate
output = eval_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3212, in evaluation_loop
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3429, in prediction_step
loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2761, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/root/VILA/llava/model/language_model/llava_llama.py", line 102, in forward
) = self.prepare_inputs_labels_for_multimodal(
File "/root/VILA/llava/model/llava_arch.py", line 261, in prepare_inputs_labels_for_multimodal
if vision_tower is None or images is None or input_ids.shape[1] == 1:
IndexError: tuple index out of range
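The failing check is input_ids.shape[1] == 1, so it looks like the evaluation loop hands prepare_inputs_labels_for_multimodal tensors without a batch dimension. Something along these lines would guard against that (just a guess, not a verified fix, and the parameter names are assumed from the LLaVA-style signature; the real problem may be upstream in how eval batches are collated):

    # llava/model/llava_arch.py, prepare_inputs_labels_for_multimodal -- speculative band-aid:
    # restore the batch dimension before the shape[1] check that currently raises IndexError.
    if input_ids is not None and input_ids.dim() == 1:
        input_ids = input_ids.unsqueeze(0)        # (seq_len,) -> (1, seq_len)
    if labels is not None and labels.dim() == 1:
        labels = labels.unsqueeze(0)
    if attention_mask is not None and attention_mask.dim() == 1:
        attention_mask = attention_mask.unsqueeze(0)

    if vision_tower is None or images is None or input_ids.shape[1] == 1:
        ...  # existing logic continues here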
Any suggestions?
TypeError: LlamaForCausalLM.forward() got an unexpected keyword argument 'seqlens_in_batch'
This error is usually caused by an incomplete environment install. Please follow the instructions in environment_setup.sh to set up the environment.
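One quick way to check: after a full setup the installed transformers should expose a LlamaForCausalLM.forward that accepts the extra argument (an assumption based on the error above; if the patch is only applied when llava is imported, import llava first before running this):

    import inspect
    from transformers import LlamaForCausalLM

    # On a complete install the patched forward should list 'seqlens_in_batch'
    # among its parameters; if it is missing, the environment setup did not finish.
    params = inspect.signature(LlamaForCausalLM.forward).parameters
    print("seqlens_in_batch" in params)  # expect True after environment_setup.sh
    print(list(params))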