LLaMA-Factory
Add Multimodal LLM Finetuning
What does this PR do?
Add finetuning for Multimodal LLMs, especially LLaVA, by leveraging AutoModelForVision2Seq and AutoProcessor from transformers.
This PR is a work in progress and needs further improvement in the future, e.g. support for other MLLMs.
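For context, these are the two transformers classes the PR builds on; the snippet below is an illustrative sketch of how they load LLaVA-1.5, not part of the PR's training code.

```python
# Illustrative only: the transformers classes this PR relies on.
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)       # tokenizer + image processor in one object
model = AutoModelForVision2Seq.from_pretrained(model_id)  # loads LLaVA-1.5 as a vision-to-text model
```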
Supported Models
- [x] LLaVA-1.5
Make your own Instruct Dataset
Just organize your content like data/llava_instruct_example.json (see the sketch below).
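The snippet below sketches what such a record might look like; the field names here are an assumption for illustration only, so follow data/llava_instruct_example.json in the repo for the actual schema.

```python
import json

# Hypothetical example record; field names are assumptions, the real schema
# is defined by data/llava_instruct_example.json in this PR.
example = [
    {
        "messages": [
            {"role": "user", "content": "<image>What is shown in this picture?"},
            {"role": "assistant", "content": "A cat sitting on a windowsill."},
        ],
        "images": ["images/cat.jpg"],
    }
]

with open("data/my_llava_instruct.json", "w") as f:
    json.dump(example, f, indent=2, ensure_ascii=False)
```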
Train and Test
Training
```bash
# train
bash examples/mllm/sft_llava.sh
```
Test the SFT model
```bash
python scripts/test_mllm.py \
    --base_model_path llava-hf/llava-1.5-7b-hf \
    --lora_model_path saves/llava-1.5-7b/lora/sft \
    --model_path saves/llava-1.5-7b/lora/merged \
    --dataset_name data/llava_instruct_example.json \
    --do_merge
```
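Merging a LoRA adapter into the base weights typically looks like the peft-based sketch below; this is an illustration of the general technique, not the actual contents of scripts/test_mllm.py.

```python
# Sketch of a LoRA merge with peft (illustrative, not scripts/test_mllm.py).
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel

base = AutoModelForVision2Seq.from_pretrained("llava-hf/llava-1.5-7b-hf")
lora = PeftModel.from_pretrained(base, "saves/llava-1.5-7b/lora/sft")
merged = lora.merge_and_unload()  # fold the LoRA weights into the base model
merged.save_pretrained("saves/llava-1.5-7b/lora/merged")

# Save the processor alongside the merged weights for convenience.
AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf").save_pretrained(
    "saves/llava-1.5-7b/lora/merged"
)
```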
Test the original model
```bash
python scripts/test_mllm.py \
    --model_path llava-hf/llava-1.5-7b-hf \
    --dataset_name data/llava_instruct_example.json
```
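For a quick manual check, single-example inference with transformers looks roughly like this; it is a sketch that assumes the LLaVA-1.5 chat prompt format and a hypothetical image path, while the real evaluation logic lives in scripts/test_mllm.py.

```python
# Single-example inference sketch (illustrative, not scripts/test_mllm.py).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

base_id = "llava-hf/llava-1.5-7b-hf"
model_path = "saves/llava-1.5-7b/lora/merged"  # or base_id for the original model
processor = AutoProcessor.from_pretrained(base_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("images/cat.jpg")  # hypothetical image path
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"  # LLaVA-1.5 style prompt
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```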
Before submitting
- [x] Did you read the contributor guideline?