
Add Multimodal LLM Finetuning


What does this PR do?

Add finetuning for multimodal LLMs, especially LLaVA, by leveraging `AutoModelForVision2Seq` and `AutoProcessor` from transformers.

This PR is a work in progress and needs further improvement, e.g. support for other MLLMs.
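As a quick illustration of what those two auto-classes provide, here is a minimal inference sketch (not code from this PR; the image URL and prompt are only examples):

```python
# Minimal LLaVA-1.5 inference via the generic transformers auto-classes
# this PR builds on. Illustrative only; not part of the PR itself.
import requests
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

# Any RGB image works here; this URL is only an example.
image = Image.open(requests.get(
    "https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```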

Supported Models

- [x] LLaVA-1.5

Make your own Instruct Dataset

Just organize your data in the same format as `data/llava_instruct_example.json`; see the sketch below for an illustrative record.
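For reference, a record could look like the following. The authoritative schema is the example file in the repo; the field names below are assumptions modeled on it:

```python
# Write a one-record instruct dataset in the assumed LLaVA-instruct layout.
# Check data/llava_instruct_example.json for the authoritative field names.
import json

example = [
    {
        "messages": [
            {"role": "user", "content": "<image>\nWhat is unusual about this picture?"},
            {"role": "assistant", "content": "A man is ironing clothes on the back of a taxi."},
        ],
        "images": ["data/images/0.jpg"],  # hypothetical local image path
    }
]

with open("data/my_llava_instruct.json", "w") as f:
    json.dump(example, f, indent=2, ensure_ascii=False)
```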

Train and Test

Training

```bash
# train
bash examples/mllm/sft_llava.sh
```
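The script drives LLaMA-Factory's trainer. As a rough mental model only, a hand-rolled LoRA SFT of a Vision2Seq model has the shape below; the hyperparameters, target modules, and toy dataset are all illustrative, not the script's actual settings:

```python
# Standalone sketch of LoRA SFT on a Vision2Seq model. This is NOT the
# contents of sft_llava.sh; it only shows the shape of such a run.
from PIL import Image
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForVision2Seq, AutoProcessor,
                          Trainer, TrainingArguments)

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

# Attach LoRA adapters to the attention projections (illustrative choice).
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))

# Toy one-example dataset; a real run would load the JSON dataset instead.
train_data = [{
    "prompt": "USER: <image>\nDescribe the image. ASSISTANT: A red square.",
    "image": Image.new("RGB", (336, 336), "red"),
}]

def collate(batch):
    inputs = processor(text=[b["prompt"] for b in batch],
                       images=[b["image"] for b in batch],
                       return_tensors="pt", padding=True)
    # Plain LM loss over the whole sequence; a real SFT script would
    # mask the prompt tokens with -100.
    inputs["labels"] = inputs["input_ids"].clone()
    return inputs

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="saves/llava-1.5-7b/lora/sft",
                           per_device_train_batch_size=1, num_train_epochs=1,
                           remove_unused_columns=False),
    train_dataset=train_data,
    data_collator=collate,
)
trainer.train()
```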

Test SFT model

```bash
python scripts/test_mllm.py \
    --base_model_path llava-hf/llava-1.5-7b-hf \
    --lora_model_path saves/llava-1.5-7b/lora/sft \
    --model_path saves/llava-1.5-7b/lora/merged \
    --dataset_name data/llava_instruct_example.json \
    --do_merge
```
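The `--do_merge` flag presumably merges the LoRA weights into the base model and saves the result to `--model_path`. In peft terms, such a merge amounts to something like this (a sketch, not the actual `scripts/test_mllm.py` source; paths taken from the command above):

```python
# Merge LoRA adapters into the base LLaVA weights and save the result.
from peft import PeftModel
from transformers import AutoModelForVision2Seq, AutoProcessor

base = AutoModelForVision2Seq.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = PeftModel.from_pretrained(base, "saves/llava-1.5-7b/lora/sft")
merged = model.merge_and_unload()  # fold LoRA deltas into the base weights

merged.save_pretrained("saves/llava-1.5-7b/lora/merged")
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor.save_pretrained("saves/llava-1.5-7b/lora/merged")
```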

Test original model

```bash
python scripts/test_mllm.py \
    --model_path llava-hf/llava-1.5-7b-hf \
    --dataset_name data/llava_instruct_example.json
```


BUAADreamer · Apr 25 '24