LLaMA-Factory
Add Multimodal LLM Finetuning
What does this PR do?
Add finetuning for Multimodal LLMs, especially LLaVA, by leveraging AutoModelForVision2Seq and AutoProcessor from transformers.
This PR is a work in progress and needs further improvement in the future, e.g. support for other MLLMs.
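For context, these are the two transformers classes the PR builds on; the snippet below is an illustrative sketch of how they load LLaVA-1.5, not part of the PR's training code.

```python
# Illustrative only: the transformers classes this PR relies on.
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)       # tokenizer + image processor in one object
model = AutoModelForVision2Seq.from_pretrained(model_id)  # loads LLaVA-1.5 as a vision-to-text model
```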
Supported Models
- [x] LLaVA-1.5
Make your own Instruct Dataset
Just organize your content like data/llava_instruct_example.json (see the sketch below).
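The snippet below sketches what such a record might look like; the field names here are an assumption for illustration only, so follow data/llava_instruct_example.json in the repo for the actual schema.

```python
import json

# Hypothetical example record; field names are assumptions, the real schema
# is defined by data/llava_instruct_example.json in this PR.
example = [
    {
        "messages": [
            {"role": "user", "content": "<image>What is shown in this picture?"},
            {"role": "assistant", "content": "A cat sitting on a windowsill."},
        ],
        "images": ["images/cat.jpg"],
    }
]

with open("data/my_llava_instruct.json", "w") as f:
    json.dump(example, f, indent=2, ensure_ascii=False)
```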
Train and Test
Training
```bash
# train
bash examples/mllm/sft_llava.sh
```
Test the SFT model
```bash
python scripts/test_mllm.py \
    --base_model_path llava-hf/llava-1.5-7b-hf \
    --lora_model_path saves/llava-1.5-7b/lora/sft \
    --model_path saves/llava-1.5-7b/lora/merged \
    --dataset_name data/llava_instruct_example.json \
    --do_merge
```
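Merging a LoRA adapter into the base weights typically looks like the peft-based sketch below; this is an illustration of the general technique, not the actual contents of scripts/test_mllm.py.

```python
# Sketch of a LoRA merge with peft (illustrative, not scripts/test_mllm.py).
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel

base = AutoModelForVision2Seq.from_pretrained("llava-hf/llava-1.5-7b-hf")
lora = PeftModel.from_pretrained(base, "saves/llava-1.5-7b/lora/sft")
merged = lora.merge_and_unload()  # fold the LoRA weights into the base model
merged.save_pretrained("saves/llava-1.5-7b/lora/merged")

# Save the processor alongside the merged weights for convenience.
AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf").save_pretrained(
    "saves/llava-1.5-7b/lora/merged"
)
```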
Test the original model
```bash
python scripts/test_mllm.py \
    --model_path llava-hf/llava-1.5-7b-hf \
    --dataset_name data/llava_instruct_example.json
```
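For a quick manual check, single-example inference with transformers looks roughly like this; it is a sketch that assumes the LLaVA-1.5 chat prompt format and a hypothetical image path, while the real evaluation logic lives in scripts/test_mllm.py.

```python
# Single-example inference sketch (illustrative, not scripts/test_mllm.py).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

base_id = "llava-hf/llava-1.5-7b-hf"
model_path = "saves/llava-1.5-7b/lora/merged"  # or base_id for the original model
processor = AutoProcessor.from_pretrained(base_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("images/cat.jpg")  # hypothetical image path
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"  # LLaVA-1.5 style prompt
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```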
Before submitting
- [x] Did you read the contributor guideline?