Llava_Qwen2
Visual Instruction Tuning for Qwen2 Base Model
LLaVA-Qwen2: Enhanced with Qwen2 Base Model
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities, enhanced with the Qwen2 base model.
For more details on usage, refer to the original LLaVA repository. This custom repository specifically integrates the Qwen2 base model to leverage its advanced capabilities.
Dataset for Pretraining and Finetuning
LLaVA Dataset + FinVis Dataset
Download
git lfs install
git clone https://www.modelscope.cn/TobyYang7/llava-qwen2-1.5b-instruct-finvis.git
MMMU Eval
Download the MMMU dataset first and rename it to MMMU_eval\data. For more details, follow the official MMMU instructions.
bash eval.sh
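Before launching the evaluation, it can help to confirm that the dataset folder is where the script expects it. The snippet below is a minimal sketch, not part of this repository: the helper name `list_mmmu_subjects` is hypothetical, and the assumption that the data sits at `MMMU_eval/data` relative to the repository root is taken from the rename step above.

```python
from pathlib import Path


def list_mmmu_subjects(data_dir):
    """Return the sorted names of the entries directly under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"MMMU data folder not found: {root}")
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    try:
        entries = list_mmmu_subjects(Path("MMMU_eval") / "data")
        print(f"Found {len(entries)} entries under MMMU_eval/data")
    except FileNotFoundError as err:
        print(err)
```

If the folder is missing, the script prints the error instead of failing later inside `eval.sh`.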
LLaVA-Qwen2-1.5B Result
Subject | Data Num | Acc |
---|---|---|
Overall-Art and Design | 120 | 0.35 |
Art | 30 | 0.3 |
Art_Theory | 30 | 0.467 |
Design | 30 | 0.467 |
Music | 30 | 0.167 |
Overall-Business | 150 | 0.22 |
Accounting | 30 | 0.267 |
Economics | 30 | 0.133 |
Finance | 30 | 0.2 |
Manage | 30 | 0.3 |
Marketing | 30 | 0.2 |
Overall-Science | 150 | 0.267 |
Biology | 30 | 0.167 |
Chemistry | 30 | 0.267 |
Geography | 30 | 0.233 |
Math | 30 | 0.333 |
Physics | 30 | 0.333 |
Overall-Health and Medicine | 150 | 0.267 |
Basic_Medical_Science | 30 | 0.233 |
Clinical_Medicine | 30 | 0.333 |
Diagnostics_and_Laboratory_Medicine | 30 | 0.167 |
Pharmacy | 30 | 0.267 |
Public_Health | 30 | 0.333 |
Overall-Humanities and Social Science | 120 | 0.458 |
History | 30 | 0.467 |
Literature | 30 | 0.7 |
Sociology | 30 | 0.4 |
Psychology | 30 | 0.267 |
Overall-Tech and Engineering | 210 | 0.3 |
Agriculture | 30 | 0.367 |
Architecture_and_Engineering | 30 | 0.3 |
Computer_Science | 30 | 0.1 |
Electronics | 30 | 0.2 |
Energy_and_Power | 30 | 0.4 |
Materials | 30 | 0.333 |
Mechanical_Engineering | 30 | 0.4 |
Overall | 900 | 0.303 |
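The overall figure is consistent with a question-weighted average of the six category rows. The sketch below reproduces it from the rounded values in the table; the function name `weighted_accuracy` is illustrative, and the actual evaluation script may aggregate from raw per-question results instead.

```python
# Per-category results copied from the table above: (name, questions, accuracy).
categories = [
    ("Art and Design", 120, 0.35),
    ("Business", 150, 0.22),
    ("Science", 150, 0.267),
    ("Health and Medicine", 150, 0.267),
    ("Humanities and Social Science", 120, 0.458),
    ("Tech and Engineering", 210, 0.3),
]


def weighted_accuracy(rows):
    """Average accuracy weighted by the number of questions per row."""
    total = sum(n for _, n, _ in rows)
    return sum(n * acc for _, n, acc in rows) / total


total_questions = sum(n for _, n, _ in categories)
overall = weighted_accuracy(categories)
print(total_questions, round(overall, 3))  # prints: 900 0.303
```

Because categories differ in size (120 to 210 questions), a plain mean of the six category accuracies would not match the overall row exactly.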
Pretrain Qwen2
bash pretrain_qwen2.sh
The pretrained projector checkpoint is located at checkpoints/Qwen2-1.5B-pretrain-FinVis/mm_projector.bin
Finetune Qwen2
bash ft_qwen2.sh
Interface
bash run_cli.sh
Installation
This repository builds upon the original LLaVA project, integrating the Qwen2 base model for improved performance.
If you are not using Linux, do NOT proceed; see the instructions for macOS and Windows instead.
- Clone this repository and navigate to the custom LLaVA folder
git clone https://github.com/TobyYang7/Llava_Qwen2.git
cd Llava_Qwen2
- Install Package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
- Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
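After the install steps, a quick check from inside the `llava` conda environment can confirm that the packages are importable. This is a minimal sketch: the helper `check_environment` is hypothetical, and the module name `llava` is an assumption based on the upstream LLaVA package layout (`flash_attn` is the import name of the `flash-attn` package).

```python
import importlib.util
import sys


def check_environment(modules=("llava", "flash_attn")):
    """Report the Python version and whether each module can be found."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A `False` next to `flash_attn` is expected if the training extras were skipped.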