
Fine-tuning TinyLLaVA-Phi-2-SigLIP-3.1B

Open · saadi297 opened this issue 1 year ago

Thanks for your great work. I could not find details on how to fine-tune the latest model, TinyLLaVA-Phi-2-SigLIP-3.1B. Previously I was following this to fine-tune the model with LoRA, but that script does not work with the latest branch. Could you please update the README to include details on how to fine-tune TinyLLaVA-Phi-2-SigLIP-3.1B on a custom dataset?

saadi297 avatar May 22 '24 16:05 saadi297

Hi, thanks for your suggestion!

Please see here: https://github.com/TinyLLaVA/TinyLLaVA_Factory/blob/main/CUSTOM_FINETUNE.md

TinyLLaVA avatar May 23 '24 14:05 TinyLLaVA

Thanks for updating. It would be great if you could also add an evaluation script for a custom LoRA-trained model.

saadi297 avatar May 24 '24 17:05 saadi297

Hi. Whether a model is trained through custom fine-tuning or through normal training, all models trained by TinyLLaVA Factory follow the same evaluation procedure. Please see here: https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html

YingHuTsing avatar May 25 '24 02:05 YingHuTsing

Thanks for your prompt response. Let me explain in more detail:

  1. I modified tinyllava/train/custom_finetune.py to add `model.tokenizer = tokenizer` at line 33 (see the sketch after this list). This change was necessary to avoid an error when saving the model, as indicated in this line of lora_recipe.py.

  2. After training, my folder structure is as in the attached screenshot, with config.json containing `"llm_model_name_or_path": "microsoft/phi-2"`.

  3. For inference, I followed the instructions provided here.
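For reference, the change in step 1 is a one-liner; a minimal sketch (the surrounding code and exact line number may differ across checkouts):

```python
# tinyllava/train/custom_finetune.py, after the tokenizer is built:
model.tokenizer = tokenizer  # attach it so lora_recipe.py can save the tokenizer with the model
```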

When running the inference script, the language model is loaded by this line in load_model.py. However, it loads the base "microsoft/phi-2" model instead of the fine-tuned TinyLLaVA weights.

I tried modifying the code to:

```python
model.language_model = model.language_model.from_pretrained(
    os.path.join(model_name_or_path, "language_model"),
    torch_dtype=model_config.text_config.torch_dtype,
)
```

However, this results in weird outputs.

There seem to be some issues with loading the language model. I would really appreciate it if you could have a look at this.

saadi297 avatar May 25 '24 08:05 saadi297

Since I am only tuning the LLM with LoRA and keeping the connector and vision encoder frozen, replacing the model-loading code with the following gave me the expected results:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the released base checkpoint (connector and vision encoder were frozen during training).
model = AutoModelForCausalLM.from_pretrained(
    'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B', trust_remote_code=True
)
model_config = model.config

print("Loading LoRA weights...")
model = PeftModel.from_pretrained(model, model_name_or_path)
print("Merging LoRA weights...")
model = model.merge_and_unload()
print("Model is loaded...")
```

saadi297 avatar May 25 '24 09:05 saadi297

Hi, thanks for sharing. We have fixed the bug. Please pull the latest version; we hope that fixes your problem as well.

YingHuTsing avatar May 25 '24 14:05 YingHuTsing

Everything seems to work well now, thanks! I have a couple of other questions as well:

  1. Is TinyLLaVA trained on a single-turn conversation dataset? Is it possible to train it on multi-turn conversations?
  2. Can TinyLLaVA accept multiple images instead of just one?

saadi297 avatar May 25 '24 14:05 saadi297

Hi.

  1. TinyLLaVA's training set is the same as that of LLaVA-1.5 / ShareGPT4V, so it is already trained on multi-turn conversations during the fine-tuning stage.
  2. Currently it does not accept multiple images, but we plan to support that in the future.
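For custom data, here is a minimal sketch of one multi-turn record in the LLaVA-style conversation format (the field values are illustrative; see CUSTOM_FINETUNE.md for the exact schema expected by TinyLLaVA Factory):

```python
# One training sample: a single image paired with a multi-turn conversation.
record = {
    "id": "000001",
    "image": "000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is in the picture?"},
        {"from": "gpt", "value": "A cat sitting on a windowsill."},
        {"from": "human", "value": "What color is the cat?"},
        {"from": "gpt", "value": "It is orange."},
    ],
}
```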

YingHuTsing avatar May 26 '24 02:05 YingHuTsing