
Fine-tuning TinyLLaVA-Phi-2-SigLIP-3.1B

Open · saadi297 opened this issue 1 year ago

Thanks for your great work. I could not find details on how to fine-tune the latest model, TinyLLaVA-Phi-2-SigLIP-3.1B. Previously I was following this to fine-tune the model with LoRA, but that script does not work with the latest branch. Could you please update the README to include details on how to fine-tune TinyLLaVA-Phi-2-SigLIP-3.1B on a custom dataset?

saadi297 avatar May 22 '24 16:05 saadi297

Hi, thanks for your suggestion!

Please see here: https://github.com/TinyLLaVA/TinyLLaVA_Factory/blob/main/CUSTOM_FINETUNE.md

TinyLLaVA avatar May 23 '24 14:05 TinyLLaVA

Thanks for updating. It would be great if you could also add an evaluation script for a custom LoRA-trained model.

saadi297 avatar May 24 '24 17:05 saadi297

Hi. Whether a model is trained through custom fine-tuning or through normal training, all models trained by TinyLLaVA Factory follow the same evaluation procedure. Please see here: https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html

YingHuTsing avatar May 25 '24 02:05 YingHuTsing

Thanks for your prompt response. Let me explain in more detail:

  1. I modified tinyllava/train/custom_finetune.py to add `model.tokenizer = tokenizer` at line 33 (see the sketch after this list). This change was necessary to avoid an error when saving the model, as indicated in this line of lora_recipe.py.

  2. After training, my folder structure is as in the attached screenshot, with config.json containing `"llm_model_name_or_path": "microsoft/phi-2"`.

  3. For inference, I followed the instructions provided here.
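For reference, the change in step 1 is a one-liner; a minimal sketch (the surrounding code and exact line number may differ across checkouts):

```python
# tinyllava/train/custom_finetune.py, after the tokenizer is built:
model.tokenizer = tokenizer  # attach it so lora_recipe.py can save the tokenizer with the model
```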

When running the inference script, the language model is loaded by this line in load_model.py. However, it loads the base "microsoft/phi-2" model instead of the fine-tuned TinyLLaVA weights.

I tried modifying the code to:

```python
model.language_model = model.language_model.from_pretrained(
    os.path.join(model_name_or_path, "language_model"),
    torch_dtype=model_config.text_config.torch_dtype,
)
```

However, this results in weird outputs.

There seem to be some issues with loading the language model. I would really appreciate it if you could have a look at this.

saadi297 avatar May 25 '24 08:05 saadi297

Since I am only tuning the LLM with LoRA and keeping the connector and vision encoder frozen, replacing the model-loading code with the following gave me the expected results:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the released base checkpoint (connector and vision encoder were frozen during training).
model = AutoModelForCausalLM.from_pretrained(
    'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B', trust_remote_code=True
)
model_config = model.config

print("Loading LoRA weights...")
model = PeftModel.from_pretrained(model, model_name_or_path)
print("Merging LoRA weights...")
model = model.merge_and_unload()
print("Model is loaded...")
```

saadi297 avatar May 25 '24 09:05 saadi297

Hi, thanks for sharing. We have fixed the bug. Please pull the latest version; we hope that fixes your problem as well.

YingHuTsing avatar May 25 '24 14:05 YingHuTsing

Everything seems to work well now, thanks! I have a couple of other questions as well:

  1. Is TinyLLaVA trained on a single-turn conversation dataset? Is it possible to train it on multi-turn conversations?
  2. Can TinyLLaVA accept multiple images instead of just one?

saadi297 avatar May 25 '24 14:05 saadi297

Hi.

  1. TinyLLaVA's training set is the same as that of LLaVA-1.5 / ShareGPT4V, so it is already trained on multi-turn conversations during the fine-tuning stage.
  2. Currently it does not accept multiple images, but we plan to support that in the future.
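For custom data, here is a minimal sketch of one multi-turn record in the LLaVA-style conversation format (the field values are illustrative; see CUSTOM_FINETUNE.md for the exact schema expected by TinyLLaVA Factory):

```python
# One training sample: a single image paired with a multi-turn conversation.
record = {
    "id": "000001",
    "image": "000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is in the picture?"},
        {"from": "gpt", "value": "A cat sitting on a windowsill."},
        {"from": "human", "value": "What color is the cat?"},
        {"from": "gpt", "value": "It is orange."},
    ],
}
```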

YingHuTsing avatar May 26 '24 02:05 YingHuTsing