TinyLLaVA_Factory
Fine-tuning TinyLLaVA-Phi-2-SigLIP-3.1B
Thanks for your great work. I could not find the details for fine-tuning the latest model, TinyLLaVA-Phi-2-SigLIP-3.1B. Previously I was following this to fine-tune the model with LoRA, but that script does not work with the latest branch. Could you please update the README to include details on how to fine-tune TinyLLaVA-Phi-2-SigLIP-3.1B on a custom dataset?
Hi, thanks for your suggestion!
Please see here: https://github.com/TinyLLaVA/TinyLLaVA_Factory/blob/main/CUSTOM_FINETUNE.md
Thanks for updating. It would be great if you could also add an evaluation script for custom-trained LoRA models.
Hi. Whether a model is trained through custom finetuning or through normal training, all models trained by TinyLLaVA Factory follow the same evaluation procedure. Please see here: https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html
Thanks for your prompt response. I will explain in more detail:
- I modified tinyllava/train/custom_finetune.py to include model.tokenizer = tokenizer at line 33. This change was necessary to avoid an error when saving the model, as indicated in this line of lora_recipe.py (a sketch of the change follows this list).
- After training, my folder structure appears as shown below, with config.json containing "llm_model_name_or_path": "microsoft/phi-2".
- For inference, I followed the inference instructions provided here. When running the inference script, the language model is loaded using this line in load_model.py. However, it loads the "microsoft/phi-2" model instead of the fine-tuned TinyLLaVA model. I tried modifying the code to:

model.language_model = model.language_model.from_pretrained(
    os.path.join(model_name_or_path, "language_model"),
    torch_dtype=model_config.text_config.torch_dtype,
)

However, this results in weird outputs. There seem to be some issues with loading the language model. I would really appreciate it if you could have a look at this issue.
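For reference, a minimal sketch of the change to custom_finetune.py mentioned in the first bullet; the surrounding code and variable names are assumptions based on the description above, not the exact contents of the file:

# tinyllava/train/custom_finetune.py (sketch; surrounding code abbreviated and assumed)
def train():
    ...
    tokenizer = ...  # tokenizer built from the pretrained model path
    model = ...      # TinyLLaVA model wrapped by the LoRA training recipe
    model.tokenizer = tokenizer  # attach the tokenizer so the save hook in lora_recipe.py can access it
    ...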
Since I am only tuning the LLM with LoRA and keeping the connector and vision encoder frozen, replacing the model-loading code with the following gave me the expected results:
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base TinyLLaVA model (connector and vision encoder stay frozen during LoRA finetuning)
model = AutoModelForCausalLM.from_pretrained('tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B', trust_remote_code=True)
model_config = model.config

print("Loading LoRA weights...")
# model_name_or_path points to the output directory produced by custom_finetune.py
model = PeftModel.from_pretrained(model, model_name_or_path)
print("Merging LoRA weights...")
model = model.merge_and_unload()
print("Model is loaded...")
Hi, thanks for sharing. We have fixed the bug. Please pull the latest version; hopefully that fixes your problem as well.
Everything seems to work well now, thanks! I have a couple of other questions as well:
- Is TinyLLaVA trained on a single-turn conversation dataset? Is it possible to train it on multi-turn conversations?
- Can TinyLLaVA accept multiple images instead of just one?
Hi.
- TinyLLaVA's training set is the same as that of LLaVA-1.5 / ShareGPT4V, so it is trained on multi-turn conversations during the finetuning stage (a sample of the data format is sketched below).
- Currently it does not accept multiple images, but we plan to support that in the future.
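For illustration, here is a minimal sketch of a multi-turn training sample in the LLaVA-style annotation format that the finetuning data follows; the field values are made up, and CUSTOM_FINETUNE.md remains the authoritative reference for the exact schema:

import json

# One image paired with a multi-turn conversation; "<image>" marks where the image is inserted.
sample = {
    "id": "000001",
    "image": "images/000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown in the picture?"},
        {"from": "gpt", "value": "A dog playing with a red ball in a park."},
        {"from": "human", "value": "What color is the ball?"},
        {"from": "gpt", "value": "The ball is red."},
    ],
}

# The dataset file is a JSON list of such samples.
with open("custom_dataset.json", "w") as f:
    json.dump([sample], f, indent=2)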