LLaMA-Factory
Add Multimodal LLM Finetuning
What does this PR do?
Add fine-tuning support for multimodal LLMs by leveraging AutoModelForVision2Seq and AutoProcessor from transformers.
This PR is a work in progress and will need further improvement.
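For reviewers unfamiliar with these two classes, here is a minimal sketch of how a model and its processor could be loaded through them. It is not the code in this PR: the helper name `load_vision2seq` and its parameter are hypothetical, and only the two `from_pretrained` calls come from transformers.

```python
from typing import Any, Tuple


def load_vision2seq(model_name_or_path: str) -> Tuple[Any, Any]:
    """Load a vision-to-sequence model and its matching processor.

    Hypothetical helper for illustration only; not part of LLaMA-Factory.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForVision2Seq, AutoProcessor

    # The processor bundles the tokenizer and the image preprocessor for the checkpoint.
    processor = AutoProcessor.from_pretrained(model_name_or_path)
    # The auto class resolves the concrete architecture from the checkpoint config.
    model = AutoModelForVision2Seq.from_pretrained(model_name_or_path)
    return model, processor
```

Calling `load_vision2seq` with a checkpoint name or local path downloads or reads the weights, so it needs transformers installed and, for remote checkpoints, network access. Which checkpoints resolve correctly depends on the installed transformers version.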
TODO
- [ ] LLaVA
- [ ] Instruct-BLIP
Before submitting
- [x] Did you read the contributor guideline?