Selective modality fine-tuning
Thanks for the awesome work! Suppose I have my own audio-text dataset and want to fine-tune only the audio-text modality — how can I achieve that?
I created a simple ImageBind finetuning example using LoRA: https://github.com/fabawi/ImageBind-LoRA
Make sure you clone it recursively to include the example dataset:

```shell
git clone --recurse-submodules -j8 [email protected]:fabawi/ImageBind-LoRA.git
```
Install the requirements following the instructions provided in this repo, then run `train.py`.
This should log your checkpoints, and it also saves the LoRA weights separately, in case you'd like to update the original model without saving all the model params. More examples and finer control will be added soon.
Selective fine-tuning is also possible. Check out https://github.com/fabawi/ImageBind-LoRA/blob/09427ff4bcff2ef20a350cfea5aec3ca11a09af7/train.py#L220 . For now, you can manually modify `lora_modality_names` and `lora_layer_idxs` to specify which LoRA layers get fine-tuned.
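As a rough illustration of what those two variables control — a minimal sketch only, where the modality names and layer indices are placeholder values, and `should_apply_lora` is a hypothetical helper written for this example, not a function in the repo:

```python
# Restrict LoRA adapters to specific modalities and layers (illustrative
# values — pick the ones that match your dataset and model depth).

# Only adapt the audio and text towers, leaving the others frozen...
lora_modality_names = ["audio", "text"]

# ...and only inject LoRA into the last few transformer blocks of each tower.
lora_layer_idxs = [8, 9, 10, 11]


def should_apply_lora(modality: str, layer_idx: int) -> bool:
    """Hypothetical filter: True if this layer should get a LoRA adapter."""
    return modality in lora_modality_names and layer_idx in lora_layer_idxs


# Audio layer 11 is adapted; the vision tower is left untouched entirely.
print(should_apply_lora("audio", 11))   # True
print(should_apply_lora("vision", 11))  # False
```

Restricting LoRA to the later blocks of just the modalities you care about keeps the trainable-parameter count small, which is the usual reason to fine-tune selectively in the first place.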