ImageBind
Training resources
Thanks for your wonderful work. I am very excited about your idea. May I ask what computation budget was used to train the largest ImageBind model? How many GPU hours did you use?
I created a simple ImageBind finetuning example using LoRA: https://github.com/fabawi/ImageBind-LoRA
Make sure you clone it recursively to include the example dataset: git clone --recurse-submodules -j8 git@github.com:fabawi/ImageBind-LoRA.git
Install the requirements following the instructions provided in this repo, and run train.py
This should log your checkpoints, as well as the LoRA weights separately, in case you'd like to update the original model without saving all the model params. More examples and finer control will be added soon.
I will post some benchmarks in the README soon
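For readers unfamiliar with how LoRA finetuning works under the hood: the pretrained weights are frozen and only small low-rank update matrices are trained. Below is an illustrative PyTorch sketch of that pattern, not the actual ImageBind-LoRA code; the class name, rank, and scaling factor are assumptions for the example.

```python
# Minimal sketch of the LoRA idea: a frozen linear layer augmented with a
# low-rank update B @ A, where only the small A/B matrices are trained.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank factors; B starts at zero so the update begins as a no-op.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Pretrained output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Wrap a projection layer and confirm only the LoRA params are trainable.
layer = LoRALinear(nn.Linear(768, 768))
print([n for n, p in layer.named_parameters() if p.requires_grad])
# ['lora_a', 'lora_b']
```

Initializing lora_b to zero means the wrapped layer starts out exactly equal to the pretrained one, so finetuning begins from the original model's behavior, and only the tiny A/B matrices need to be saved per task.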
Thanks, but I actually would like to know how much budget was used to train ImageBind from scratch.
Hi, I'd like to recommend our work, LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. We open-source all training and validation code. For video, only 16 V100s are needed, and 8 V100s are enough if you turn on gradient accumulation. For depth maps and infrared maps, only 8 V100s are needed.
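Since gradient accumulation is what lets 8 V100s stand in for 16 here, a minimal PyTorch sketch of the pattern (toy model and data as placeholders; this is not LanguageBind's actual training loop):

```python
# Gradient accumulation: split each effective batch into micro-batches and
# step the optimizer only every accum_steps backward passes.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
data = [(torch.randn(8, 16), torch.randn(8, 4)) for _ in range(8)]

accum_steps = 2  # effective batch size = micro-batch size * accum_steps

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data):
    loss = loss_fn(model(inputs), targets)
    (loss / accum_steps).backward()  # scale so summed grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by accum_steps before backward() makes the accumulated gradients equal the average over the full effective batch, so the optimizer sees the same update scale as one large batch on twice the hardware.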