Training resources

Open ustcwhy opened this issue 1 year ago • 3 comments

Thanks for your wonderful work. I am very excited about your idea. May I ask what computation budget was used to train the largest ImageBind model? How many GPU hours did you use?

ustcwhy avatar May 10 '23 17:05 ustcwhy

I created a simple ImageBind finetuning example using LoRA: https://github.com/fabawi/ImageBind-LoRA

Make sure you clone it recursively to include the example dataset: git clone --recurse-submodules -j8 [email protected]:fabawi/ImageBind-LoRA.git

Install the requirements following the instructions provided in this repo, and run train.py

This should log your checkpoints, as well as the LoRA weights separately, if you'd like to update the original model without saving all the model parameters. More examples and finer control will be added soon.
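Saving the LoRA weights separately works because LoRA freezes the original model and only trains a small low-rank update per layer. The sketch below illustrates the general idea in PyTorch; it is not the ImageBind-LoRA repo's actual API, and the class and parameter names (`LoRALinear`, `rank`, `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update (generic LoRA idea)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        # low-rank factors A (in x r) and B (r x out); B starts at zero so the
        # wrapped layer initially behaves exactly like the base layer
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

layer = LoRALinear(nn.Linear(768, 768), rank=4)
# only the LoRA factors are trainable, so a "checkpoint" of them is tiny
trainable = {k: v for k, v in layer.named_parameters() if v.requires_grad}
```

Because only `trainable` needs to be saved, the checkpoint holds a few thousand parameters instead of the full model, which is what makes shipping LoRA weights alongside a frozen base model practical.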

I will post some benchmarks in the README soon

fabawi avatar May 13 '23 12:05 fabawi

Thanks. But what I actually would like to know is how much compute was used to train ImageBind from scratch.

ustcwhy avatar May 13 '23 13:05 ustcwhy

Hi, I'd like to recommend our work, LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. We open-source all training and validation code. For video, only 16 V100s are needed; with gradient accumulation enabled, 8 V100s are enough. For depth maps and infrared maps, only 8 V100s are needed.
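Gradient accumulation lets fewer GPUs reach the same effective batch size by summing gradients over several micro-batches before each optimizer step. A minimal PyTorch sketch of the pattern (the model, batch sizes, and `accum_steps` value are illustrative, not LanguageBind's actual training configuration):

```python
import torch

model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 2  # halving the GPU count while doubling accumulation keeps the effective batch size

opt.zero_grad()
for step, batch in enumerate(torch.randn(8, 8, 16)):  # 8 micro-batches of 8 samples
    loss = model(batch).pow(2).mean()
    # divide by accum_steps so the summed gradients match one large batch
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        opt.step()       # update once per accum_steps micro-batches
        opt.zero_grad()
```

The trade-off is wall-clock time rather than memory: each optimizer step now takes `accum_steps` forward/backward passes, which is why the 8-V100 setup trains more slowly than the 16-V100 one.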

LinB203 avatar Oct 16 '23 02:10 LinB203