open_clip
Add initial FLAVA model and training
```sh
python -m training.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --hf-tokenizer-name gpt2 \
    --train-data=~/flava_cc/Train_GCC-training_output.csv \
    --val-data=~/flava_cc/Validation_GCC-1.1.0-Validation_output.csv \
    --csv-img-key filepath \
    --csv-caption-key title \
    --imagenet-val=~/imagenet_validation \
    --warmup 10000 \
    --batch-size=64 \
    --lr=1e-3 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --model flava-debug
```
Left a few comments
Overall it looks a bit big, but interesting.
Are you interested in running some experiments at scale (for example, 32 epochs of laion400m) to make sure it works? If so, we could provide compute.
Could you please rebase on main?
If we decide to go the 'merge FLAVA methods' route instead of the 'add FLAVA separately' route, we could do something where you pass parameters for which pretraining objectives you want to include, and then add some ObjectiveSampler class that applies the loss functions according to some schedule. In FLAVA they use a round-robin schedule, but I'd be interested to see how different schedules affect training.
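A minimal sketch of what that ObjectiveSampler idea could look like; the class name comes from the comment above, but everything else (the method names, the objective labels) is illustrative and not part of open_clip:

```python
from itertools import cycle


class ObjectiveSampler:
    """Yields pretraining objectives according to a schedule.

    Hypothetical sketch: round-robin over named loss functions, matching
    the FLAVA-style schedule. Other schedules (weighted random, curriculum)
    could be swapped in by overriding next_objective().
    """

    def __init__(self, objectives):
        # objectives: mapping of name -> loss callable
        self.objectives = dict(objectives)
        self._order = cycle(self.objectives)  # preserves insertion order

    def next_objective(self):
        name = next(self._order)
        return name, self.objectives[name]


# Usage: the train loop asks the sampler which loss to run this step.
sampler = ObjectiveSampler({
    "contrastive": lambda: "clip_loss",
    "mlm": lambda: "masked_lm_loss",
    "mim": lambda: "masked_image_loss",
})
names = [sampler.next_objective()[0] for _ in range(6)]
# round-robin: each objective visited once per cycle of three steps
```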
Things are going to start getting out of hand with the nature of the current train loop: adding support for different losses means handling different masks, pulling different items out of the dataset, etc.
When I was experimenting with gradient caching, I decided it was best to add another layer of abstraction to the train loop. I wanted to avoid going 'full Lightning' on it, but made a minimal 'TrainJig' (could also call it a 'TrainTask') to pull optimizer setup, loss handling, and task-specific step logic out of the main loop.
I stopped working on grad caching because the complexity was not proving worthwhile, but it might be good to bring back and refine that abstraction (https://github.com/rwightman/open_clip/blob/grad_cache/src/training/train_jig.py).
We could have a ClipTrainJig/TrainTask, FlavaTrainTask, etc. to get rid of `if is_flava` checks in the main loop, and allow other multi-modal train tasks to be added without further mayhem.
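For concreteness, a rough sketch of that TrainTask shape; the class names ClipTrainTask/FlavaTrainTask come from the comment above, but the method names, batch keys, and model interface are assumptions, not the actual open_clip or linked train_jig API:

```python
class TrainTask:
    """Owns batch unpacking and loss computation for one training objective,
    so the main loop stays model-agnostic (hypothetical sketch)."""

    def unpack_batch(self, batch):
        raise NotImplementedError

    def step(self, model, batch):
        raise NotImplementedError


class ClipTrainTask(TrainTask):
    def unpack_batch(self, batch):
        return batch["image"], batch["text"]

    def step(self, model, batch):
        images, texts = self.unpack_batch(batch)
        return model.contrastive_loss(images, texts)


class FlavaTrainTask(TrainTask):
    def unpack_batch(self, batch):
        # FLAVA objectives also need masks for MLM/MIM
        return batch["image"], batch["text"], batch.get("text_mask")

    def step(self, model, batch):
        images, texts, mask = self.unpack_batch(batch)
        return model.multimodal_loss(images, texts, mask)


def train_one_step(task, model, batch):
    # The main loop dispatches through the task instead of `if is_flava`
    return task.step(model, batch)
```

New tasks then only need to subclass TrainTask; the main loop never grows another model-specific branch.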