open_clip
Add initial FLAVA model and training
```sh
python -m training.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --hf-tokenizer-name gpt2 \
    --train-data=~/flava_cc/Train_GCC-training_output.csv \
    --val-data=~/flava_cc/Validation_GCC-1.1.0-Validation_output.csv \
    --csv-img-key filepath \
    --csv-caption-key title \
    --imagenet-val=~/imagenet_validation \
    --warmup 10000 \
    --batch-size=64 \
    --lr=1e-3 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --model flava-debug
```
Left a few comments
Overall it looks a bit big, but interesting.
Are you interested in running some experiments at scale (for example, 32 epochs of laion400m) to make sure it works? If so, we could provide compute.
Could you please rebase on main?
If we decide to go the 'merge FLAVA methods' route instead of the 'add FLAVA separately' route, we could do something where you pass parameters for which pretraining objectives you want to include, and then add some ObjectiveSampler class that applies the loss functions according to some schedule. In FLAVA they use a round-robin schedule, but I'd be interested to see how different schedules affect training.
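A minimal sketch of what that ObjectiveSampler idea could look like; the class name comes from the comment above, but everything else (the method names, the objective labels) is illustrative and not part of open_clip:

```python
from itertools import cycle


class ObjectiveSampler:
    """Yields pretraining objectives according to a schedule.

    Hypothetical sketch: round-robin over named loss functions, matching
    the FLAVA-style schedule. Other schedules (weighted random, curriculum)
    could be swapped in by overriding next_objective().
    """

    def __init__(self, objectives):
        # objectives: mapping of name -> loss callable
        self.objectives = dict(objectives)
        self._order = cycle(self.objectives)  # preserves insertion order

    def next_objective(self):
        name = next(self._order)
        return name, self.objectives[name]


# Usage: the train loop asks the sampler which loss to run this step.
sampler = ObjectiveSampler({
    "contrastive": lambda: "clip_loss",
    "mlm": lambda: "masked_lm_loss",
    "mim": lambda: "masked_image_loss",
})
names = [sampler.next_objective()[0] for _ in range(6)]
# round-robin: each objective visited once per cycle of three steps
```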
Things are going to start getting out of hand with the nature of the current train loop: adding support for different losses means handling different masks, pulling different items out of the dataset, etc.
When I was experimenting with gradient caching, I decided it was best to add another layer of abstraction to the train loop. I wanted to avoid going 'full Lightning' on it, but made a minimal 'TrainJig' (could also call it a 'TrainTask') to pull optimizer setup, loss handling, and task-specific step logic out of the main loop.
I stopped working on grad caching because the complexity was not proving worthwhile, but it might be good to bring back and refine that abstraction (https://github.com/rwightman/open_clip/blob/grad_cache/src/training/train_jig.py).
We could have a ClipTrainJig/TrainTask, FlavaTrainTask, etc. to get rid of `if is_flava` checks in the main loop, and allow other multi-modal train tasks to be added without further mayhem.
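For concreteness, a rough sketch of that TrainTask shape; the class names ClipTrainTask/FlavaTrainTask come from the comment above, but the method names, batch keys, and model interface are assumptions, not the actual open_clip or linked train_jig API:

```python
class TrainTask:
    """Owns batch unpacking and loss computation for one training objective,
    so the main loop stays model-agnostic (hypothetical sketch)."""

    def unpack_batch(self, batch):
        raise NotImplementedError

    def step(self, model, batch):
        raise NotImplementedError


class ClipTrainTask(TrainTask):
    def unpack_batch(self, batch):
        return batch["image"], batch["text"]

    def step(self, model, batch):
        images, texts = self.unpack_batch(batch)
        return model.contrastive_loss(images, texts)


class FlavaTrainTask(TrainTask):
    def unpack_batch(self, batch):
        # FLAVA objectives also need masks for MLM/MIM
        return batch["image"], batch["text"], batch.get("text_mask")

    def step(self, model, batch):
        images, texts, mask = self.unpack_batch(batch)
        return model.multimodal_loss(images, texts, mask)


def train_one_step(task, model, batch):
    # The main loop dispatches through the task instead of `if is_flava`
    return task.step(model, batch)
```

New tasks then only need to subclass TrainTask; the main loop never grows another model-specific branch.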