
Add initial FLAVA model and training

[Open] gmittal opened this issue 2 years ago • 4 comments

python -m training.main \
    --save-frequency 1 \
    --zeroshot-frequency 1 \
    --report-to tensorboard \
    --hf-tokenizer-name gpt2 \
    --train-data=~/flava_cc/Train_GCC-training_output.csv \
    --val-data=~/flava_cc/Validation_GCC-1.1.0-Validation_output.csv \
    --csv-img-key filepath \
    --csv-caption-key title \
    --imagenet-val=~/imagenet_validation \
    --warmup 10000 \
    --batch-size=64 \
    --lr=1e-3 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --model flava-debug

gmittal avatar Nov 10 '22 22:11 gmittal

Left a few comments

Overall looks a bit big but interesting

Are you interested in running some experiments at scale (for example, 32 epochs of laion400m) to make sure it works? If yes, we could provide compute.

rom1504 avatar Nov 11 '22 21:11 rom1504

Could you please rebase on main?

rom1504 avatar Nov 12 '22 17:11 rom1504

If we decide to go the 'merge FLAVA methods' route instead of the 'add FLAVA separately' route, we could do something where you pass parameters for which pretraining objectives you want to include, and then add some ObjectiveSampler class that applies the loss functions according to some schedule. In FLAVA they use a round-robin schedule, but I'd be interested to see how different ones affect training.
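
A minimal sketch of what such an ObjectiveSampler could look like (the class name, API, and objective names here are hypothetical, not part of this PR); round-robin just means cycling through the registered losses:

    # Hypothetical sketch, not part of this PR: round-robin objective sampling as in FLAVA.
    from itertools import cycle
    from typing import Callable, Dict, Optional, Sequence


    class ObjectiveSampler:
        """Yields one (name, loss_fn) pair per training step according to a schedule."""

        def __init__(self, objectives: Dict[str, Callable], schedule: Optional[Sequence[str]] = None):
            # objectives maps a name (e.g. "contrastive", "mlm", "mim") to its loss function;
            # schedule is the visiting order, defaulting to round-robin over all names.
            self.objectives = objectives
            self._order = cycle(schedule or list(objectives))

        def next(self):
            name = next(self._order)
            return name, self.objectives[name]


    # Usage inside a (hypothetical) train loop:
    # sampler = ObjectiveSampler({"contrastive": clip_loss, "mlm": mlm_loss, "mim": mim_loss})
    # for batch in dataloader:
    #     name, loss_fn = sampler.next()
    #     loss = loss_fn(model, batch)
    #     loss.backward()

A schedule like ["contrastive", "contrastive", "mlm"] would weight objectives unevenly, which is one way to try alternatives to plain round-robin.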

iejMac avatar Nov 12 '22 19:11 iejMac

Things are going to start getting out of hand with the nature of the current train loop as we add support for different losses, needing to handle different masks, pull different items out of the dataset, etc...

When I was experimenting with gradient caching, I decided it was best to add another layer of abstraction to the train loop. I wanted to avoid going 'full lightning' on it, but made a minimal 'TrainJig' ... it could also be called a 'TrainTask' ... to pull optimizer setup, loss handling, and task-specific step logic out of the main loop.

I stopped working on grad caching because the complexity was not proving worthwhile, but it might be good to bring back and refine that abstraction (https://github.com/rwightman/open_clip/blob/grad_cache/src/training/train_jig.py)

We could have a ClipTrainJig/TrainTask, FlavaTrainTask, etc... to get rid of the if is_flava checks etc. in the main loop, and allow other multi-modal train tasks to be added without further mayhem...
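
A rough sketch of that direction (names and signatures are hypothetical, not taken from the grad_cache branch); the main loop would only ever call task.train_step:

    # Hypothetical sketch, not from the linked branch: a minimal TrainTask layer that
    # pulls optimizer setup and task-specific step logic out of the main loop.
    import torch
    import torch.nn.functional as F


    class TrainTask:
        """Base class; one subclass per pretraining recipe (CLIP, FLAVA, ...)."""

        def create_optimizer(self, model, args):
            return torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.wd)

        def train_step(self, model, batch, device):
            raise NotImplementedError


    class ClipTrainTask(TrainTask):
        def train_step(self, model, batch, device):
            images, texts = batch
            image_features, text_features, logit_scale = model(images.to(device), texts.to(device))
            # Minimal in-batch contrastive loss (local only, no distributed gather).
            logits = logit_scale * image_features @ text_features.t()
            labels = torch.arange(logits.shape[0], device=device)
            return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2


    # A FlavaTrainTask would pull masks and extra items out of the batch and combine its
    # own losses; the loop itself stays generic:
    # loss = task.train_step(model, batch, device)
    # loss.backward()
    # optimizer.step()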

rwightman avatar Nov 14 '22 21:11 rwightman