x-clip
Distributed training setup
PR for the distributed training setup.
Items I am currently working on:
- gradient caching (GradCache; see the sketch after this list)
- PyTorch AMP/FP16 training (see the AMP sketch after this list)
- learning-rate schedule (see the warmup/cosine sketch after this list)
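A minimal sketch of gradient caching for the contrastive loss, assuming separate `image_encoder`/`text_encoder` modules that return embeddings and a symmetric InfoNCE loss; `chunk_size` and `temperature` are placeholder values, not the repo's settings. The idea: compute the full-batch loss against detached embeddings first, cache the embedding gradients, then re-run each chunk with autograd and backprop the cached gradients.

```python
import torch
import torch.nn.functional as F

def grad_cache_step(image_encoder, text_encoder, images, texts, optimizer,
                    chunk_size=32, temperature=0.07):
    """One training step with gradient caching: the contrastive loss sees the
    full batch, but activations are only held for one chunk at a time."""
    image_chunks = images.split(chunk_size)
    text_chunks = texts.split(chunk_size)

    # 1) Forward all chunks without autograd graphs, collect embeddings.
    with torch.no_grad():
        img_emb = torch.cat([F.normalize(image_encoder(c), dim=-1) for c in image_chunks])
        txt_emb = torch.cat([F.normalize(text_encoder(c), dim=-1) for c in text_chunks])

    # 2) Full-batch contrastive loss w.r.t. the embeddings only; this caches
    #    the embedding gradients in img_emb.grad / txt_emb.grad.
    img_emb.requires_grad_(True)
    txt_emb.requires_grad_(True)
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    loss.backward()

    # 3) Re-forward each chunk with autograd and backprop the cached
    #    embedding gradients into the encoder parameters.
    for i, (ic, tc) in enumerate(zip(image_chunks, text_chunks)):
        sl = slice(i * chunk_size, i * chunk_size + len(ic))
        chunk_img = F.normalize(image_encoder(ic), dim=-1)
        chunk_txt = F.normalize(text_encoder(tc), dim=-1)
        torch.autograd.backward([chunk_img, chunk_txt],
                                [img_emb.grad[sl], txt_emb.grad[sl]])

    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```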
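For AMP/FP16, a minimal loop using `torch.cuda.amp`; `model(images, texts)` returning the loss, the gradient-clipping norm, and the device handling are assumptions for illustration.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def train_amp(model, dataloader, optimizer, device="cuda", max_norm=1.0):
    """Mixed-precision training loop with dynamic loss scaling."""
    scaler = GradScaler()
    for images, texts in dataloader:
        optimizer.zero_grad()
        with autocast():                      # forward pass runs in mixed precision
            loss = model(images.to(device), texts.to(device))
        scaler.scale(loss).backward()         # scale the loss so FP16 grads do not underflow
        scaler.unscale_(optimizer)            # unscale before gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        scaler.step(optimizer)                # the step is skipped if grads overflowed
        scaler.update()
```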
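For the LR schedule, a common choice is linear warmup followed by cosine decay; the sketch below uses `LambdaLR` and is stepped once per optimizer step (warmup/total step counts are placeholders).

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def cosine_with_warmup(optimizer, warmup_steps, total_steps):
    """Linear warmup to the base LR, then cosine decay to zero."""
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)
```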
Other items that will be needed:
- review and check the (web)dataset setup, including the text mask output and the validation dataset (see the pipeline sketch after this list)
- add accuracy logging (see the in-batch retrieval accuracy sketch after this list)
- add ImageNet evaluation (see the zero-shot sketch after this list)
- 8-bit Adam / ZeRO optimizer (see the optimizer sketch after this list)
- test Horovod training if needed
- test DeepSpeed training if needed
- see the small TODOs in the code base
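For the (web)dataset review, a generic pipeline sketch for reference; the shard key names (`jpg;png`, `txt`), the shuffle buffer, and the transforms are assumptions rather than the current setup, and the text mask output would still need to be added on top of this.

```python
import webdataset as wds

def make_wds_loader(urls, preprocess_image, tokenize_text, batch_size=256, num_workers=8):
    """Streams (image, caption) pairs from .tar shards without materializing the dataset."""
    dataset = (
        wds.WebDataset(urls)
        .shuffle(1000)                               # shard-local shuffle buffer
        .decode("pil")                               # decode images to PIL
        .to_tuple("jpg;png", "txt")                  # pick image and caption by key
        .map_tuple(preprocess_image, tokenize_text)  # image transform / text tokenizer
        .batched(batch_size)
    )
    return wds.WebLoader(dataset, batch_size=None, num_workers=num_workers)
```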
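For accuracy logging, one cheap metric is in-batch retrieval accuracy on the contrastive logits (how often the diagonal pair wins); a small helper sketch, assuming L2-normalized embeddings.

```python
import torch

@torch.no_grad()
def batch_retrieval_accuracy(img_emb, txt_emb):
    """Fraction of images whose best-matching text is their own pair, and vice versa."""
    logits = img_emb @ txt_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    image_to_text = (logits.argmax(dim=1) == targets).float().mean().item()
    text_to_image = (logits.argmax(dim=0) == targets).float().mean().item()
    return image_to_text, text_to_image
```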
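For the ImageNet eval, a zero-shot classification sketch: per-class text embeddings are built from prompt templates, and images are classified by cosine similarity. The `tokenizer`, the encoder interfaces, and the single template are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_imagenet(image_encoder, text_encoder, tokenizer, loader, classnames,
                       templates=("a photo of a {}.",), device="cuda"):
    """Zero-shot top-1 accuracy on an ImageNet-style loader of (image, label) batches."""
    # 1) One prompt-averaged, normalized text embedding per class.
    class_embs = []
    for name in classnames:
        tokens = tokenizer([t.format(name) for t in templates]).to(device)
        emb = F.normalize(text_encoder(tokens), dim=-1).mean(dim=0)
        class_embs.append(F.normalize(emb, dim=0))
    class_embs = torch.stack(class_embs)              # (num_classes, dim)

    # 2) Classify each image by its nearest class embedding.
    correct = total = 0
    for images, labels in loader:
        img_emb = F.normalize(image_encoder(images.to(device)), dim=-1)
        preds = (img_emb @ class_embs.t()).argmax(dim=-1)
        correct += (preds.cpu() == labels).sum().item()
        total += labels.numel()
    return correct / total
```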
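For the 8-bit Adam / ZeRO item, both options mainly reduce optimizer-state memory; a sketch with placeholder hyperparameters (bitsandbytes must be installed for the 8-bit path, and ZeRO here means PyTorch's `ZeroRedundancyOptimizer` under DDP).

```python
import torch

def build_optimizer(model, lr=5e-4, weight_decay=0.2, use_8bit=True, use_zero=False):
    """Memory-saving optimizer options: 8-bit Adam keeps optimizer state in int8,
    ZeroRedundancyOptimizer shards the state across DDP ranks."""
    params = model.parameters()
    if use_8bit:
        import bitsandbytes as bnb
        return bnb.optim.AdamW8bit(params, lr=lr, weight_decay=weight_decay)
    if use_zero:
        from torch.distributed.optim import ZeroRedundancyOptimizer
        return ZeroRedundancyOptimizer(params, optimizer_class=torch.optim.AdamW,
                                       lr=lr, weight_decay=weight_decay)
    return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
```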
Other stuff:
- add the Hopfield network for CLOOB (InfoLOOB is already there; see the retrieval sketch below)
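For the CLOOB item, the missing piece is the modern-Hopfield retrieval step applied before the InfoLOOB loss; a minimal sketch, assuming L2-normalized batch embeddings as both queries and memory and treating `beta` (the inverse temperature) as a hyperparameter.

```python
import torch
import torch.nn.functional as F

def hopfield_retrieval(queries, memory, beta=8.0):
    """Each query attends over the stored patterns (rows of `memory`) and is
    replaced by the softmax-weighted average of those patterns, re-normalized."""
    attn = F.softmax(beta * queries @ memory.t(), dim=-1)   # (n_queries, n_stored)
    return F.normalize(attn @ memory, dim=-1)

# CLOOB retrieves both modalities from both memories before InfoLOOB, e.g.:
#   u_img = hopfield_retrieval(img_emb, img_emb)   # image queries from image memory
#   u_txt = hopfield_retrieval(txt_emb, img_emb)   # text queries from image memory
#   v_txt = hopfield_retrieval(txt_emb, txt_emb)   # text queries from text memory
#   v_img = hopfield_retrieval(img_emb, txt_emb)   # image queries from text memory
```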