
Distributed training setup

Open • MicPie opened this issue 3 years ago • 1 comment

PR for the distributed training setup.

MicPie, Jan 12 '22 20:01

Work packages I am currently working on:

  • gradient caching (grad cache)
  • PyTorch AMP FP16 training
  • learning rate (LR) schedule
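
The issue does not say which learning rate schedule is planned. As a minimal sketch, assuming linear warmup followed by cosine decay (a common choice for contrastive image-text training, but an assumption here), the per-step learning rate could be computed as below. The function name `lr_at_step` and its parameters are hypothetical and not taken from the code base.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Hypothetical helper: linear warmup to base_lr, then cosine decay to 0.

    `step` is the zero-based optimizer step. The schedule shape is an
    assumption; the issue only lists "lr schedule" as a work item.
    """
    if step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the returned value could be written into each entry of `optimizer.param_groups` before the optimizer step, or the same formula could be wrapped in `torch.optim.lr_scheduler.LambdaLR` as a multiplicative factor.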

Other work packages that will be needed:

  • review and check the (web)dataset setup, including the text mask output and the validation dataset
  • add accuracy logging
  • add ImageNet eval
  • 8-bit Adam / ZeRO optimizer
  • test Horovod training if needed
  • test DeepSpeed training if needed
  • see small TODOs in the code base
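
For the accuracy logging item, the issue does not define the metric. One plausible reading, assuming a contrastive setup where the i-th image in a batch is paired with the i-th text, is in-batch top-1 retrieval accuracy. The sketch below uses plain Python lists to stay dependency-free; `top1_accuracy` is a hypothetical name, not a function from the code base.

```python
def top1_accuracy(similarity):
    """Hypothetical helper: in-batch top-1 retrieval accuracy.

    `similarity` is a square list of lists where similarity[i][j] is the
    score between image i and text j. A row counts as correct when its
    largest score lies on the diagonal, i.e. image i retrieves text i.
    """
    if not similarity:
        return 0.0
    correct = 0
    for i, row in enumerate(similarity):
        # Index of the best-scoring text for image i.
        best = max(range(len(row)), key=row.__getitem__)
        if best == i:
            correct += 1
    return correct / len(similarity)
```

Passing the transposed matrix gives the text-to-image direction. With tensors, the same quantity is usually computed by comparing the row-wise argmax against the index range.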

Other stuff:

  • add Hopfield network for CLOOB (InfoLOOB is there)

MicPie, Jan 12 '22 20:01