dinglei8908


> Try gradient accumulation with `--update-freq`

Thanks for the suggestion. The maximum mini-batch size that fits is 2 on V100 and 4 on A100 for the caption task. Does this make sense?
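For anyone reading along, here is a rough, framework-agnostic sketch of what gradient accumulation does. It is not taken from any training code in this thread: the toy model, the data, and the helper names (`mse_grad`, `accumulated_grad`, `effective_batch_size`) are all made up for illustration, and treating `--update-freq` as "number of micro-batches per optimizer step" is an assumption about how the flag behaves. The point is only that averaging the gradients of several small micro-batches gives the same gradient as one larger batch, so a per-GPU batch of 2 can still emulate a bigger effective batch.

```python
def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average the gradients of equal-sized micro-batches.

    Assumes len(data) is a multiple of micro_batch_size, so every
    micro-batch carries the same weight in the average.
    """
    chunks = [data[i:i + micro_batch_size]
              for i in range(0, len(data), micro_batch_size)]
    update_freq = len(chunks)  # micro-batches per optimizer step (assumed meaning)
    grad = 0.0
    for chunk in chunks:
        # Scale each micro-batch gradient so the running sum is the full-batch mean.
        grad += mse_grad(w, chunk) / update_freq
    return grad, update_freq


def effective_batch_size(micro_batch_size, update_freq, num_gpus=1):
    """Samples contributing to one optimizer step."""
    return micro_batch_size * update_freq * num_gpus


# Hypothetical toy data: 8 (x, y) pairs, processed 2 at a time.
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 13.9), (8.0, 16.0)]
w = 0.5

full = mse_grad(w, data)
accum, update_freq = accumulated_grad(w, data, micro_batch_size=2)

print(abs(full - accum) < 1e-9)               # gradients agree up to rounding
print(effective_batch_size(2, update_freq))   # samples per optimizer step
```

The trade-off is wall-clock time: each optimizer step now needs several forward/backward passes, but peak memory stays at the micro-batch level.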