DeepSpeedExamples
Dynamic batch support
Machine translation training usually takes dynamically sized batches composed of X tokens rather than X sentences as input. I'm wondering why DeepSpeed requires specifying train_batch_size and train_micro_batch_size_per_gpu, both of which refer to a number of samples. Is this due to implementation details, or is it possible to support dynamic batch sizes, as in machine translation, without extra cost in efficiency or memory usage? For reference, a minimal config sketch (values illustrative) showing the two parameters in question:
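```python
# Minimal DeepSpeed config sketch; the numeric values are illustrative.
ds_config = {
    "train_batch_size": 256,              # global effective batch size, in samples
    "train_micro_batch_size_per_gpu": 8,  # samples per GPU per forward/backward pass
    # gradient_accumulation_steps can be derived from the two values above
}
```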
The primary reason is to figure out the number of required gradient accumulation steps: DeepSpeed enforces the relation train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size, which is only well defined when both batch sizes are expressed in samples.
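A minimal sketch of that derivation, assuming fixed sample-count batch sizes (the function name is illustrative, not DeepSpeed API):

```python
def gradient_accumulation_steps(train_batch_size: int,
                                micro_batch_size_per_gpu: int,
                                world_size: int) -> int:
    """Derive accumulation steps from the two sample-count settings.

    world_size is the number of data-parallel ranks; each optimizer step
    consumes micro_batch_size_per_gpu * world_size samples per micro step.
    """
    samples_per_micro_step = micro_batch_size_per_gpu * world_size
    assert train_batch_size % samples_per_micro_step == 0, \
        "train_batch_size must be divisible by micro_batch * world_size"
    return train_batch_size // samples_per_micro_step

# Example: 256 samples globally = 8 samples/GPU * 4 GPUs * 8 accumulation steps
print(gradient_accumulation_steps(256, 8, 4))  # -> 8
```

With token-based dynamic batches, the number of samples per micro batch varies, so this relation no longer pins down a fixed accumulation count, which is why the sample-based settings are required.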