DeepSpeedExamples

Dynamic batch support

Open pluiez opened this issue 4 years ago • 1 comment

Machine translation usually takes dynamically sized batches composed of X tokens, rather than X sentences, as training input. I'm wondering why DeepSpeed requires specifying train_batch_size and train_micro_batch_size_per_gpu, both of which refer to a number of samples. Is this due to implementation details, or would it be possible to support dynamic batch sizes, as used in machine translation, without extra cost in efficiency and memory usage?

pluiez · Sep 15 '21 18:09
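The token-based batching described above typically looks like the following minimal sketch; `token_budget_batches`, `lengths`, and `max_tokens` are illustrative names, not a DeepSpeed or fairseq API. Samples are grouped so that each batch holds roughly `max_tokens` tokens in total, so the number of sentences per batch varies:

```python
from typing import Iterator, List

def token_budget_batches(lengths: List[int], max_tokens: int) -> Iterator[List[int]]:
    """Group sample indices so each batch holds at most `max_tokens` tokens.

    `lengths[i]` is the token count of sample i. The batch size measured in
    sentences varies; the batch size measured in tokens stays near the budget.
    """
    batch: List[int] = []
    batch_tokens = 0
    for idx, n in enumerate(lengths):
        # Flush the current batch if adding this sample would exceed the budget.
        if batch and batch_tokens + n > max_tokens:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(idx)
        batch_tokens += n
    if batch:
        yield batch
```

In practice such samplers usually sort samples by length first, so that sentences of similar length land in the same batch and padding waste stays low.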

The primary reason is to figure out the number of required gradient accumulation steps.

tjruwase · Sep 15 '21 18:09
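For reference, DeepSpeed's batch-size settings are related by train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs, which is why fixed per-sample counts are needed to derive the accumulation steps. A minimal sketch of that derivation, illustrative only and not DeepSpeed's actual implementation:

```python
def derive_accumulation_steps(train_batch_size: int,
                              micro_batch_per_gpu: int,
                              world_size: int) -> int:
    """Solve train_batch_size = micro_batch_per_gpu * world_size * accum_steps."""
    samples_per_step = micro_batch_per_gpu * world_size
    if train_batch_size % samples_per_step != 0:
        raise ValueError("train_batch_size must be divisible by "
                         "micro_batch_per_gpu * world_size")
    return train_batch_size // samples_per_step

# e.g. a global batch of 256 samples on 8 GPUs with micro-batches of 8
# requires 256 // (8 * 8) = 4 accumulation steps.
print(derive_accumulation_steps(256, 8, 8))  # 4
```

With token-based dynamic batches, the number of samples per micro-batch is not fixed, so this division has no well-defined answer up front.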