DeepSpeedExamples
Dynamic batch support
Machine translation training usually uses dynamically sized batches composed of X tokens rather than X sentences. I'm wondering why DeepSpeed requires specifying `train_batch_size` and `train_micro_batch_size_per_gpu`, both of which refer to a number of samples. Is this a matter of implementation details, or could dynamically sized batches, as used in machine translation, be supported without extra cost in efficiency and memory usage?
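For context, the token-based batching referred to above typically groups sentences until a token budget is reached, instead of taking a fixed number of sentences. A minimal sketch of that idea (the function name and budget heuristic are illustrative, not part of DeepSpeed's API):

```python
def batch_by_tokens(lengths, max_tokens=4096):
    """Group sentence indices into batches capped by a token budget.

    A batch's cost is approximated as (number of sentences) * (longest
    sentence in the batch), matching the memory use of padded tensors.
    Input sentences are assumed pre-sorted or bucketed by length for
    best packing efficiency.
    """
    batches, current, longest = [], [], 0
    for i, n in enumerate(lengths):
        new_longest = max(longest, n)
        # If adding this sentence would exceed the budget, flush the batch.
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, new_longest = [], n
        current.append(i)
        longest = new_longest
    if current:
        batches.append(current)
    return batches
```

With this scheme the number of sentences per batch varies from step to step, which is exactly what a fixed `train_micro_batch_size_per_gpu` (a sample count) cannot express directly.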