Yue Zhao
Hi! Thank you for this great work! I think there is a small bug in `data/mm_data/vqa_gen_dataset.py`: in the dataset's `__getitem__`, it looks like you forgot to set `max_tgt_length`...
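If `max_tgt_length` is meant to cap the length of the target (answer) sequence, the missing step amounts to truncating the target inside `__getitem__`. This is a minimal sketch of that pattern only. The class `ToyVqaGenDataset`, the `samples` format, and the truncation rule are hypothetical illustrations, not the actual code in `data/mm_data/vqa_gen_dataset.py`.

```python
from typing import Dict, List


class ToyVqaGenDataset:
    """Hypothetical stand-in for a VQA generation dataset (not the real class)."""

    def __init__(self, samples: List[Dict[str, List[int]]], max_tgt_length: int = 30):
        self.samples = samples
        # Assumed meaning: upper bound on the number of target (answer) tokens.
        self.max_tgt_length = max_tgt_length

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, List[int]]:
        sample = self.samples[index]
        # Apply the cap here; if this step is skipped, targets can exceed max_tgt_length.
        target = sample["target"][: self.max_tgt_length]
        return {"source": sample["source"], "target": target}


ds = ToyVqaGenDataset([{"source": [1, 2, 3], "target": list(range(50))}], max_tgt_length=8)
print(len(ds[0]["target"]))  # 8
```

Whether truncation, filtering, or something else is the intended behavior depends on how the real dataset uses `max_tgt_length`.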
**Describe the bug** The behavior is the same as what is reported in #4565. When calling `model.step()` with ZeRO-3, tensors end up on different devices. I modified [stage3.py#L2117](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage3.py#L2117) to `self.fp32_partitioned_groups_flat[sub_group_id].grad.mul_(1. / combined_scale.item())`,...
Hi, thanks for this innovative work! Could you please let the community know whether you plan to release your algorithms? Thank you for your time!