About DistributedDataParallel
Hi, I can see that the source code only uses non-distributed training, even when training with multiple GPUs. Is there any special reason why you chose non-distributed training instead of DistributedDataParallel?
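
For context, below is a minimal sketch of what a multi-GPU setup with PyTorch's `DistributedDataParallel` would look like. The model, dataset, and hyperparameters are placeholders for illustration, not taken from the qd-3dt codebase.

```python
# Minimal DDP sketch (illustrative; not the qd-3dt training code).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data; the real detector/tracker would go here.
    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

# Launch with: torchrun --nproc_per_node=NUM_GPUS train_ddp.py
```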