
Does knowledge distillation support multi-GPU?

Open wingvortex opened this issue 3 years ago • 2 comments

Hi, thanks for sharing your work. When I tried to use multiple GPUs to train knowledge distillation with `python3 -m torch.distributed.run --nproc_per_node $N_GPU distillation.py ...`, I got the error: `torch.distributed.elastic.multiprocessing.errors.ChildFailedError: distillation.py FAILED Failures: <NO_OTHER_FAILURES>`

wingvortex avatar Feb 08 '22 07:02 wingvortex

KD currently does not support multi-GPU. We adopted the KD method from "End-to-End Semi-Supervised Object Detection with Soft Teacher", and generating the teacher's features (I would call them guide features for the student) was a heavy operation.
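
For context, here is a minimal sketch of what such guide-feature distillation generally looks like; the function below, and the assumption that both models return matching lists of feature maps, are illustrative and not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def guide_feature_loss(teacher, student, images):
    """Feature-level KD sketch: the teacher's intermediate features act as
    guide targets that the student's features are pushed towards."""
    with torch.no_grad():                  # teacher is frozen; this forward pass is the heavy part
        teacher_feats = teacher(images)    # list of guide feature maps
    student_feats = student(images)        # same-shaped feature maps from the student
    # Simple feature-matching loss; real pipelines often add adapters and per-level weighting.
    return sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))
```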

The bottom line is that KD already uses two GPUs: one for the student and one for the teacher. You can check it out at https://github.com/j-marple-dev/AYolov2/blob/main/distillation.py#L66
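
As a rough illustration of that layout (the tiny modules and device indices below are placeholders, not the code at the linked line): a single process keeps the frozen teacher on one GPU and the trainable student on another, instead of launching multiple DDP workers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder backbones standing in for the real teacher/student networks.
teacher = nn.Conv2d(3, 16, 3, padding=1).to("cuda:1").eval()   # GPU 1: frozen teacher
student = nn.Conv2d(3, 16, 3, padding=1).to("cuda:0").train()  # GPU 0: trainable student

images = torch.randn(2, 3, 64, 64)       # dummy batch

with torch.no_grad():                    # teacher only produces guide features
    guide = teacher(images.to("cuda:1"))
pred = student(images.to("cuda:0"))

# Move the guide features to the student's device before computing the KD loss.
loss = F.mse_loss(pred, guide.to("cuda:0"))
loss.backward()
```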

FYI, we have not managed to see a clear benefit from using KD yet.

JeiKeiLim avatar Feb 08 '22 23:02 JeiKeiLim

I already noticed that the student and the teacher each take a GPU, and that the teacher uses quite a lot of GPU memory. Thanks for the extra information. Do you mean the performance gain is limited when applying semi-supervised object detection? In my case, the ratio of labeled to unlabeled data is 1:2.

wingvortex avatar Feb 09 '22 00:02 wingvortex