Pretrained-Language-Model

Why is DistributedDataParallel applied only to teacher_model in general_distill.py?

Open 1024er opened this issue 5 years ago • 0 comments

I am not familiar with PyTorch's DistributedDataParallel, and I am confused about why only teacher_model is wrapped in DistributedDataParallel in general_distill.py. Why is student_model not wrapped as well?

Mar 29 '20 13:03 1024er
