Pretrained-Language-Model
Why is only teacher_model wrapped in DistributedDataParallel in general_distill.py?

I am not familiar with PyTorch's DistributedDataParallel, and I am confused about why only teacher_model is wrapped in DistributedDataParallel in general_distill.py.