Pytorch-MiniLM
Error in the DDP mode
When training the model in DDP mode, it raises the error "This error indicates that your module has parameters that were not used in producing loss". Since MiniLM distillation only uses the attention parameters, student-model parameters such as `bert.encoder.layer.-1.output.xx` and `cls.predictions.transform.xx` receive no gradient updates, so DDP complains about them. How can this problem be fixed? Thanks.
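For reference, the full PyTorch error message itself suggests a workaround: passing `find_unused_parameters=True` to `DistributedDataParallel`, which makes DDP detect and skip parameters that receive no gradient. Below is a minimal single-process sketch (not this repo's actual training code) with a toy `Student` module whose `head` branch is never used in the loss, mimicking the untouched output/prediction-head layers of the MiniLM student:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal single-process setup (gloo backend) for illustration; in real
# training these env vars come from the launcher (e.g. torchrun).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Student(torch.nn.Module):
    """Toy student: `head` never contributes to the loss, like the
    unused prediction-head/output layers in the MiniLM student."""
    def __init__(self):
        super().__init__()
        self.attn = torch.nn.Linear(8, 8)   # used in the loss
        self.head = torch.nn.Linear(8, 8)   # never used -> would trip DDP
    def forward(self, x):
        return self.attn(x)                 # self.head is skipped

# find_unused_parameters=True tells DDP to tolerate parameters that
# produce no gradient, avoiding the reported error.
model = DDP(Student(), find_unused_parameters=True)
loss = model(torch.randn(4, 8)).sum()
loss.backward()  # succeeds; without the flag DDP raises the error above

dist.destroy_process_group()
```

An alternative, if it fits the training setup, is to freeze or drop the unused student layers before wrapping with DDP, so every remaining parameter participates in the loss (note `find_unused_parameters=True` adds some per-iteration overhead).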