PKD-for-BERT-Model-Compression
PyTorch implementation of Patient Knowledge Distillation for BERT Model Compression
A question
The code has a `--teacher_prediction` argument — where does that file come from? Is it saved while training the teacher model? I don't see where that happens.
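For context, the workflow this question refers to — caching the teacher's outputs once so student training can load them instead of re-running the teacher — can be sketched as below. This is a minimal illustration only: `run_teacher`, the file path, and the pickle layout are assumptions for the sketch, not the repository's actual format.

```python
import pickle
import numpy as np

def run_teacher(dataset):
    # Hypothetical stand-in for a forward pass of the fine-tuned
    # teacher BERT; here it just returns random 2-class logits.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(dataset), 2))

def save_teacher_predictions(dataset, path):
    # Run the teacher once over the training set and cache its logits
    # to disk; a later student run would load this file instead of
    # keeping the full teacher in memory.
    logits = run_teacher(dataset)
    with open(path, "wb") as f:
        pickle.dump({"logits": logits}, f)
    return logits

logits = save_teacher_predictions(["ex1", "ex2", "ex3"], "/tmp/teacher_preds.pkl")
print(logits.shape)  # (3, 2)
```

A file produced this way would then be the value passed to an argument like `--teacher_prediction` at student-training time.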
Hi, thank you for your interesting work! I was just wondering why you don't use the pooler for KD.Full only, and if you do use the pooler, did you initialize the pooler...
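For readers following this thread: the PKD objective combines a soft-label KD term on the logits with a "patient" term on normalized [CLS] hidden states of matched intermediate layers, which is why raw hidden states (rather than the pooler output) can be compared directly. A minimal NumPy sketch of the two terms follows; the function names and temperature value are illustrative, not the repository's code.

```python
import numpy as np

def softmax(x, T=1.0):
    # Numerically stable temperature-scaled softmax.
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between temperature-softened teacher and student
    # distributions (the soft-label distillation term).
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -(p_t * np.log(p_s + 1e-12)).sum(axis=-1).mean()

def patient_loss(student_cls, teacher_cls):
    # MSE between L2-normalized [CLS] hidden states of matched layers;
    # this consumes hidden states directly, so no pooler is required.
    def norm(h):
        return h / (np.linalg.norm(h, axis=-1, keepdims=True) + 1e-12)
    return ((norm(student_cls) - norm(teacher_cls)) ** 2).sum(axis=-1).mean()

logits = np.array([[2.0, -1.0], [0.5, 0.5]])
h = np.ones((2, 4))
print(kd_loss(logits, logits) >= 0.0)   # True
print(patient_loss(h, h) == 0.0)        # True
```

The patient term goes to zero when the student's intermediate [CLS] states align with the teacher's, independent of any pooler weights.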
Hi, thank you for your interesting work! I have recently started learning about BERT and distillation, and I have some general questions on this topic. 1. I want to compare...
First, thank you for releasing your code. I am trying to reproduce the results from your paper. I am running `NLI_KD_training.py` for MRPC with DEBUG=True. The setting I am running is...