LYF
Would it be possible to provide the teacher models used for task-specific distillation, that is, the teacher model fine-tuned on each task?
How is the softmax classifier initialized in the BERT-Base model? Is it zero-initialized?
Do you have time to fill in these blanks?
Hello, would it be possible for me to get your preprocessed brain data?
When testing quantization, it raises errors. Has anyone else encountered this problem? Environment: pytorch==3.8.1, transformers==4.7.0
Hi, would you be willing to release the code for your pre-training stage? We are very much looking forward to building further research on it. Thanks!