
Embedding layer distillation not implemented?

Open · aarmstrong78 opened this issue on Feb 25, 2020 · 3 comments

Embedding-layer distillation is discussed in the paper and included in the results, but I can't see it referenced in the README or anywhere in the code. Was it implemented in a later (unreleased) version?

aarmstrong78 avatar Feb 25 '20 09:02 aarmstrong78

Embedding-layer distillation uses the same MSE loss function as the hidden-state layers, so the embedding-layer distillation loss is computed the same way as the hidden-state loss. The code is at lines 958~960 of task_distill.py (new_student_reps[0] and new_teacher_reps[0] are the embedding-layer outputs).
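
For illustration, a minimal PyTorch sketch of that loss. The list layout mirrors the description above, but the function name, shapes, and the equal-hidden-size assumption are mine, not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def rep_distill_loss(new_student_reps, new_teacher_reps):
    """Sum of MSE losses over the embedding output and every hidden state.

    Assumed layout (per the comment above): element 0 of each list is the
    embedding-layer output and elements 1..N are the Transformer layer
    outputs, each of shape [batch_size, seq_len, hidden_size]. In the real
    code the student reps are first mapped to the teacher's hidden size by a
    learned linear projection; equal sizes are assumed here for brevity.
    """
    loss = torch.tensor(0.0)
    for student_rep, teacher_rep in zip(new_student_reps, new_teacher_reps):
        # Index 0 (the embedding pair) goes through exactly the same MSE as
        # every hidden-state pair, so no separate embedding-loss code is needed.
        loss = loss + F.mse_loss(student_rep, teacher_rep)
    return loss

# Toy usage: one embedding output plus four layer outputs per model.
student = [torch.randn(2, 8, 312) for _ in range(5)]
teacher = [torch.randn(2, 8, 312) for _ in range(5)]
print(rep_distill_loss(student, teacher))
```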

littttttlebird avatar Mar 06 '20 07:03 littttttlebird

@chuanhuayang I think he's talking about "general distillation". The general distillation code does not include embedding-layer distillation, but the paper does include it.

silencio94 avatar Apr 25 '20 16:04 silencio94

The sequence_output of TinyBertForSequenceClassification includes the embedding layer's output as the first element of the returned hidden states, so the embedding layer is covered by the same hidden-state loss.
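
To make that concrete, here is an illustrative sketch (the class, layer types, and shapes are assumptions, not the repository's API) of why the first element of the returned hidden states is the embedding output:

```python
import torch
import torch.nn as nn

class TinyEncoderSketch(nn.Module):
    """Illustrative only: shows why element 0 of the returned states is the
    embedding-layer output in BERT-style encoders that return all states."""

    def __init__(self, embeddings: nn.Module, layers: nn.ModuleList):
        super().__init__()
        self.embeddings = embeddings
        self.layers = layers

    def forward(self, input_ids):
        hidden = self.embeddings(input_ids)
        all_hidden_states = [hidden]          # element 0: embedding output
        for layer in self.layers:
            hidden = layer(hidden)
            all_hidden_states.append(hidden)  # elements 1..N: layer outputs
        return all_hidden_states              # what the comment calls sequence_output

# Toy usage with stand-in layers.
model = TinyEncoderSketch(
    nn.Embedding(100, 16),
    nn.ModuleList([nn.Linear(16, 16) for _ in range(4)]),
)
outs = model(torch.randint(0, 100, (2, 8)))
assert len(outs) == 5 and outs[0].shape == (2, 8, 16)
```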

ailinbest avatar Aug 31 '21 09:08 ailinbest