Mashrur Morshed

3 comments by Mashrur Morshed

@p4perf4ce I was just about to say the same thing. If einops had concat, my code would finally be framework-independent and super neat.
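
For what it's worth, newer einops releases ship `pack`/`unpack`, which can stand in for a framework-independent concat. A minimal sketch, using NumPy inputs purely for illustration (the same call works for torch / tensorflow / jax tensors):

```python
# A minimal sketch: framework-independent concatenation via einops.pack,
# available in newer einops releases. NumPy is used here only for
# illustration; torch / tensorflow / jax tensors work with the same call.
import numpy as np
from einops import pack, unpack

a = np.random.rand(8, 3, 32)  # (batch, channels_a, features)
b = np.random.rand(8, 5, 32)  # (batch, channels_b, features)

# '*' marks the packed axis, so this concatenates along dim 1
# without touching any framework-specific cat/concatenate API.
merged, packed_shapes = pack([a, b], "b * f")  # merged.shape == (8, 8, 32)

# packed_shapes remembers the per-tensor sizes, so the operation is reversible.
a_back, b_back = unpack(merged, packed_shapes, "b * f")
```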

Hello. Yeah, Knowledge Distillation is indeed mentioned in the paper. They used a Multi-Headed-Attention RNN as the "teacher" model and the Keyword Transformers as the student models. In the [author's official...

> Thank you, everybody! So, why does the first part of the KD loss function in distill_mnist.py multiply by 2? As per [distiller](https://intellabs.github.io/distiller/knowledge_distillation.html), KD_Loss is effectively the following equation: ```python α...
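
For context, a common Hinton-style KD loss looks like the sketch below; this is a generic sketch following the distiller-style formulation, not the exact distill_mnist.py code:

```python
# A generic sketch of the Hinton-style knowledge-distillation loss
# (not the exact distill_mnist.py implementation).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft term: KL divergence between temperature-softened distributions,
    # scaled by T**2 so its gradient magnitude stays comparable to the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The T² scaling on the soft term keeps the relative contribution of the soft and hard losses roughly constant as the temperature changes, as recommended by Hinton et al.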