Pretrained-Language-Model
Mapping of attention maps and hidden states in TinyBERT
Why are the mappings for attention maps and hidden states different? Can you explain a bit? Also, when I plug in my own model, the second mapping raises "index out of range" when I use a student with 6 blocks and a teacher with 12 blocks, because 6 * 2 > 11 (the max index in the hidden states list).
Attention maps mapping: https://github.com/huawei-noah/Pretrained-Language-Model/blob/master/TinyBERT/general_distill.py#L421-L423
Hidden states mapping: https://github.com/huawei-noah/Pretrained-Language-Model/blob/master/TinyBERT/general_distill.py#L432-L433
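For context, here is a minimal sketch of the two mapping rules from the linked lines of `general_distill.py`, assuming a 12-layer teacher and 6-layer student; the teacher outputs are placeholder strings rather than real tensors:

```python
teacher_layer_num = 12
student_layer_num = 6
layers_per_block = teacher_layer_num // student_layer_num  # = 2

# Placeholder teacher outputs: 12 attention maps, but 13 hidden
# states (the embedding output plus one state per layer).
teacher_atts = [f"att_{i}" for i in range(teacher_layer_num)]
teacher_reps = [f"rep_{i}" for i in range(teacher_layer_num + 1)]

# Attention mapping: student layer i takes the LAST attention map
# of its teacher block -> indices 1, 3, 5, 7, 9, 11.
new_teacher_atts = [
    teacher_atts[i * layers_per_block + layers_per_block - 1]
    for i in range(student_layer_num)
]

# Hidden-state mapping: hidden states include the embedding output,
# so there are student_layer_num + 1 of them; student state i takes
# teacher state i * 2 -> indices 0, 2, 4, 6, 8, 10, 12.
new_teacher_reps = [
    teacher_reps[i * layers_per_block]
    for i in range(student_layer_num + 1)
]

print(new_teacher_atts)  # ['att_1', 'att_3', ..., 'att_11']
print(new_teacher_reps)  # ['rep_0', 'rep_2', ..., 'rep_12']
```

This also suggests where the "index out of range" comes from: the hidden-states mapping assumes the teacher returns `layer_num + 1` hidden states (embedding output included). If a plugged-in teacher returns only 12 hidden states, the highest index the mapping requests (6 * 2 = 12) exceeds the list's max index of 11.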