littttttlebird

Results 2 comments of littttttlebird

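The result is empty because the dictionary contains a space: when the model is undertrained, it decodes a lot of spaces. There are other bugs too. For example, the attention computation does not mask the encoder_outputs, which have uneven lengths.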
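For illustration, here is a minimal sketch of what masking the attention scores could look like. It is plain Python rather than the repository's actual code; the function name `masked_attention_weights` and its arguments are hypothetical, and it assumes every sequence has a true length of at least 1.

```python
import math

def masked_attention_weights(scores, lengths):
    """Softmax over attention scores, ignoring padded encoder positions.

    scores:  list of rows, one per batch item, each of length max_len
    lengths: true (unpadded) encoder length of each batch item, >= 1
    (Both names are hypothetical, chosen for this sketch.)
    """
    weights = []
    for row, length in zip(scores, lengths):
        # Padded positions are set to -inf so they get zero weight after softmax.
        masked = [s if i < length else float("-inf") for i, s in enumerate(row)]
        peak = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Second item has only 2 real encoder steps; the third position is padding.
w = masked_attention_weights([[1.0, 2.0, 3.0], [1.0, 1.0, 5.0]], [3, 2])
```

Without the mask, the padded position in the second row (score 5.0) would take most of the attention weight.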

Embedding-layer distillation uses the same loss function (MSE) as hidden-state distillation, so the embedding-layer loss is computed the same way as the hidden-state loss. The code is at 958~960...
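A rough sketch of the idea, not the code at those lines: one MSE function is applied to the embedding output and to each hidden-state layer alike. The names `mse_loss` and `representation_loss` are hypothetical, and the sketch assumes student and teacher representations already have the same shape.

```python
def mse_loss(student, teacher):
    """Mean squared error between two equally shaped lists of vectors."""
    diffs = [
        (s - t) ** 2
        for s_vec, t_vec in zip(student, teacher)
        for s, t in zip(s_vec, t_vec)
    ]
    return sum(diffs) / len(diffs)

def representation_loss(student_emb, teacher_emb, student_hidden, teacher_hidden):
    """Sum of the embedding loss and the per-layer hidden-state losses.

    Assumption: the hidden-state lists are already paired layer by layer.
    """
    # The embedding output is treated exactly like a hidden-state layer.
    loss = mse_loss(student_emb, teacher_emb)
    for s_layer, t_layer in zip(student_hidden, teacher_hidden):
        loss += mse_loss(s_layer, t_layer)
    return loss
```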