ll comments

Results: 3 comments by ll

Importing a .csv file into neo4j

Same question here. I haven't been able to find a solution.

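Why is TinyBERT's task-specific distillation split into two steps?

I ran an experiment (a classification task): computing the two losses jointly performed noticeably worse than splitting training into two stages. My guess at the reason: with `total_loss = intermediate_loss + prediction_loss`, the vast majority of the model's parameters sit in the intermediate layers, so intermediate_loss makes up most of total_loss and optimization is biased toward the intermediate layers. For a downstream task such as classification, however, the prediction layer's parameters may matter more, yet they are not optimized well, so the student cannot properly learn the parameter distribution of the teacher model's prediction layer. With two stages, the second stage distills only the prediction layer, which helps ensure that the student model learns the teacher model's prediction-layer parameter distribution reasonably well.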
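To make the comparison concrete, here is a minimal sketch of the two loss terms and of how the joint and staged setups differ. It is not the TinyBERT code: the function names, the `stage` argument, and the toy vectors are all assumptions for illustration, the intermediate loss is reduced to an MSE over hidden states (attention terms are left out), and the prediction loss is written as a soft cross-entropy between teacher and student logits.

```python
import math


def mse(student, teacher):
    """Mean squared error between two equal-length vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student against the teacher's soft labels."""
    student_log_probs = log_softmax([x / temperature for x in student_logits])
    teacher_log_probs = log_softmax([x / temperature for x in teacher_logits])
    teacher_probs = [math.exp(v) for v in teacher_log_probs]
    return -sum(p * q for p, q in zip(teacher_probs, student_log_probs))


def distillation_loss(stage, student_hidden, teacher_hidden,
                      student_logits, teacher_logits):
    """Hypothetical helper: choose which loss terms are optimized.

    stage = "joint"        -> intermediate_loss + prediction_loss
    stage = "intermediate" -> first stage of the two-stage setup
    stage = "prediction"   -> second stage of the two-stage setup
    """
    # Sum the per-layer MSE over all (already aligned) layer pairs.
    intermediate_loss = sum(
        mse(s, t) for s, t in zip(student_hidden, teacher_hidden)
    )
    prediction_loss = soft_cross_entropy(student_logits, teacher_logits)
    if stage == "joint":
        return intermediate_loss + prediction_loss
    if stage == "intermediate":
        return intermediate_loss
    if stage == "prediction":
        return prediction_loss
    raise ValueError(f"unknown stage: {stage!r}")


# Toy values (made up): two layers with hidden size 2, two classes.
student_hidden = [[1.0, 2.0], [0.0, 0.0]]
teacher_hidden = [[1.0, 0.0], [0.0, 2.0]]
student_logits = [2.0, 0.5]
teacher_logits = [1.0, 1.0]

for stage in ("joint", "intermediate", "prediction"):
    loss = distillation_loss(stage, student_hidden, teacher_hidden,
                             student_logits, teacher_logits)
    print(stage, round(loss, 4))
```

In this toy case the intermediate term is several times larger than the prediction term, so the joint loss is dominated by it. That imbalance in loss magnitude is one way the bias described above could arise; it illustrates the hypothesis and does not confirm it.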

ViterbiDecoder pads <start> tag instead of <end> tag in end of sequence

Yes, I agree with you.