Zhong-Yi Li

2 comments by Zhong-Yi Li

Yes, as the paper indicates, the loss they used is KL divergence. However, when performing backprop in this scenario, the two losses are actually equivalent in terms of gradient calculation: they differ only by a term that is constant with respect to the model parameters, so that term contributes nothing to the gradient.
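A minimal sketch of this equivalence, assuming the second loss is cross-entropy against fixed soft targets (the tensor shapes and names here are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Fixed soft target distribution p (no gradient) and model logits z.
p = F.softmax(torch.randn(4, 10), dim=-1)
z = torch.randn(4, 10, requires_grad=True)

log_q = F.log_softmax(z, dim=-1)

# KL divergence: sum p * (log p - log q), averaged over the batch.
kl = F.kl_div(log_q, p, reduction="batchmean")
grad_kl, = torch.autograd.grad(kl, z, retain_graph=True)

# Cross-entropy with soft targets: -sum p * log q, averaged over the batch.
ce = -(p * log_q).sum(dim=-1).mean()
grad_ce, = torch.autograd.grad(ce, z)

# The two losses differ only by the entropy of p, which is constant
# w.r.t. z, so their gradients are identical.
print(torch.allclose(grad_kl, grad_ce, atol=1e-6))  # True
```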

I think PyTorch does [automatic differentiation](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) for you. Baidu implemented their own backward function because they wanted their own optimized version. ([DeepSpeech2, Page 27](https://arxiv.org/pdf/1512.02595.pdf))
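For reference, a hand-written backward is plugged into autograd via `torch.autograd.Function`. The sketch below is only an illustrative toy loss, not Baidu's actual CTC code, but it shows the mechanism an optimized binding would use to supply its own gradient instead of letting autograd trace the forward pass:

```python
import torch

class SquareLoss(torch.autograd.Function):
    """Toy loss with a custom, hand-written backward."""

    @staticmethod
    def forward(ctx, x, target):
        diff = x - target
        ctx.save_for_backward(diff)
        return 0.5 * (diff ** 2).sum()

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient supplied explicitly rather than derived by autograd.
        (diff,) = ctx.saved_tensors
        return grad_output * diff, None  # no gradient for target

x = torch.randn(3, requires_grad=True)
target = torch.randn(3)

loss = SquareLoss.apply(x, target)
loss.backward()
print(torch.allclose(x.grad, x - target))  # matches the analytic gradient
```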