hongxu.yin

Results 4 comments of hongxu.yin

Hi Mandy, thanks for letting us know. This set yields the accuracy of the provided checkpoint. Can you share your training environment and the exact code you ran? Also hi...

Hi mehtadushy, thanks for letting us know. This set yields the accuracy of the provided checkpoint. Can you share your training environment and the exact code you ran?

We use KL divergence instead of CE and rescale the KL divergence into a normal loss range. The distillation setup details are in Sec 4.4.
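For readers who want a concrete picture, here is a minimal sketch of a KL-based distillation loss between teacher and student logits. It is not the code from the repository: the function names, the `temperature` argument, and the constant `scale` factor are hypothetical, and the actual rescaling is the one described in Sec 4.4, which may differ from a plain multiplier.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_loss(teacher_logits, student_logits, temperature=1.0, scale=1.0):
    """KL(teacher || student) on softened outputs.

    `scale` is an assumed stand-in for whatever rescaling brings the
    KL term into the usual loss range; it is not taken from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return scale * kl_divergence(p, q)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(distillation_loss(teacher, student))
```

The loss is zero when the student matches the teacher exactly and grows as the two output distributions diverge, which is the main difference from CE against hard labels.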

Hi @dohe0342, one thing to try is reducing the batch size to alleviate the GPU burden. Also try the 2k-iteration setting to reduce the GPU burden further. Additionally you...