YupengZheng: 2 comments
During pretraining, using the Wikipedia dataset with a learning rate of 1e-4 can help the model escape the local optimum, and the loss can be reduced to...
@toilaluan When I train ICAE with lm_ratio=0, the loss drops below 0.1. However, when I set lm_ratio=0.4, I run into the same problem as you. What lm_ratio are you using? By...