Wentao Jiang

21 comments by Wentao Jiang

It was a zero-init problem. With the correct zero-init, the loss starts at 0.03 and the inference results are no longer noise. Many thanks to @LinB203 for the insightful suggestion.
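The comment does not say which layers were affected or how the fix was applied. For readers unfamiliar with the term, here is a generic sketch of what zero-init usually means: the last projection of a residual branch is initialized to all zeros, so the branch contributes nothing at the start of training and the block behaves as an identity map. All names here (`make_linear`, `residual_block`, the layer sizes) are hypothetical and written in plain Python for illustration. This is not the code from the thread.

```python
import math
import random

def linear(x, weight, bias):
    """Apply y = W x + b to a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def make_linear(n_in, n_out, zero_init=False, rng=None):
    """Return (weight, bias). With zero_init=True every parameter is 0."""
    rng = rng or random.Random(0)
    if zero_init:
        weight = [[0.0] * n_in for _ in range(n_out)]
    else:
        weight = [[rng.gauss(0.0, 0.02) for _ in range(n_in)]
                  for _ in range(n_out)]
    return weight, [0.0] * n_out

def residual_block(x, hidden, out):
    """Compute x + out(tanh(hidden(x))), a hypothetical residual branch."""
    h = [math.tanh(v) for v in linear(x, *hidden)]
    return [xi + bi for xi, bi in zip(x, linear(h, *out))]

rng = random.Random(0)
hidden = make_linear(4, 8, rng=rng)             # ordinary random init
out_zero = make_linear(8, 4, zero_init=True)    # zero-initialized output layer
out_rand = make_linear(8, 4, rng=rng)           # randomly initialized output layer

x = [1.0, -2.0, 0.5, 3.0]
y_zero = residual_block(x, hidden, out_zero)    # branch adds exactly 0, so y_zero == x
y_rand = residual_block(x, hidden, out_rand)    # branch perturbs x from the first step
```

With the zero-initialized output layer the block returns its input unchanged, so a model built on pretrained weights starts from the pretrained behavior instead of a randomly perturbed one. Whether that matches the exact situation described in the comment is an assumption.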