Wentao Jiang
It was the zero-init problem. With the correct zero-init, the loss starts at 0.03 and the inference results are not noise. Many thanks to @LinB203 for the insightful suggestion.
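For readers unfamiliar with the term, here is a minimal sketch of what zero-init usually means. It assumes "zero-init" refers to the common practice of setting the last projection of a residual branch to zeros, so the branch contributes nothing at the start of training; the comment above does not say which layer was affected. All names here (`make_linear`, `residual_block`) are hypothetical and written in plain Python for illustration, not taken from the project's code.

```python
import random

def make_linear(n_in, n_out, zero_init=False, seed=0):
    """Return (weight, bias) for a toy linear layer.

    zero_init=True sets every weight to 0.0; otherwise weights are
    small Gaussian values. The bias is zero in both cases.
    """
    rng = random.Random(seed)
    if zero_init:
        weight = [[0.0] * n_in for _ in range(n_out)]
    else:
        weight = [[rng.gauss(0.0, 0.02) for _ in range(n_in)]
                  for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias

def linear(x, weight, bias):
    """Compute y = W x + b for one input vector stored as a list."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def residual_block(x, hidden, out):
    """Toy residual block: x + out(relu(hidden(x)))."""
    h = [max(0.0, v) for v in linear(x, *hidden)]
    branch = linear(h, *out)
    return [xi + bi for xi, bi in zip(x, branch)]

x = [0.5, -1.0, 2.0]
hidden = make_linear(3, 4, seed=1)            # ordinary random init
out_zero = make_linear(4, 3, zero_init=True)  # zero-initialized output projection

# With a zero-initialized output projection the branch adds exactly 0.0,
# so the block starts out as the identity function.
y = residual_block(x, hidden, out_zero)
```

Under this assumption, a model whose zero-init layers are accidentally given random weights (or loaded incorrectly) no longer starts from the identity mapping, which is one plausible way to end up with a poor initial loss and noisy outputs.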