Hao Mi
Got it, thank you.
> Because we want to use the scale factor of the FINAL residual stream to scale COMPONENTS of the residual stream, and you can't infer the final norm from partial...
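The quoted reasoning can be illustrated with a minimal sketch. It assumes a plain LayerNorm with no learned gain or bias, and uses toy 4-dimensional vectors standing in for hypothetical residual-stream components (embedding, attention output, MLP output). Every helper name here (`center`, `ln_scale`, `layer_norm`, `apply_final_scale`) is made up for illustration and is not any library's API. Centering is linear, so if every component is centered and divided by the one scale factor taken from the FINAL stream, the scaled components still add up exactly to the normalized final stream. If you normalize each partial piece by its own norm instead, they no longer add up, because the norm of a sum is not the sum of the norms.

```python
import math

def center(v):
    # Subtract the mean; this step is linear, so center(a + b) == center(a) + center(b).
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def ln_scale(v, eps=1e-5):
    # LayerNorm scale factor: RMS of the centered vector (plus eps). Nonlinear in v.
    c = center(v)
    return math.sqrt(sum(x * x for x in c) / len(c) + eps)

def layer_norm(v, eps=1e-5):
    # Plain LayerNorm without learned gain or bias (assumption for this sketch).
    s = ln_scale(v, eps)
    return [x / s for x in center(v)]

def add(*vs):
    # Elementwise sum of equal-length vectors.
    return [sum(xs) for xs in zip(*vs)]

def apply_final_scale(components, eps=1e-5):
    """Center each component, then divide by the scale of the FINAL residual stream."""
    final = add(*components)
    s = ln_scale(final, eps)
    return [[x / s for x in center(c)] for c in components]

# Hypothetical residual-stream components (toy values).
embed = [1.0, -2.0, 0.5, 3.0]
attn = [0.3, 0.7, -1.2, 0.4]
mlp = [-0.5, 1.5, 2.0, -1.0]

target = layer_norm(add(embed, attn, mlp))

# Shared final scale: the scaled components sum back to LN(final stream).
scaled = apply_final_scale([embed, attn, mlp])
recombined = add(*scaled)
print(all(abs(a - b) < 1e-9 for a, b in zip(recombined, target)))  # True

# Each component normalized by its own norm: the decomposition no longer sums correctly.
own = [layer_norm(c) for c in (embed, attn, mlp)]
print(all(abs(a - b) < 1e-9 for a, b in zip(add(*own), target)))  # False
```

This is also why the scale has to be cached from a forward pass on the full stream. No single component, and no partial sum, carries enough information to recover the final norm.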