Hao Mi

Results: 2 comments by Hao Mi

> Because we want to use the scale factor of the FINAL residual stream to scale COMPONENTS of the residual stream, and you can't infer the final norm from partial...
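
To illustrate the point in the quoted comment, here is a minimal sketch. It assumes the "scale factor" is a LayerNorm-style one (the standard deviation of the centered residual vector), which the truncated comment does not confirm. The component vectors, `d_model`, and all helper names are hypothetical. It compares dividing each component by the final stream's scale factor against dividing each component by its own.

```python
import math
import random


def center(vec):
    """Subtract the mean, as LayerNorm does before scaling."""
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]


def scale_factor(vec, eps=1e-5):
    """LayerNorm-style scale: sqrt(variance + eps) of the vector."""
    centered = center(vec)
    return math.sqrt(sum(v * v for v in centered) / len(centered) + eps)


def add_vectors(vectors):
    """Elementwise sum of a list of equal-length vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


random.seed(0)
d_model = 16  # hypothetical residual stream width

# Four hypothetical components (e.g. outputs of different layers or heads)
# that sum to the final residual stream.
components = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(4)]
final_resid = add_vectors(components)

# Normalizing the full stream: this is the target the components should sum to.
final_scale = scale_factor(final_resid)
normed_final = [v / final_scale for v in center(final_resid)]

# Option A: scale every component by the FINAL stream's scale factor.
with_final = add_vectors(
    [[v / final_scale for v in center(c)] for c in components]
)

# Option B: scale every component by its OWN scale factor.
with_own = add_vectors(
    [[v / scale_factor(c) for v in center(c)] for c in components]
)

print("error using final scale:", max_abs_diff(with_final, normed_final))
print("error using own scales: ", max_abs_diff(with_own, normed_final))
```

Under these assumptions, only option A reproduces the normalized final stream when the scaled components are summed. Centering is linear, so it distributes over the sum, but the scale factor depends on the whole vector and cannot be recovered from any single component.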