See what we can learn from https://arxiv.org/abs/2309.14322

Open epwalsh opened this issue 2 years ago • 2 comments

Small-scale proxies for large-scale Transformer training instabilities

epwalsh avatar Sep 26 '23 21:09 epwalsh

Twitter summary: https://x.com/mitchnw/status/1707415874456735905?s=46

soldni avatar Sep 28 '23 19:09 soldni

Wow, so Z-loss is potentially very useful.
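
For readers unfamiliar with the term, z-loss is usually described as an auxiliary penalty on the squared log of the softmax normalizer, added to the cross-entropy so that output logits do not drift to large magnitudes. The following is a minimal sketch under that assumption, not code from this thread or from OLMo: the function name `z_loss_cross_entropy` and the default coefficient `1e-4` are illustrative choices.

```python
import math


def z_loss_cross_entropy(logits, target, z_coeff=1e-4):
    """Return (total, cross_entropy, z_loss) for a single example.

    Hypothetical helper for illustration only; z_coeff=1e-4 is an assumed default.
    """
    # Numerically stable log of the softmax normalizer: log Z = log(sum(exp(logits))).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))

    # Standard cross-entropy for the target class.
    cross_entropy = log_z - logits[target]

    # Auxiliary term: penalize log Z for moving away from zero.
    z_loss = z_coeff * log_z ** 2

    return cross_entropy + z_loss, cross_entropy, z_loss
```

Adding a constant to every logit leaves the cross-entropy unchanged but increases log Z, so only the auxiliary term responds to that kind of drift, which is the behavior the penalty is meant to discourage.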

epwalsh avatar Sep 28 '23 21:09 epwalsh

Marking the items prior to Feb 29th as "closed".

dumitrac avatar Apr 30 '24 21:04 dumitrac