OLMo
Look at data right where the spike happens
It is suspicious that two slightly different models (one with biases, one without) both spiked at exactly the same moment. This suggests there might be a data issue.
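One way to narrow down which data to inspect is to map a window of training steps back to positions in the data order. The following is a minimal sketch, not the project's tooling. It assumes that steps are 1-indexed, that every step consumes exactly one global batch, and that the data order is fixed and was never reshuffled by a restart. The function name `instances_for_steps` and the batch size of 4 are hypothetical placeholders, and the step number is only an example value.

```python
def instances_for_steps(first_step, last_step, global_batch_size):
    """Return the range of instance positions consumed by the given steps.

    Steps are assumed to be 1-indexed and inclusive on both ends, with
    exactly `global_batch_size` instances consumed per step.
    """
    if first_step < 1 or last_step < first_step:
        raise ValueError("expected 1 <= first_step <= last_step")
    if global_batch_size < 1:
        raise ValueError("global_batch_size must be positive")
    start = (first_step - 1) * global_batch_size
    stop = last_step * global_batch_size
    return range(start, stop)


# Example: two steps on either side of a suspected spike step.
# The batch size here is a placeholder, not the real training config.
spike_step = 1581
window = instances_for_steps(spike_step - 2, spike_step + 2, global_batch_size=4)
print(window.start, window.stop, len(window))
```

The resulting positions can then be looked up in whatever index the data loader uses, so the same instances can be compared across both runs.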
In block 0, exp_avg_sq for attn_norm.weight.max seems to spike at step 1581, earlier than all the other spikes.
The spike in attn_out.weight.max is even more pronounced.
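To compare when different metrics first spike, one option is to flag the first step at which a value jumps well above its recent baseline. This is a rough sketch under stated assumptions: the metrics are available as per-step `(step, value)` pairs, the baseline is the median of the previous few values, and the window size and threshold ratio are arbitrary choices. The helper `first_spike_step`, the metric names, and all the numbers below are made up for illustration.

```python
from statistics import median


def first_spike_step(values, window=5, ratio=10.0):
    """Return the first step whose value exceeds `ratio` times the median
    of the preceding `window` values, or None if no such step exists.

    `values` is a list of (step, value) pairs sorted by step.
    """
    for i in range(window, len(values)):
        baseline = median(v for _, v in values[i - window:i])
        step, value = values[i]
        if baseline > 0 and value > ratio * baseline:
            return step
    return None


# Synthetic series, for illustration only (not real measurements).
series = {
    "metric_a": [(s, 1.0) for s in range(1570, 1581)] + [(1581, 50.0), (1582, 40.0)],
    "metric_b": [(s, 1.0) for s in range(1570, 1583)] + [(1583, 30.0)],
}

first = {name: first_spike_step(vals) for name, vals in series.items()}
# Pick the metric whose spike comes first.
earliest = min((step, name) for name, step in first.items() if step is not None)
print(first, earliest)
```

Sorting metrics by their first flagged step makes it easier to see which parameter moved first, although the result depends on the chosen window and ratio.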
Marking the items prior to Feb 29th as "closed".