How are the 1B and 7B models initialized?
❓ The question
I am curious how the OLMo 1B and 7B models are initialized before pre-training. The paper doesn't seem to include this information.
I found the following, but I am still unsure which option was ultimately used for pre-training.
https://github.com/allenai/OLMo/blob/d72a262645d831cc80d4a974718598998103075f/olmo/config.py#L195
The initialization method for the OLMo 1B and 7B models has evolved over time. You can find the configuration for the initialization function in the olmo/config.py file, specifically in the InitFnType class at line 437. As of now, it is set to normal.
The actual initialization logic for these models is defined in the reset_parameters function within the olmo/model.py file. This function checks the init_fn configuration to determine which initialization method to use for the model parameters.
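To make the dispatch described above concrete, here is a minimal, standard-library-only sketch of a function that picks an initialization scheme from a config value. It is not OLMo's code: the names InitFn, truncated_gauss, and init_weights are hypothetical, and the default standard deviation, the depth-dependent scaling in the mitchell branch, and the 3-standard-deviation cutoff are assumptions chosen for illustration. Check olmo/config.py and olmo/model.py at the commit you care about for the real behavior.

```python
import math
import random
from enum import Enum


class InitFn(str, Enum):
    """Hypothetical stand-in for the init function option in the config."""
    normal = "normal"
    mitchell = "mitchell"


def truncated_gauss(rng, std, cutoff):
    """Draw from N(0, std^2), resampling until the value is within cutoff * std."""
    while True:
        x = rng.gauss(0.0, std)
        if abs(x) <= cutoff * std:
            return x


def init_weights(init_fn, d_model, n_weights, layer_id=None, init_std=0.02, seed=0):
    """Return a flat list of initial weights chosen according to init_fn.

    The scaling rules below are assumptions for illustration only.
    """
    rng = random.Random(seed)
    if init_fn == InitFn.normal:
        # Plain normal init with a fixed standard deviation.
        return [rng.gauss(0.0, init_std) for _ in range(n_weights)]
    if init_fn == InitFn.mitchell:
        # Assumed rule: std shrinks with model width and, optionally, with depth.
        std = 1.0 / math.sqrt(d_model)
        if layer_id is not None:
            std /= math.sqrt(2 * (layer_id + 1))
        return [truncated_gauss(rng, std, 3.0) for _ in range(n_weights)]
    raise NotImplementedError(f"unsupported init_fn: {init_fn}")


if __name__ == "__main__":
    weights = init_weights(InitFn.mitchell, d_model=64, n_weights=5, layer_id=1)
    print(weights)
```

In the real model the same kind of branch would be applied to each parameter tensor rather than to a Python list; the list is used here only so the sketch runs without extra dependencies.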
I went through the config files and found that for the official-0724 release the weights were initialized using the mitchell method, while for the official-1124 release they were initialized with a truncated normal. Is there a specific reason for this change? Did you find the latter init method to be more stable?
Hi! Thanks for the question. We’re currently working on closing out old tickets, and we apologize that we didn’t get to you in a timely fashion. We’re closing this out for now, but if you’d still like an answer, please re-open and we will get back to you!