
How are the 1B and 7B models initialized?

Open sanyalsunny111 opened this issue 1 year ago • 2 comments

❓ The question

I am curious how the OLMo 1B and 7B models are initialized before pre-training begins. The paper doesn't seem to include this information.

I found this, but I'm still unsure which option is ultimately used during pre-training.

https://github.com/allenai/OLMo/blob/d72a262645d831cc80d4a974718598998103075f/olmo/config.py#L195

sanyalsunny111 • Jun 24 '24 18:06

The initialization method for the OLMo 1B and 7B models has changed over time. The available initialization functions are defined in the InitFnType class in olmo/config.py (line 437 at the time of writing). As of now, the initialization is set to normal.

The actual initialization logic for these models is defined in the reset_parameters function within the olmo/model.py file. This function checks the init_fn configuration to determine which initialization method to use for the model parameters.
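To make the config-driven dispatch concrete, here is a minimal, stdlib-only sketch. It is not OLMo's code: the enum subset, the ModelConfig fields (init_std, init_cutoff_factor), and the helpers pick_std and init_weight_matrix are hypothetical stand-ins for the InitFnType setting and reset_parameters logic described above. The "mitchell" branch is assumed here to use a 1/sqrt(fan_in) standard deviation, and both branches draw from a normal distribution truncated at a cutoff of a few standard deviations.

```python
import math
import random
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class InitFnType(str, Enum):
    # Hypothetical subset of options; the real enum in olmo/config.py may differ.
    normal = "normal"
    mitchell = "mitchell"


@dataclass
class ModelConfig:
    # Hypothetical fields, loosely modeled on the settings discussed in this thread.
    init_fn: InitFnType = InitFnType.normal
    init_std: float = 0.02
    init_cutoff_factor: Optional[float] = 3.0


def truncated_normal(std: float, cutoff: Optional[float], rng: random.Random) -> float:
    """Draw from N(0, std^2), resampling any value beyond cutoff * std."""
    while True:
        x = rng.gauss(0.0, std)
        if cutoff is None or abs(x) <= cutoff * std:
            return x


def pick_std(cfg: ModelConfig, fan_in: int) -> float:
    """Choose a standard deviation for one weight matrix based on cfg.init_fn."""
    if cfg.init_fn == InitFnType.normal:
        return cfg.init_std  # same fixed std for every matrix
    if cfg.init_fn == InitFnType.mitchell:
        return 1.0 / math.sqrt(fan_in)  # assumed: std shrinks as the input dim grows
    raise NotImplementedError(f"unsupported init_fn: {cfg.init_fn}")


def init_weight_matrix(cfg: ModelConfig, fan_in: int, fan_out: int,
                       seed: int = 0) -> List[List[float]]:
    """Hypothetical stand-in for a reset_parameters-style routine."""
    rng = random.Random(seed)
    std = pick_std(cfg, fan_in)
    return [[truncated_normal(std, cfg.init_cutoff_factor, rng) for _ in range(fan_in)]
            for _ in range(fan_out)]


if __name__ == "__main__":
    cfg = ModelConfig(init_fn=InitFnType.mitchell)
    w = init_weight_matrix(cfg, fan_in=64, fan_out=4)
    print(pick_std(cfg, 64), max(abs(x) for row in w for x in row))
```

The point of routing everything through a single config field is that switching schemes between releases only requires editing the config, not the model code.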

aman-17 • Oct 23 '24 21:10

I went through the config files and found that the official-0724 release initialized the weights with the mitchell method, while the official-1124 release used a truncated normal. Is there a specific reason for this change? Did you find the latter initialization method to be more stable?
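To show how the two schemes differ numerically, the sketch below compares per-layer standard deviations. It rests on assumptions that are not confirmed in this thread: the "mitchell"-style std is taken as 1/sqrt(d_model), further divided by sqrt(2 * (layer_id + 1)) for output projections, and the truncated normal is taken to use a fixed std of 0.02 clipped at 3 standard deviations. The function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def mitchell_std(d: int, layer_id: Optional[int] = None) -> float:
    """Assumed mitchell-style std: 1/sqrt(d), depth-scaled for output projections."""
    std = 1.0 / math.sqrt(d)
    if layer_id is not None:
        # Assumption: deeper layers get smaller output-projection weights.
        std /= math.sqrt(2 * (layer_id + 1))
    return std


def fixed_std(init_std: float = 0.02) -> float:
    """Assumed fixed-std truncated normal: the same std for every matrix."""
    return init_std


def std_table(d_model: int, n_layers: int) -> List[Tuple[int, float, float]]:
    """Rows of (layer_id, mitchell output-projection std, fixed std)."""
    return [(i, mitchell_std(d_model, i), fixed_std()) for i in range(n_layers)]


if __name__ == "__main__":
    cutoff = 3.0  # assumed truncation at +/- 3 std for both schemes
    for layer_id, m, f in std_table(d_model=4096, n_layers=4):
        print(f"layer {layer_id}: mitchell std={m:.5f} (clip ±{cutoff * m:.5f}), "
              f"fixed std={f:.5f} (clip ±{cutoff * f:.5f})")
```

Under these assumptions, the mitchell-style scale depends on model width and layer depth, while the fixed-std scheme uses one value everywhere. That structural difference is the one the question is asking about; whether it explains any stability difference is for the maintainers to say.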

nil0x9 • Jan 15 '25 09:01

Hi! Thanks for the question. We’re currently working on closing out old tickets, and we apologize that we didn’t get to you in a timely fashion. We’re closing this out for now, but if you’d still like an answer, please re-open and we will get back to you!

baileykuehl • Jul 01 '25 17:07