Too small initializer variance

Open LeoXinhaoLee opened this issue 6 months ago • 0 comments

Thank you very much for the update adding Llama 3 support!

I noticed that config.initializer_range defaults to 0.02, and that jax.nn.initializers.normal(self.config.initializer_range / np.sqrt(config.hidden_size)) is used for initialization.

However, in the old version of EasyLM, config.initializer_range also defaulted to 0.02, but jax.nn.initializers.normal(self.config.initializer_range) was used instead.

The new initialization therefore has a standard deviation smaller by a factor of sqrt(hidden_size), i.e. a variance smaller by a factor of hidden_size. Is that by design?
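To make the size of the difference concrete, here is a minimal sketch of the arithmetic, using only the Python standard library. The helper name init_stddevs is hypothetical, and hidden_size=4096 is an illustrative assumption rather than a value taken from the EasyLM config. The two values are the stddev arguments that would be passed to jax.nn.initializers.normal in each version.

```python
import math


def init_stddevs(initializer_range: float, hidden_size: int) -> tuple[float, float]:
    """Return (old_std, new_std) for the two initialization schemes.

    Hypothetical helper for illustration only; not part of EasyLM.
    """
    # Old version: normal(initializer_range)
    old_std = initializer_range
    # New version: normal(initializer_range / sqrt(hidden_size))
    new_std = initializer_range / math.sqrt(hidden_size)
    return old_std, new_std


# hidden_size=4096 is an assumed example value, not read from any config.
old_std, new_std = init_stddevs(initializer_range=0.02, hidden_size=4096)

# Variance scales with the square of the standard deviation,
# so the ratio of variances works out to hidden_size.
variance_ratio = (old_std / new_std) ** 2

print(f"old std: {old_std}, new std: {new_std}")
print(f"variance is smaller by a factor of about {variance_ratio:.0f}")
```

Under these assumed values the standard deviation shrinks by a factor of 64 and the variance by a factor of 4096, which is the gap the question is about.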

Thank you very much for your time and help!

LeoXinhaoLee, Aug 19 '24 22:08