EasyLM
Too small initializer variance
Thank you very much for the update to support the Llama 3 model!
I noticed that config.initializer_range
defaults to 0.02, and jax.nn.initializers.normal(self.config.initializer_range / np.sqrt(config.hidden_size))
is used for initialization.
However, in the old version of EasyLM, config.initializer_range
also defaulted to 0.02, but jax.nn.initializers.normal(self.config.initializer_range)
was used instead.
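To make the difference concrete, here is a small numpy sketch of the two standard deviations (hidden_size = 4096 is assumed for illustration, as in Llama 3 8B; the actual value comes from the model config):

```python
import numpy as np

hidden_size = 4096        # assumed for illustration (Llama 3 8B)
initializer_range = 0.02  # default in both versions

rng = np.random.default_rng(0)
shape = (hidden_size, hidden_size)

# Old behavior: std = initializer_range
old_weights = rng.normal(0.0, initializer_range, shape)

# New behavior: std = initializer_range / sqrt(hidden_size)
new_weights = rng.normal(0.0, initializer_range / np.sqrt(hidden_size), shape)

print(old_weights.std())  # ~0.02
print(new_weights.std())  # ~0.02 / 64 = ~0.0003125
```

So with hidden_size = 4096 the new standard deviation is 64x smaller, i.e. the variance is smaller by a factor of hidden_size.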
Will the new way of initialization have a much smaller variance, and is that by design?
Thank you very much for your time and help!