
Weight initialization

Open OneAdder opened this issue 9 months ago • 3 comments

Weight Initialization

Added functions for Xavier and Kaiming initialization. The rule of thumb here:

  • S-shaped activation (tanh, sigmoid, etc.) => Xavier
  • ReLU-shaped activation (relu, gelu, silu, etc.) => Kaiming

For networks without Layer or Batch Normalization, this simple tweak will significantly improve convergence.
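
The rule of thumb above comes down to variance scaling. As a quick illustration (in NumPy rather than Fortran, and with arbitrary example layer sizes), Xavier uniform draws from U(-limit, limit) with limit = sqrt(6/(fan_in+fan_out)), giving Var(w) = 2/(fan_in+fan_out), while Kaiming normal uses Var(w) = 2/fan_in, where the extra factor of 2 compensates for ReLU zeroing roughly half of the activations:

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in, fan_out = 784, 128  # arbitrary example layer sizes

# Xavier (Glorot) uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).
# Resulting variance is 2 / (fan_in + fan_out), which keeps the signal variance
# roughly stable through tanh/sigmoid layers in both directions.
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_xavier = rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Kaiming (He) normal: N(0, 2 / fan_in). The factor of 2 accounts for ReLU
# discarding the negative half of each activation distribution.
w_kaiming = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Empirical variances match the targets closely for this many samples.
print(w_xavier.var())   # close to 2 / (784 + 128)
print(w_kaiming.var())  # close to 2 / 784
```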

OneAdder avatar Feb 17 '25 19:02 OneAdder

Thanks, Michael, this is definitely needed.

About 1.5 years ago I started an Initializers PR (https://github.com/modern-fortran/neural-fortran/pull/151) but forgot about it. Basically it follows a similar pattern to how activations and optimizers are done in NF, which allows complete customization if specified, and sane defaults (like the ones you have here) if unspecified.

Do you think it would work well?

milancurcic avatar Feb 17 '25 19:02 milancurcic

Added it while doing this: https://github.com/OneAdder/neural-fortran/blob/text_classification_example/example/text_classification.f90

OneAdder avatar Feb 17 '25 19:02 OneAdder

@milancurcic Yes, I think #151 will work!

OneAdder avatar Feb 17 '25 20:02 OneAdder