
Better configuration of initialization

Open janchorowski opened this issue 9 years ago • 2 comments

Hi,

Currently, the Initializable base class supports initializing:

  1. weights
  2. biases, with a quirky has_bias property that tells it when not to set biases_init
  3. rngs, this time without a need_rng property or similar

For recurrent bricks, we really need four initializers: for state-state weights one may want the orthogonal initialization; for state-gate weights one may need something classical (these are not recurrent); then one may want a bias (though biases may be passed in through the "vertical" connections); and finally, initial states may also be learned and initialized in their own, peculiar way.
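For illustration, here is a minimal, dependency-free sketch of what four separate initializers for a recurrent brick might produce. Everything in it is hypothetical: the parameter names (state_to_state, state_to_gates, biases, initial_state), the two-gate layout, and the helpers orthogonal, gaussian and constant are not the Blocks API. The orthogonal helper is one common way to get an orthogonal matrix: draw a Gaussian matrix and orthonormalize its rows with Gram-Schmidt.

```python
import random


def orthogonal(n, rng):
    """Return an n x n orthogonal matrix as a list of rows.

    Draws Gaussian rows and orthonormalizes them with Gram-Schmidt,
    which is adequate for the small sizes used here.
    """
    basis = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            # Remove the component of v along each earlier basis row.
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        basis.append([x / norm for x in v])
    return basis


def gaussian(rows, cols, rng, std=0.01):
    """'Classical' initialization: i.i.d. zero-mean Gaussian entries."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def constant(rows, cols, value=0.0):
    """Constant initialization, e.g. zeros for biases or initial states."""
    return [[value] * cols for _ in range(rows)]


# A hypothetical recurrent brick with a state of size 4 and two gates.
rng = random.Random(1234)
dim = 4
params = {
    "state_to_state": orthogonal(dim, rng),          # orthogonal scheme
    "state_to_gates": gaussian(dim, 2 * dim, rng),   # 'classical' scheme
    "biases": constant(1, dim),                      # may come from "vertical" connections instead
    "initial_state": constant(1, dim),               # learned, with its own scheme
}
```

Each entry needs its own scheme, which a single weights_init/biases_init pair cannot express.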

Similarly, one may want to learn the initial symbol in a generator (#384).

Thus I feel a need for a more generic initialization specification.

One approach is to extend Initializable._push_initialization_config to push all sorts of initialization methods (e.g. every attribute whose name ends in '_init') to its child bricks, which will then be able to push them on to their own children. As a downside, this "pollutes" all bricks with initializations they may not need. I don't know how bad this pollution is, since right now the Bias brick also gets an unneeded weights_init attribute.
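A minimal sketch of this first approach, using a hypothetical stand-in Brick class rather than the real Blocks classes, and placeholder strings instead of real initialization schemes. It reproduces the downside mentioned above: a brick that only has biases still ends up with a weights_init attribute.

```python
class Brick:
    """Hypothetical stand-in for a brick; not the Blocks class."""

    def __init__(self, name, children=(), **kwargs):
        self.name = name
        self.children = list(children)
        for key, value in kwargs.items():
            setattr(self, key, value)

    def _push_initialization_config(self):
        # Approach 1: copy *every* attribute ending in '_init' to every child,
        # overwriting whatever the child had.
        for child in self.children:
            for key, value in vars(self).items():
                if key.endswith("_init"):
                    setattr(child, key, value)

    def push_initialization_config(self):
        # Configure the direct children, then let each one push further down.
        self._push_initialization_config()
        for child in self.children:
            child.push_initialization_config()


linear = Brick("linear")
bias = Brick("bias")  # only has biases...
mlp = Brick("mlp", children=[linear, bias])
top = Brick("top", children=[mlp],
            weights_init="IsotropicGaussian(0.01)",
            biases_init="Constant(0)")
top.push_initialization_config()
print(bias.weights_init)  # ...but still receives weights_init
```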

Another approach is to have a generic push_initialization_config method which handles the recursion itself:

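def global_push_initialization_config(brick, initializations):
    # Let the brick run its own push logic first, if it defines any.
    if hasattr(brick, "_push_initialization_config"):
        brick._push_initialization_config(initializations)
    # Set only the attributes this brick already declares.
    for k, v in initializations.items():
        if hasattr(brick, k):
            setattr(brick, k, v)
    # Recurse into the children.
    for c in brick.children:
        global_push_initialization_config(c, initializations)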

This has the benefit of not polluting bricks that do not initialize anything with an initialization config.

Then any brick that wants to be initialized needs to have xxx_init attributes (possibly set to None). This can be done in Initializable's __init__, which can go through its kwargs looking for arguments whose names end in '_init'.
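A runnable sketch of this second approach, combining the recursive function with the idea that Initializable declares its *_init attributes from keyword arguments. Brick and Initializable here are hypothetical minimal stand-ins, not the Blocks classes, and the scheme values are placeholder strings. Only bricks that declared an attribute receive it, so the bias brick gets no weights_init and the parameter-free brick gets nothing.

```python
class Brick:
    """Hypothetical minimal brick: a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class Initializable(Brick):
    """Hypothetical mixin: declare every '*_init' keyword as an attribute."""

    def __init__(self, name, children=(), **kwargs):
        super().__init__(name, children)
        for key in list(kwargs):
            if key.endswith("_init"):
                # Declared (possibly as None) so the global push can find it.
                setattr(self, key, kwargs.pop(key))
        if kwargs:
            raise TypeError("unexpected arguments: %s" % sorted(kwargs))


def global_push_initialization_config(brick, initializations):
    """Recursively set only the attributes each brick already declares."""
    if hasattr(brick, "_push_initialization_config"):
        brick._push_initialization_config(initializations)
    for key, value in initializations.items():
        if hasattr(brick, key):
            setattr(brick, key, value)
    for child in brick.children:
        global_push_initialization_config(child, initializations)


recurrent = Initializable("recurrent", recurrent_weights_init=None,
                          initial_states_init=None)
bias = Initializable("bias", biases_init=None)
tanh = Brick("tanh")  # initializes nothing
top = Brick("top", children=[recurrent, bias, tanh])
global_push_initialization_config(top, {
    "weights_init": "IsotropicGaussian(0.01)",
    "recurrent_weights_init": "Orthogonal()",
    "biases_init": "Constant(0)",
    "initial_states_init": "Constant(0)",
})
```

One consequence of this sketch: a misspelled keyword such as weigths_init is still accepted as a declaration, so it silently never receives a value.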

janchorowski, Jun 30 '15 15:06

In a discussion with @janchorowski, the following design was born:

  • The Initializable mixin adds a dictionary that maps parameter roles (WEIGHTS, BIASES, INITIAL_STATES, etc.) to initialization schemes. Below, this is referred to as the 'initialization dictionary'.
  • The _initialize method of Initializable uses this dictionary to initialize parameters automatically. No need to implement _initialize in custom bricks at all! If no scheme is given for a particular role, the role hierarchy is searched for the most specific role that has a scheme set. E.g. if there is a role RECURRENT_WEIGHTS, the fallback role would be WEIGHTS.
  • Its _push_initialization_config method simply propagates the initialization dictionary down the hierarchy.
  • Every brick that has parameters must have a parameter_roles attribute, which lists the roles its parameters will have. For higher-level bricks, these roles are collected recursively from their children. This way, every brick keeps only the relevant entries in its initialization dictionary. The parameter_roles attribute is a natural generalization of has_bias.
  • Attribute-based access should also be provided for entries of the initialization dictionary. Put differently, we should keep brick.weights_init and brick.biases_init, not only for the sake of compatibility but also because it is convenient, and for another reason which I am too lazy to explain right now.
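A minimal sketch of the design in the list above, under several assumptions: roles are plain strings with a hypothetical single-parent hierarchy (ROLE_PARENTS), schemes are placeholder strings, and Initializable, Linear, Recurrent and Tanh are toy classes rather than Blocks bricks. Mapping the attribute weights_init to the role WEIGHTS by upper-casing the prefix is just one possible convention.

```python
# Hypothetical role hierarchy: each role points to its single parent role.
ROLE_PARENTS = {
    "RECURRENT_WEIGHTS": "WEIGHTS",
    "WEIGHTS": "PARAMETERS",
    "BIASES": "PARAMETERS",
    "INITIAL_STATES": "PARAMETERS",
    "PARAMETERS": None,
}


def ancestors(role):
    """Return `role` followed by its ancestors, most specific first."""
    chain = []
    while role is not None:
        chain.append(role)
        role = ROLE_PARENTS[role]
    return chain


def find_scheme(initialization, role):
    """Scheme of the most specific role (role itself or an ancestor) that has one."""
    for candidate in ancestors(role):
        if candidate in initialization:
            return initialization[candidate]
    raise KeyError("no initialization scheme for %s" % role)


class Initializable:
    """Toy version of the proposed mixin (not the Blocks API)."""

    parameter_roles = frozenset()  # roles of this brick's own parameters

    def __init__(self, children=(), **schemes):
        self.children = list(children)
        self.initialization = {}  # the 'initialization dictionary'
        for name, scheme in schemes.items():
            setattr(self, name, scheme)  # e.g. weights_init -> WEIGHTS

    # Attribute-based access: brick.weights_init <-> initialization["WEIGHTS"].
    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.endswith("_init"):
            role = name[: -len("_init")].upper()
            return self.__dict__.get("initialization", {}).get(role)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name.endswith("_init"):
            self.initialization[name[: -len("_init")].upper()] = value
        else:
            super().__setattr__(name, value)

    def all_parameter_roles(self):
        """Roles of this brick's parameters and, recursively, its children's."""
        roles = set(self.parameter_roles)
        for child in self.children:
            roles |= child.all_parameter_roles()
        return roles

    def push_initialization_config(self):
        """Give each child only the entries usable by its roles, then recurse."""
        for child in self.children:
            usable = {a for r in child.all_parameter_roles() for a in ancestors(r)}
            for role, scheme in self.initialization.items():
                if role in usable:
                    child.initialization[role] = scheme
            child.push_initialization_config()

    def initialize(self):
        """Resolve a scheme for each of this brick's own parameter roles."""
        return {r: find_scheme(self.initialization, r) for r in self.parameter_roles}


class Linear(Initializable):
    parameter_roles = frozenset({"WEIGHTS", "BIASES"})


class Recurrent(Initializable):
    parameter_roles = frozenset({"RECURRENT_WEIGHTS", "INITIAL_STATES"})


class Tanh(Initializable):
    pass  # no parameters, so nothing is pushed to it


linear, rnn, tanh = Linear(), Recurrent(), Tanh()
top = Initializable(children=[linear, tanh, rnn],
                    weights_init="IsotropicGaussian(0.01)",
                    biases_init="Constant(0)",
                    initial_states_init="Constant(0)")
top.push_initialization_config()
```

Tanh ends up with an empty dictionary, which is what parameter_roles buys, and Recurrent never receives a RECURRENT_WEIGHTS entry yet still resolves one through the fallback to WEIGHTS.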

@memimo, you might be interested.

rizar, Jul 02 '15 14:07

A few comments:

  • I kind of hate the has_bias/use_bias duplication, and it would be great if we could find a way to reconcile the two into one thing. It sounds like the parameter roles idea is the right way to do it.
  • The parameter roles could (and should) be computed using the variable filter. I think we want as few distinct attributes as possible, both to avoid cognitive overhead and to avoid the possibility of them becoming inconsistent with each other.
  • Searching the role hierarchy is fine as long as we don't do anything with multiple inheritance; then I think it could get messy.
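A small sketch of both points, under stated assumptions: roles are modelled as hypothetical Python classes, the "most specific role" search walks each role class's method resolution order (__mro__), and roles_of is a hypothetical stand-in for filtering parameters by role, not the Blocks VariableFilter API. With single inheritance the search is unambiguous; with multiple inheritance, the answer depends on the order of the base classes.

```python
class Parameters:
    """Root of a hypothetical role hierarchy."""


class Weights(Parameters):
    pass


class Biases(Parameters):
    pass


class Recurrent(Parameters):
    pass


# The same conceptual role, declared with the base classes in two orders.
class RecurrentWeightsA(Recurrent, Weights):
    pass


class RecurrentWeightsB(Weights, Recurrent):
    pass


def find_scheme(initialization, role):
    """Walk role.__mro__ and return the first role that has a scheme."""
    for candidate in role.__mro__:
        if candidate in initialization:
            return initialization[candidate]
    raise KeyError(role.__name__)


class Variable:
    """Hypothetical shared variable tagged with role classes."""

    def __init__(self, name, roles):
        self.name = name
        self.roles = list(roles)


def roles_of(parameters):
    """Derive a brick's parameter roles from its parameters' own tags."""
    return {role for p in parameters for role in p.roles}


schemes = {Weights: "IsotropicGaussian(0.01)", Recurrent: "Orthogonal()"}
print(find_scheme(schemes, RecurrentWeightsA))  # prints Orthogonal()
print(find_scheme(schemes, RecurrentWeightsB))  # prints IsotropicGaussian(0.01)
derived = roles_of([Variable("W", [Weights]), Variable("b", [Biases])])
```

Both classes describe "recurrent weights", yet they resolve to different schemes purely because of base-class order. Deriving roles from the parameters' tags, as roles_of does, keeps a single source of truth instead of a separately maintained parameter_roles attribute.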

dwf, Jul 02 '15 21:07