Better configuration of initialization
Hi,
currently the Initializable base class supports initializing:
- weights
- biases, with a quirky has_bias property to know when not to set biases_init
- rngs, this time without a need_rng property or similar
For recurrent bricks, we really need four initializers: for state-state weights one may want orthogonal initialization; for state-gate weights one may need something classical (these are not recurrent); then one may want to have a bias (though biases may be passed in the "vertical" connections); and finally, initial states may also be learned and initialized in their own, peculiar way.
Similarly, one may want to learn the initial symbol in a generator (#384).
Thus I feel a need for a more generic initialization specification.
One approach is to extend Initializable._push_initialization_config to push all sorts of initialization methods (e.g. by pushing every attribute whose name ends in '_init') to its child bricks. Then the child bricks will be able to push them to their children. As a downside, this "pollutes" all bricks with initializations they may not need. I don't know how bad this pollution is, since right now the Bias brick also gets an unneeded weights_init attribute.
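To make the pollution concrete, here is a minimal sketch of this first approach. The Brick class below is a hypothetical stand-in, not the real Blocks class, and plain strings stand in for initialization schemes:

```python
class Brick:
    """Hypothetical stand-in for a Blocks brick (not the real class)."""

    def __init__(self, name, children=(), **kwargs):
        self.name = name
        self.children = list(children)
        # Keep every keyword argument whose name ends in '_init'.
        for key, value in kwargs.items():
            if key.endswith('_init'):
                setattr(self, key, value)

    def push_initialization_config(self):
        # Copy every '*_init' attribute to each child, then recurse.
        config = {k: v for k, v in vars(self).items() if k.endswith('_init')}
        for child in self.children:
            for key, value in config.items():
                setattr(child, key, value)
            child.push_initialization_config()


bias = Brick('bias')
linear = Brick('linear', children=[bias])
mlp = Brick('mlp', children=[linear],
            weights_init='gaussian', biases_init='zeros')
mlp.push_initialization_config()
```

After the push, the bias brick carries a weights_init attribute it will never use, which is exactly the pollution in question.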
Another approach is to have a generic push_initialization_config method which handles the recursion itself:
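    def global_push_initialization_config(brick, initializations):
        # Let the brick run its own push logic first, if it defines any.
        if hasattr(brick, '_push_initialization_config'):
            brick._push_initialization_config(initializations)
        # Set only the schemes this brick declares an attribute for
        # (initializations is assumed to be a dict of name -> scheme).
        for k, v in initializations.items():
            if hasattr(brick, k):
                setattr(brick, k, v)
        # Recurse into the children.
        for c in brick.children:
            global_push_initialization_config(c, initializations)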
This has the benefit of not polluting the bricks that do not initialize anything with an initialization config.
Then any brick that wants to be initialized needs to have xxx_init attributes (possibly set to None). This can be done in Initializable's __init__ method (it can go through kwargs looking for anything that ends in '_init').
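A minimal sketch of how this second approach could fit together. The Initializable class here is a hypothetical stand-in rather than the real Blocks mixin, the helper name push_initialization_config is made up for the example, and strings stand in for initialization schemes:

```python
class Initializable:
    """Hypothetical stand-in, not the real Blocks mixin."""

    def __init__(self, children=(), **kwargs):
        self.children = list(children)
        # Store every keyword ending in '_init' as an attribute. Passing
        # None declares the attribute without choosing a scheme yet.
        for key, value in kwargs.items():
            if not key.endswith('_init'):
                raise TypeError('unexpected argument: ' + key)
            setattr(self, key, value)


def push_initialization_config(brick, initializations):
    # Overwrite only the schemes this brick has declared.
    for key, value in initializations.items():
        if hasattr(brick, key):
            setattr(brick, key, value)
    # Recurse so that grandchildren are reached as well.
    for child in brick.children:
        push_initialization_config(child, initializations)


bias = Initializable(biases_init=None)
linear = Initializable(children=[bias], weights_init=None)
push_initialization_config(
    linear, {'weights_init': 'orthogonal', 'biases_init': 'zeros'})
```

Here the bias brick ends up with only biases_init and the linear brick with only weights_init, so neither picks up a scheme it did not declare.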
In a discussion with @janchorowski the following design was born:
- The Initializable mixin adds a dictionary which maps parameter roles (WEIGHTS, BIASES, INITIAL_STATES, etc.) to initialization schemes. This will later be referred to as the 'initialization dictionary'.
- The _initialize method of Initializable uses this dictionary to automatically initialize parameters. No need to implement _initialize in custom bricks at all! If a scheme for a particular role is not given, a search in the role hierarchy is made to find the most specific role for which a scheme is set. E.g. if there is a role RECURRENT_WEIGHTS, the fallback role would be WEIGHTS.
- Its _push_initialization_config method simply propagates the initialization dictionary down the hierarchy.
- Every brick that has a parameter must have a parameter_roles attribute, which says what the roles of the parameters of this brick will be. For higher-level bricks such roles are collected recursively. This way every brick can have only relevant entries in its initialization dictionary. The parameter_roles attribute is a natural generalization of has_bias.
- For entries of the initialization dictionary, attribute-based access should also be provided. Put differently, we should keep brick.weights_init and brick.biases_init, not only for the sake of compatibility but also because it is convenient, and also for another reason which I now feel lazy to explain.
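A rough sketch of the role fallback and the attribute-based access described in the list above. Everything here is hypothetical: the Role class, the find_scheme helper and the initialization_schemes attribute are illustrative names, not the existing Blocks API, and strings stand in for schemes:

```python
class Role:
    """Hypothetical parameter role with a single parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


PARAMETER = Role('PARAMETER')
WEIGHTS = Role('WEIGHTS', PARAMETER)
RECURRENT_WEIGHTS = Role('RECURRENT_WEIGHTS', WEIGHTS)
BIASES = Role('BIASES', PARAMETER)


def find_scheme(init_dict, role):
    """Return the scheme set for `role` or for its nearest ancestor."""
    start = role
    while role is not None:
        if role in init_dict:
            return init_dict[role]
        role = role.parent
    raise KeyError('no scheme for role ' + start.name)


class Initializable:
    """Hypothetical mixin holding the initialization dictionary."""

    def __init__(self):
        self.initialization_schemes = {}

    # Attribute-based access is a thin view onto the dictionary.
    @property
    def weights_init(self):
        return self.initialization_schemes.get(WEIGHTS)

    @weights_init.setter
    def weights_init(self, scheme):
        self.initialization_schemes[WEIGHTS] = scheme


brick = Initializable()
brick.weights_init = 'gaussian'
fallback = find_scheme(brick.initialization_schemes, RECURRENT_WEIGHTS)
brick.initialization_schemes[RECURRENT_WEIGHTS] = 'orthogonal'
specific = find_scheme(brick.initialization_schemes, RECURRENT_WEIGHTS)
```

With only WEIGHTS set, the lookup for RECURRENT_WEIGHTS falls back to the WEIGHTS scheme; once a scheme for RECURRENT_WEIGHTS is added, the more specific entry wins.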
@memimo , you might be interested.
A few comments:
- I kind of hate the has_bias/use_bias duplication and it would be great if we could find a way to reconcile the two into one thing. It sounds like the parameter roles thing is the right way.
- The parameter roles thing seems like it could be (and should be) computed using the variable filter. I think we want as few distinct attributes as possible, both to avoid cognitive overhead and to avoid the possibility of them becoming inconsistent with each other.
- This business of searching the role hierarchy is fine as long as we don't do anything with multiple inheritance... Then I think it could get messy.
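As an illustration of computing the roles instead of storing them, here is a small sketch that derives a brick's parameter roles from its parameters and children. The Parameter and Brick classes are hypothetical stand-ins, roles are plain strings, and the real variable filter works differently; the sketch only shows that has_bias could become a derived quantity:

```python
class Parameter:
    """Hypothetical parameter tagged with role names."""

    def __init__(self, name, roles):
        self.name = name
        self.roles = list(roles)


class Brick:
    """Hypothetical brick holding parameters and child bricks."""

    def __init__(self, parameters=(), children=()):
        self.parameters = list(parameters)
        self.children = list(children)


def parameter_roles(brick):
    """Collect the roles of all parameters in `brick` and below."""
    roles = {role for p in brick.parameters for role in p.roles}
    for child in brick.children:
        roles |= parameter_roles(child)
    return roles


def has_bias(brick):
    # Derived from the roles, so it cannot drift out of sync with them.
    return 'BIASES' in parameter_roles(brick)


linear = Brick(parameters=[Parameter('W', ['WEIGHTS'])])
bias = Brick(parameters=[Parameter('b', ['BIASES'])])
mlp = Brick(children=[linear, bias])
```

Since nothing is stored twice, there is no separate attribute that could become inconsistent with the actual parameters.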