blocks
blocks copied to clipboard
[WIP] Managing initilizations via a role scheme dictionary
fixes #970 Re-attempt of #1021 So far it only changes simple.py will propagate changes to other bricks when the format is approved.
Technically this is @rizar's review but hopefully my comments from the peanut gallery are a little helpful to at least one of you :stuck_out_tongue:
@rizar could you please take a look at the high level structure of the PR. I'll take care of the comments of dwf when the high-level is approved.
Sure! Will work on it today.
Sorry for the delay, I am struggling with the cluster and my course project. I will definitely review it tomorrow.
I have three suggestions:
- use the roles as the keys in
initialization_schemes
- initialize parameters using the scheme given for their most specific role
- check that the given initialization schemes match the parameter roles of the brick in two places: in the end of constructor and in
_push_initialization_config
. This will allow to detect errors as early as possible.
I reorganized the PR a little bit. Again et's discuss the generality of the PR first, I will fix formatting stuff later.
Some notes:
1-The problem with having the roles as keys is that, roles are objects, and different objects of same class will be hashed to different values. So instead I use the name of the roles as the keys. 2-One major API change is that, if someone wants to manually change the schemes self.weights_init and self.biases_init is not valid anymore and one has to use self.initializations_schems 3-There is a tiny bit difference in the GRU weight initialization values now. I don't believe that will have any practical impact on performance, unless someone has an evidence against it.
- I think you gave up the idea of a role-indexed dictionary too early. Check https://docs.python.org/2/reference/datamodel.html#object.hash , you should be able to overload this method.
- I thought we would support
weights_init
andbiases_init
through__getattr__
, why not?
Indeed,
def __hash__(self):
return hash(str(self))
should work in the case of roles.
I think the major design decision that I disagree with is having both initializable_roles
and parameter_roles
. Neither of the two is fully suitable for controlling how the initialization schemes are spread down the hierarchy. The former is hardcoded, and the latter is known too late, after the parameters are created.
Here is a different design which seems better to me
- each
Initilizable
brick has an attributeparameter_roles
that contains all the roles that the brick's parameters will have when constructed - the value of this attribute is defined as follows:
- first,
Initializable
sets it to[WEIGHT, BIAS]
(only[WEIGHT]
is notuse_bias
) - ... unless the user provides
parameter_roles
as a constructor argument - then, after
super(Initializable, self).__init__
,Initializable._collect_roles
is called to add the roles of the children toparameter_roles
- first,
- we are still in
Initializable.__init__
, but we can already check thatinitialization_schemes
does not contain schemes for the roles that are not inparameter_roles
- furthermore, when the parameters are created (that is in
allocate
), we can check that all the parameters have roles that are present inparameter_roles
- when initialization configuration is pushed (
Initializable._push_initialization_config
), only the schemes for the relevant roles should be pushed to the children. One should be careful here to check that sometimes initialization schemes for roles that are parents of the precise parameter roles should also be pushed. - in
initialize
, we can repeat the sanity check that we did in constructor, to make sure that user did not provide useless initialization schemes
What do you think?
@memimo , we should resume working on this, and I really hope we can finish it until I leave Montreal on June 8.
@rizar could you take a look about the high level design choice? Travis is falling on block_examples code, because some initilization values have changed. I will take care of that later, if the design is good.
@rizar ok, so this version collect roles from parameter tags. Will rebase later on.
@rizar I updated based on your latest comments.