blocks icon indicating copy to clipboard operation
blocks copied to clipboard

[WIP] Managing initilizations via a role scheme dictionary

Open memimo opened this issue 8 years ago • 13 comments

fixes #970 Re-attempt of #1021 So far it only changes simple.py will propagate changes to other bricks when the format is approved.

memimo avatar Mar 15 '16 18:03 memimo

Technically this is @rizar's review but hopefully my comments from the peanut gallery are a little helpful to at least one of you :stuck_out_tongue:

dwf avatar Mar 15 '16 20:03 dwf

@rizar could you please take a look at the high level structure of the PR. I'll take care of the comments of dwf when the high-level is approved.

memimo avatar Apr 05 '16 14:04 memimo

Sure! Will work on it today.

rizar avatar Apr 05 '16 14:04 rizar

Sorry for the delay, I am struggling with the cluster and my course project. I will definitely review it tomorrow.

rizar avatar Apr 06 '16 22:04 rizar

I have three suggestions:

  • use the roles as the keys in initialization_schemes
  • initialize parameters using the scheme given for their most specific role
  • check that the given initialization schemes match the parameter roles of the brick in two places: in the end of constructor and in _push_initialization_config. This will allow to detect errors as early as possible.

rizar avatar Apr 07 '16 17:04 rizar

I reorganized the PR a little bit. Again et's discuss the generality of the PR first, I will fix formatting stuff later.

Some notes:

1-The problem with having the roles as keys is that, roles are objects, and different objects of same class will be hashed to different values. So instead I use the name of the roles as the keys. 2-One major API change is that, if someone wants to manually change the schemes self.weights_init and self.biases_init is not valid anymore and one has to use self.initializations_schems 3-There is a tiny bit difference in the GRU weight initialization values now. I don't believe that will have any practical impact on performance, unless someone has an evidence against it.

memimo avatar Apr 12 '16 19:04 memimo

  1. I think you gave up the idea of a role-indexed dictionary too early. Check https://docs.python.org/2/reference/datamodel.html#object.hash , you should be able to overload this method.
  2. I thought we would support weights_init and biases_init through __getattr__, why not?

rizar avatar Apr 13 '16 17:04 rizar

Indeed,

def __hash__(self):
    return hash(str(self))

should work in the case of roles.

dwf avatar Apr 13 '16 18:04 dwf

I think the major design decision that I disagree with is having both initializable_roles and parameter_roles. Neither of the two is fully suitable for controlling how the initialization schemes are spread down the hierarchy. The former is hardcoded, and the latter is known too late, after the parameters are created.

Here is a different design which seems better to me

  • each Initilizable brick has an attribute parameter_roles that contains all the roles that the brick's parameters will have when constructed
  • the value of this attribute is defined as follows:
    • first, Initializable sets it to [WEIGHT, BIAS] (only [WEIGHT] is not use_bias)
    • ... unless the user provides parameter_roles as a constructor argument
    • then, after super(Initializable, self).__init__, Initializable._collect_roles is called to add the roles of the children to parameter_roles
  • we are still in Initializable.__init__, but we can already check that initialization_schemes does not contain schemes for the roles that are not in parameter_roles
  • furthermore, when the parameters are created (that is in allocate), we can check that all the parameters have roles that are present in parameter_roles
  • when initialization configuration is pushed (Initializable._push_initialization_config), only the schemes for the relevant roles should be pushed to the children. One should be careful here to check that sometimes initialization schemes for roles that are parents of the precise parameter roles should also be pushed.
  • in initialize, we can repeat the sanity check that we did in constructor, to make sure that user did not provide useless initialization schemes

What do you think?

rizar avatar Apr 25 '16 01:04 rizar

@memimo , we should resume working on this, and I really hope we can finish it until I leave Montreal on June 8.

rizar avatar May 23 '16 23:05 rizar

@rizar could you take a look about the high level design choice? Travis is falling on block_examples code, because some initilization values have changed. I will take care of that later, if the design is good.

memimo avatar Jun 01 '16 18:06 memimo

@rizar ok, so this version collect roles from parameter tags. Will rebase later on.

memimo avatar Jun 02 '16 18:06 memimo

@rizar I updated based on your latest comments.

memimo avatar Jun 06 '16 16:06 memimo