blocks icon indicating copy to clipboard operation
blocks copied to clipboard

Initializable _push_initialization_config

Open lukemetz opened this issue 10 years ago • 19 comments

A frequent use case I have when using blocks involves creating some graph of bricks, (most of which recursive), and needing to fine tune only a few of the children bricks weights. The workflow suggested by the docs is to set the top level initialization and then call push_initialization_config then manually go in and set the required weights. This seems like spreading out the configuration across multiple different locations in a code base as well as leaving no easy way to create prefab bricks with initializations already set inside them. In addition to this, the write over nature seems unintuitive, as there is no warning when a user manually sets weights_init and sees no effect. For example:

class MultiBrick(Initializable):
    def __init__(self, **kwargs):
         self.lin1 = Linear(weights_init=Constant(1))
         self.lin2 = Linear()
         ......
b = MultiBrick(self, weights_init=Constant(0), bias=Constant(0))
b.initialize()

I would expect b.lin1.weights_init to be Constant(1) but this is not the case. In this case its easy enough to set it manually post push_initialization_config but this is not true the larger the graphs get.

As a solution, could the Initializable._push_initialization_config the first check if the child.weights_init exists and if it does not overwrite and leave the original value? I quick look at the source says Initialization is the only class using _push_initialization_config.

Thanks for your consideration.

lukemetz avatar Jul 24 '15 15:07 lukemetz

+1

sotelo avatar Jul 28 '15 19:07 sotelo

Hi Luke,

it seems to me that what you that the following should work for you:

class MultiBrick(Initializable):

  def _push_initialization_config(self):
       super(self, MultiBrick)._push_initialization_config(self)
       self.lin1.weights_init = Constant(1)
       self.lin1._push_initialization_config()

Does this make sense?

Dima

On 24.07.2015 18:53, Luke wrote:

A frequent use case I have when using blocks involves creating some graph of bricks, (most of which recursive), and needing to fine tune only a few of the children bricks weights. The workflow suggested by the docs is to set the top level initialization and then call |push_initialization_config| then manually go in and set the required weights. This seems like spreading out the configuration across multiple different locations in a code base as well as leaving no easy way to create prefab bricks with initializations already set inside them. In addition to this, the write over nature seems unintuitive, as there is no warning when a user manually sets weights_init and sees no effect. For example:

|class MultiBrick(Initializable): def init(self, **kwargs): self.lin1 = Linear(weights_init=Constant(1)) self.lin2 = Linear() ...... b = MultiBrick(self, weights_init=Constant(0), bias=Constant(0)) b.initialize() |

I would expect b.lin1.weights_init to be Constant(1) but this is not the case. In this case its easy enough to set it manually post |push_initialization_config| but this is not true the larger the graphs get.

As a solution, could the |Initializable._push_initialization_config| the first check if the |child.weights_init| exists and if it does not overwrite and leave the original value? I quick look at the source says |Initialization| is the only class using |_push_initialization_config|.

Thanks for your consideration.

— Reply to this email directly or view it on GitHub https://github.com/mila-udem/blocks/issues/773.

rizar avatar Jul 30 '15 16:07 rizar

@rizar That was the missing link. Should have really thought of that. I retract my original idea.

Thanks for the help.

lukemetz avatar Jul 30 '15 18:07 lukemetz

@rizar's solution works in this case but in general I would prefer @lukemetz's proposal. If you didn't have control over MultiBrick's constructor you would still want to be able to specialize its children's initialization configuration.

cooijmanstim avatar Jul 30 '15 20:07 cooijmanstim

@koningrobot , definitely, but in this case you just need to push MultiBrick induced initialization first and override it:

brick = MultiBrick(weights_init=Constant(0))
brick.push_initialization_config()
brick.lin1.weights_init = Constant(1)

rizar avatar Jul 31 '15 16:07 rizar

Sure, that works, but I'm talking in general. Initialization should be configurable e.g. from a YAML file with hyperparameters. It shouldn't require special cases in code.

cooijmanstim avatar Aug 02 '15 01:08 cooijmanstim

This keeps coming back to bite me. I think not overriding children's initialization configuration as suggested by @lukemetz is the obviously right thing to do. It would silently change the behavior of existing code, but I can't imagine anyone is actually relying on the parent destroying the child's configuration.

cooijmanstim avatar Aug 18 '15 20:08 cooijmanstim

Actually, it was considered at an early stage of Blocks development, where it was only me and Bart sending each other long emails. And I remember that Bart convinced that this idea is wrong and _push_initialization_config should just push configuration, without checking whether or not it had been set. I will try to find these old emails and remember what's wrong with checking if the child initialization has already been configured.

rizar avatar Aug 18 '15 20:08 rizar

Reading this thread up until now I tend to think that push_initialization_config should grow a 'force' boolean keyword that is False by default. initialize() should then do a "soft push" which doesn't overwrite existing configuration if it is already specified.

Of course, MLP (I still hate that name) will still have no way to do layer-specific initialization declaratively, since all of the Linear bricks are created implicitly and there is only a single initializer for weights and biases.

I'd be interested to hear @bartvm's counter-argument against a "soft push" being the default because I cannot think of one.

dwf avatar Aug 18 '15 20:08 dwf

Oh, I am afraid keyword would be an overkill. We should probably just decide which of two is better.

Trying to play devil's advocate: what if I have a brick class A which typically has children of brick class B, and A._push_initialization_config() sets a bunch of B's initialization related attributes. Should A._push_initialization_config() check that any of them was set or all of them?

Perhaps a more realistic question: if the user set only weights_init of a Linear child of an MLP, should we treat as a mistake (since biases_init is not set) or as a feature, allowing "partial" configuration push?

rizar avatar Aug 18 '15 21:08 rizar

Either way, I think it is clearly an "explicit is better than implicit". If you are leaving something alone that's there, that's fine. If you are clobbering something that the user put there then you should not do it unless the user has explicitly asked you to (hence the 'force'). I can imagine situations where you'd want that, actually, but not as a default.

Personally I am having fever dream flashbacks to set_input_space() in pylearn2 resetting parameters as an unadvertised side effect. Not cool.

dwf avatar Aug 18 '15 21:08 dwf

I would have to dig up the old e-mails, but I'm not too enthusiastic. @rizar's comment already touches on what I think would be problematic, each _push_initialization_config method would end up looking like this:

if child.weights_init is NoneInitialization or force is true:
    child.weights_init = self.weights_init
if child.biases_init is NoneInitialization or force is true:
    child.biases_init = self.biases_init
if ...:
    ....

This will make the code a lot more complex it seems. There is also no way of guaranteeing that custom bricks obey this behaviour.

I can see how the current behaviour can be confusing, but in practice you can do anything you want, as @rizar pointed out in https://github.com/mila-udem/blocks/issues/773#issuecomment-126737200. I'm not sure if the small improvement in usability is worth the cost in code complexity and decreasing the ease with which new bricks can be written.

bartvm avatar Aug 18 '15 21:08 bartvm

Either way, I think it is clearly an "explicit is better than implicit".

OK, I see your point that user receives a very unpleasant surprise when initialize overrides his fine configuration of children bricks. Even if we stick to the current scheme, something has to be done to reduce the shock here.

rizar avatar Aug 18 '15 21:08 rizar

@bartvm How would you feel about simply refusing to overwrite anything that isn't NoneInitialization? That's already an improvement that does away with a big "gotcha".

dwf avatar Aug 18 '15 21:08 dwf

How would we support the force operator then? Not allowing overriding could be done by just taking the attributes from the lazy decorator and turning them into properties somehow that check for NoneInitialization, but we clearly need some sort of force option (as pointed out, sometimes a parent brick that you don't have control over sets configuration settings that you want to override later).

bartvm avatar Aug 18 '15 21:08 bartvm

I have a possible solution that would keep the complexity out of each Brick's _push_allocation_config. What if we defined these methods on Brick:

def _pull_config(self, _sentinel, **kwargs):
    pulled = []
    for key, value in kwargs.items():
        if getattr(self, key) is _sentinel:   # Maybe getattr(self, key, _sentinel)?
            setattr(self, key, value)
            pulled.append(key)
    return pulled

def pull_allocation_config(self, **kwargs):
    r"""Called by a parent brick wishing to push allocation configuration options.

    Parameters
    ----------
    \*\*kwargs
        Keyword arguments match names of attributes on this brick
        corresponding to quantities necessary for allocation.

    Returns
    -------
    pulled : list
        A list of the keyword arguments that were assigned as attributes.
        Those that weren't had values other than `NoneAllocation` already
        set on the brick.

    """
    return self._pull_config(_sentinel=NoneAllocation, **kwargs)

def pull_initialization_config(self, **kwargs):
    r"""Called by a parent brick wishing to push initialization configuration options.

    Parameters
    ----------
    \*\*kwargs
        Keyword arguments match names of attributes on this brick
        corresponding to quantities necessary for initialization.

    Returns
    -------
    pulled : list
        A list of the keyword arguments that were assigned as attributes.
        Those that weren't had values other than `NoneInitialization` already
        set on the brick.

    """
    return self._pull_config(_sentinel=NoneInitialization, **kwargs)

Establish a convention of having _push_allocation_config set whatever it needs to set by calling child.pull_allocation_config(foo=bar, baz=bip), etc.

Custom bricks used as children will inherit this behaviour automatically. Custom compound bricks need only follow the convention set by the ones in the library to inherit this much less annoying behaviour. I proposed calling it 'pull' as a counterbalance to 'push', but I'd be open to other names if that's not a good one.

dwf avatar Oct 16 '15 05:10 dwf

Hmm. I just realized the above solution won't work. Stuff that has been explicitly specified is indistinguishable from stuff that just has a default value in the child brick. The only way this would be workable is if we somehow made Bricks keep track of whether a value had been explicitly passed at construction, which you can do by inspecting args and kwargs in a decorator around __init__, but that's getting quite complicated.

dwf avatar Oct 16 '15 07:10 dwf

On the other hand, maybe it isn't so complicated...

dwf avatar Oct 16 '15 07:10 dwf

There's another way to make this work: add a 'default' parameter to LazyNone and make default constructor arguments needed for allocation/initialization be NoneAllocation(actual_default) and NoneInitialization(actual_default). You could do this via a decorator using inspect.getfullargspec to extract the function's default arguments, and just have the wrapper do kwargs.setdefault(key, NoneAllocation(extracted_default_value_for_key)). Explicit specification of these arguments, either positionally or by keyword, thus precludes this wrapping. At allocate() or initialize() time, these wrapper objects are discarded.

dwf avatar Oct 16 '15 08:10 dwf