
Test ReuseParams with different variable names

Zettelkasten opened this issue 4 years ago · 2 comments

The test case from #447, currently failing.

— Zettelkasten, Feb 19 '21

So, we basically have outlined a possible solution for this in the discussion in #447.

Basically, what we want (self being the layer):

  • Let's assume you have param = self.add_param(tf.get_variable(name)) (or sth like that).
  • param is what you would directly use in this code, for whatever purpose, e.g. matmul in LinearLayer.
  • self.params[name] is param should be true after this call. This implies:
    • self.params includes potential non-owned vars (shared/reuse) and/or potential transformations (transpose, weight noise, weight norm, etc) (i.e. not the original variable, but just some tensor).
  • Weight noise, weight norm and other transformations, and also L2 losses etc, should only be applied if the tensor or variable is owned by the layer (determined by looking at the namespace).
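The contract above can be summarized in a small pure-Python sketch. Note this is only an illustration of the intended semantics: `Variable`, `Layer`, `add_param`, and `owns` here are toy stand-ins, not RETURNN's actual classes or TensorFlow objects.

```python
# Toy sketch of the desired add_param contract (no real TF / RETURNN code).

class Variable:
    def __init__(self, name, owner_scope):
        self.name = name
        self.owner_scope = owner_scope  # namespace the variable was created in

class Layer:
    def __init__(self, name):
        self.name = name   # also serves as the layer's namespace
        self.params = {}   # name -> the tensor/variable the layer actually uses

    def add_param(self, param, name):
        # Store whatever the layer will actually use: possibly a shared/reused
        # variable or a transformed tensor, not necessarily the original var.
        self.params[name] = param
        return param

    def owns(self, param):
        # Weight noise, weight norm, L2 losses etc. should only apply to
        # params owned by this layer, determined via the namespace.
        return getattr(param, "owner_scope", None) == self.name

layer = Layer("linear1")
w = Variable("W", owner_scope="linear1")
param = layer.add_param(w, "W")
assert layer.params["W"] is param   # the contract: self.params[name] is param
assert layer.owns(w)                # owned, so transformations may apply

layer2 = Layer("linear2")
shared = Variable("W", owner_scope="linear1")  # reused from layer "linear1"
layer2.add_param(shared, "W")
assert not layer2.owns(shared)      # not owned, so no weight noise / L2 here
```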

So we need to change the logic of add_param to implement this logic for self.params. The difficulty is to infer the name. This should be doable by extending var_creation_scope:

  • In case there is no reuse/sharing or transformation, no custom getter, let's not change anything. Then add_param would also keep the code for this simple case.
  • With a custom getter, we can grab name, and store it somewhere.
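To make the custom-getter idea concrete, here is a pure-Python mock of the mechanism. `get_variable` and `base_getter` below are stand-ins mimicking how a `custom_getter` passed to `tf.variable_scope` intercepts variable creation; they are not the real TF functions.

```python
# Sketch: a custom getter wraps the default getter, so it can grab the
# variable name and record it before delegating to the real creation.

created_names = []  # where we "store it somewhere"

def base_getter(name, **kwargs):
    # Stand-in for the default variable creation path.
    return {"name": name, **kwargs}

def make_recording_getter(record):
    def custom_getter(getter, name, **kwargs):
        record.append(name)            # grab the name
        return getter(name, **kwargs)  # then create the variable as usual
    return custom_getter

def get_variable(name, custom_getter=None, **kwargs):
    if custom_getter is None:
        # Simple case (no reuse/sharing, no transformation): change nothing.
        return base_getter(name, **kwargs)
    return custom_getter(base_getter, name, **kwargs)

getter = make_recording_getter(created_names)
v = get_variable("lstm_cell/kernel", custom_getter=getter, shape=(128, 512))
assert created_names == ["lstm_cell/kernel"]
```

In real TF1-style code the custom getter has the same `(getter, name, *args, **kwargs)` shape, so the recorded names could then be used by `add_param` to fill `self.params`.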

We might need a separate way to get a list of variables of a layer. (Or do we really? What would be the use case?) Or we could also simply iterate through all existing variables (via the global collections) and check for the namespace if we need to filter.

Btw, you might ask, why do we need self.add_param(tf.get_variable(name)), and why not simply do sth like self.create_param(name, ...) instead, and not use tf.get_variable. This is because we want to be able to use other TF code, which makes use of tf.get_variable. E.g. when we use the original LSTM by TF, or other things. We want to support that. (Btw, if this was unclear, you might also extend the documentation in this PR for that.)

Actually, when this is external code, note that it could create multiple variables, i.e. multiple calls to get_variable. We should be careful to handle that correctly. Maybe the whole logic of adding it to self.params should live in var_creation_scope. Or we can keep it in add_param but call it from var_creation_scope.
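A minimal sketch of that multi-variable case, with the registration logic living in the scope itself. Again, these names (`var_creation_scope`, `add_param`) follow the discussion above but the bodies are illustrative stand-ins, not RETURNN's implementation.

```python
# Sketch: external code may call get_variable several times inside the
# scope, so the scope collects every created variable and registers each
# one with the layer on exit.

from contextlib import contextmanager

class Layer:
    def __init__(self):
        self.params = {}

    def add_param(self, param, name):
        self.params[name] = param
        return param

    @contextmanager
    def var_creation_scope(self):
        created = []
        def get_variable(name):   # would be hooked in via a custom getter in TF
            var = object()        # stand-in for the actual tf.get_variable call
            created.append((name, var))
            return var
        yield get_variable
        for name, var in created:  # register all vars the external code made
            self.add_param(var, name)

layer = Layer()
with layer.var_creation_scope() as get_variable:
    # External code (e.g. TF's own LSTM cell) may create several variables.
    kernel = get_variable("kernel")
    bias = get_variable("bias")

assert set(layer.params) == {"kernel", "bias"}
assert layer.params["kernel"] is kernel
```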

Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)

— albertz, Feb 22 '21

> Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)

Thanks for your detailed notes! Yes, I can certainly give implementing this a try. If I run into issues or open questions, I will of course let you know. Thanks!

— Zettelkasten, Feb 22 '21