returnn Dim tags matching should use `allow_same_spatial

This is kind of an follow up to #666.

I have the problem currently, that SpatialDim("a", 1) matches SpatialDim("b", 1) in many cases, e.g. get_common_data, copy_compatible_to and find_matching_dim_map used by GatherLayer/DotLayer/etc.

Looking at the Dim.is_equal code, allow_same_spatial_dim matches exactly if

both tags are spatial dims (or feature if treat_feature_as_spatial which e.g. get_common_data sets)
the dimensions of self and other are equal, but not None

For my case, I definitely do not want these axes to match (they are not related in any way).

But this kind of matching is currently necessary for something like this:

      "lin1": {"class": "linear", "activation": "sigmoid", "n_out": 5, "from": "data:data"},
      "lin2": {"class": "linear", "activation": "sigmoid", "n_out": 5, "from": "data:data"},
      "output": {"class": "combine", "mode": "add", "from": ["lin1", "lin2"]},  # would give [B,T,F1,F2] with allow_same_spatial_dim=False!!

This is also because currently, LinearLayer does not set derived_from_tag on its output feature dim. Then, derived_matches would catch this. We can fix LinearLayer and to do this properly (currently this is via _base_get_out_data_from_opts, when creating a feature dim tag, I think it should just derive it from its input feature dim tag). For user specified out_dims however, we would not automatically set derived from to the input dim tag.

For other cases, this is more difficult maybe. E.g. if you have two inputs with different time axes, and then split into the same dim size. Then, the user would maybe want these axes to match, but they are in no way derived from the same thing.

Another solution is, similar to what @albertz wrote in #666, to separate user defined dim tags and automatically created ones. Then, user defined ones could have a more restrictive matching logic.

Dec 16 '21 19:12 Zettelkasten

derived_from_tag/derived_matches is not the right option here. derived_from_tag means that it is derived from this tag (via some op, like +- or so). We use it as a heuristic to assume in the rec layer, when some dim is derived (but unknown yet due to template construction, also this was before the dim math), that it is the same as the orig dim. E.g. we had the case where some people did some extra padding, then convolution with padding="valid" which removed the padding again such that it became the same, and that inside a rec layer, on the attention encoder axis.

The output feature dim is usually never the same as the input feature dim. And also, it is not derived from the input feature dim in any way. They are completely independent. So derived_from_tag doesn't really make sense here.

Dec 18 '21 02:12 albertz

Ah yeah, you're right, the derived_* logic is not applicable here.

So really, I think specifying n_out: int is the problem at the moment. In many cases currently, if n_in == n_out, we want the in and output dim tags to match. Like in my example above. Perhaps we could change it so that if n_out: int is specified, and n_out == in_dim.dimension, that then the newly created output dim tag will match with the input dim tag (either by being exactly the same, or via some other logic). I wonder if this most cases where we needed allow_same_spatial_dim before.

Logically, this does not make sense 100%, because the in and out dims should logically be different, but we matched them before always anyway, and in the future, we discourage anyone from using n_out: int anyway.

Jan 23 '22 13:01 Zettelkasten

Also the current matching logic is just dangerously broken and depends on the order of axes, e.g. this fails currently:

def test_same_spatial_dim():
  foo_dim = SpatialDim("foo", 3)
  bar_dim = SpatialDim("bar", 3)
  x = Data("a", dim_tags=[foo_dim, bar_dim])
  y = Data("b", dim_tags=[bar_dim, foo_dim])
  # test with default is_equal_opts=None
  assert_equal(x.find_matching_dim_map(y, other_axes=[0, 1]), {0: 1, 1: 0})  # returns {0: 0, 1: 1} currently

We should probably just at least throw an error here.

Jan 23 '22 13:01 Zettelkasten

Note that I stumbled upon this (or related) specifically for the DotLayer in #1154, and fixed via #1155. In #1155, I introduced a new behavior version to trigger the more strict matching for DotLayer. This is anyway only for the case with _auto_var_axes, i.e. when var1 and var2 are not specified explicitly.

But actually, I did not fully think through it, whether this is maybe difficult for users who did not adapt to use dim tags. Maybe not for DotLayer, as you can always specify everything explicitly.

Oct 24 '22 07:10 albertz

I'm now thinking whether my proposed solution from from #666 is maybe really the better way which would have avoided these issues:

As another solution, maybe we can disallow the broadcasting (or other is_equal attribs) if this is an explicit dim tag by the user? To implement that, we would need to go through all places in RETURNN where a static dim tag is created, and add a new flag like auto_created=True or so. And then allow the broadcasting only if auto_created and dimension == 1.

Again we should think about the quality (#634). When comparing a user-generated dim tag to another user-generated dim tag, it should be strict. When comparing an auto-created dim tag to another auto-created dim tag, it can use the is_equal attribs and be more relaxed. When comparing an auto-created dim tag to a user-created dim tag, what then? I think it should also be strict.

But yes, we also should think about having it all really well defined and the code should be clean. See #975.

Oct 24 '22 07:10 albertz

We have the auto_generated flag.

Nov 16 '22 16:11 albertz

returnn
returnn copied to clipboard

Dim tags matching should use `allow_same_spatial_dim=False`

returnn returnn copied to clipboard

Dim tags matching should use `allow_same_spatial_dim=False`

returnn
returnn copied to clipboard