returnn
returnn copied to clipboard
Dim tags matching should use `allow_same_spatial_dim=False`
This is kind of an follow up to #666.
I have the problem currently, that SpatialDim("a", 1) matches SpatialDim("b", 1) in many cases, e.g. get_common_data, copy_compatible_to and find_matching_dim_map used by GatherLayer/DotLayer/etc.
Looking at the Dim.is_equal code, allow_same_spatial_dim matches exactly if
- both tags are spatial dims (or feature if
treat_feature_as_spatialwhich e.g.get_common_datasets) - the dimensions of
selfandotherare equal, but notNone
For my case, I definitely do not want these axes to match (they are not related in any way).
But this kind of matching is currently necessary for something like this:
"lin1": {"class": "linear", "activation": "sigmoid", "n_out": 5, "from": "data:data"},
"lin2": {"class": "linear", "activation": "sigmoid", "n_out": 5, "from": "data:data"},
"output": {"class": "combine", "mode": "add", "from": ["lin1", "lin2"]}, # would give [B,T,F1,F2] with allow_same_spatial_dim=False!!
This is also because currently, LinearLayer does not set derived_from_tag on its output feature dim.
Then, derived_matches would catch this.
We can fix LinearLayer and to do this properly (currently this is via _base_get_out_data_from_opts, when creating a feature dim tag, I think it should just derive it from its input feature dim tag).
For user specified out_dims however, we would not automatically set derived from to the input dim tag.
For other cases, this is more difficult maybe. E.g. if you have two inputs with different time axes, and then split into the same dim size. Then, the user would maybe want these axes to match, but they are in no way derived from the same thing.
Another solution is, similar to what @albertz wrote in #666, to separate user defined dim tags and automatically created ones. Then, user defined ones could have a more restrictive matching logic.
derived_from_tag/derived_matches is not the right option here. derived_from_tag means that it is derived from this tag (via some op, like +- or so). We use it as a heuristic to assume in the rec layer, when some dim is derived (but unknown yet due to template construction, also this was before the dim math), that it is the same as the orig dim. E.g. we had the case where some people did some extra padding, then convolution with padding="valid" which removed the padding again such that it became the same, and that inside a rec layer, on the attention encoder axis.
The output feature dim is usually never the same as the input feature dim. And also, it is not derived from the input feature dim in any way. They are completely independent. So derived_from_tag doesn't really make sense here.
Ah yeah, you're right, the derived_* logic is not applicable here.
So really, I think specifying n_out: int is the problem at the moment. In many cases currently, if n_in == n_out, we want the in and output dim tags to match. Like in my example above.
Perhaps we could change it so that if n_out: int is specified, and n_out == in_dim.dimension, that then the newly created output dim tag will match with the input dim tag (either by being exactly the same, or via some other logic).
I wonder if this most cases where we needed allow_same_spatial_dim before.
Logically, this does not make sense 100%, because the in and out dims should logically be different, but we matched them before always anyway, and in the future, we discourage anyone from using n_out: int anyway.
Also the current matching logic is just dangerously broken and depends on the order of axes, e.g. this fails currently:
def test_same_spatial_dim():
foo_dim = SpatialDim("foo", 3)
bar_dim = SpatialDim("bar", 3)
x = Data("a", dim_tags=[foo_dim, bar_dim])
y = Data("b", dim_tags=[bar_dim, foo_dim])
# test with default is_equal_opts=None
assert_equal(x.find_matching_dim_map(y, other_axes=[0, 1]), {0: 1, 1: 0}) # returns {0: 0, 1: 1} currently
We should probably just at least throw an error here.
Note that I stumbled upon this (or related) specifically for the DotLayer in #1154, and fixed via #1155.
In #1155, I introduced a new behavior version to trigger the more strict matching for DotLayer. This is anyway only for the case with _auto_var_axes, i.e. when var1 and var2 are not specified explicitly.
But actually, I did not fully think through it, whether this is maybe difficult for users who did not adapt to use dim tags. Maybe not for DotLayer, as you can always specify everything explicitly.
I'm now thinking whether my proposed solution from from #666 is maybe really the better way which would have avoided these issues:
As another solution, maybe we can disallow the broadcasting (or other
is_equalattribs) if this is an explicit dim tag by the user? To implement that, we would need to go through all places in RETURNN where a static dim tag is created, and add a new flag likeauto_created=Trueor so. And then allow the broadcasting only ifauto_created and dimension == 1.
Again we should think about the quality (#634). When comparing a user-generated dim tag to another user-generated dim tag, it should be strict. When comparing an auto-created dim tag to another auto-created dim tag, it can use the is_equal attribs and be more relaxed. When comparing an auto-created dim tag to a user-created dim tag, what then? I think it should also be strict.
But yes, we also should think about having it all really well defined and the code should be clean. See #975.
We have the auto_generated flag.