returnn icon indicating copy to clipboard operation
returnn copied to clipboard

Layers applying a mask should expand input to all `dyn_size_ext` dimensions

Open Zettelkasten opened this issue 4 years ago • 5 comments

Consider I have some input of shape [T], where T has dyn_sizes with some other axes, e.g. [B]. I want to use ReduceLayer on this axis, which needs to apply a sequence mask. Currently, this fails because we assert that all axes in dyn_size_ext need to be present in the input.

Instead, I would like to support this by first expanding the input to all dims from dyn_size_ext that are missing. So in my case, we would first broadcast [T] to [B,T], then apply the sequence mask, reduce the axis, and get some output [B,T].

Zettelkasten avatar Oct 19 '21 11:10 Zettelkasten

I wonder whether this can be slow and suboptimal in some cases.

E.g. in DotLayer, you definitely would not want to do that, when the axis is present in one of the inputs.

albertz avatar Oct 19 '21 15:10 albertz

What are the layers which would do this? ReduceLayer, DotLayer. Others?

albertz avatar Oct 19 '21 15:10 albertz

I wonder whether this can be slow and suboptimal in some cases.

E.g. in DotLayer, you definitely would not want to do that, when the axis is present in one of the inputs.

For DotLayer, we would always prefer to mask an input that already has all needed dims, and only use this in the case where neither input has all required dims. Performance should be the same as before.

Maybe, if you think this can happen by accident, we can also make this explicit, similar to #691.

What are the layers which would do this? ReduceLayer, DotLayer. Others?

Also SeqLenMaskLayer, and related to that SoftmaxOverSpatialLayer. In general, all that use Data.get_sequence_mask_broadcast in some way. Also SearchSortedLayer (but not so important, probably I'm the only one using that layer).

Zettelkasten avatar Oct 19 '21 21:10 Zettelkasten

I guess the list of layers with special behavior on dynamic spatial axes is basically exactly this.

I'm not sure if we need to implement it for all of those now though. It should be fine when there is proper error handling (i.e. some exception is thrown, with some reference to this issue).

This is already the case for all layers using get_sequence_mask_broadcast. And I just added a reference in the comment to this issue.

So, what's left? Otherwise I think we can close this, and just implement it only when it is really needed.

You said you need it for ReduceLayer? Maybe you can just create a PR? I think you just need to call copy_add_dim_by_tag with unbroadcast=True for the missing dim tags. Maybe we can add a helper like copy_add_missing_dim_tags or so.

albertz avatar Nov 21 '21 16:11 albertz

Now that we disallow implicit broadcasting in #691, I think it makes sense to also enforce this here. I.e. only allow to broadcast to the implicit size dims if the user specifies out_shape.

I don't need this anymore currently. I'll come back to this when it is an issue for me.

Zettelkasten avatar Jan 23 '22 13:01 Zettelkasten