`Dim` internals and API should be refactored
The logic involving get_for_batch_ctx and get_same_base is ugly, unintuitive, and a potential source of problems.
One big problem is that we never really defined the logic around get_for_batch_ctx well, and it was added only at a later point. And similarly for some of the other concepts. E.g. the equality of dim tags (#634) is also not well defined.
CumConcatLayer (#589) and generalized self-attention (#391) were among the main reasons for introducing this, along with the beam search logic.
CumConcatLayer also introduced the concept of implicit dims.
Defining a dim tag (sequence lengths) inside a loop, and then having it accumulated outside is still somewhat straightforward and non-ambiguous, so this is not really a problem.
Maybe it was a problem to treat them as the same, though? But I think this is important so that the rec loop optimization (move out of loop) works correctly.
Note also that the whole logic around get_for_batch_ctx is basically just for having different dyn_size_ext (dyn_size) for the same dim tag, under different conditions, such as inside a loop or outside, and with beam or without.
Direct assignments to or reads from dyn_size or dyn_size_ext, but also all the other flags, even description, depend on get_for_batch_ctx or get_same_base.
Some older code ignores get_for_batch_ctx or get_same_base and directly accesses (reads or writes) attributes like dyn_size. This works when it happens to be the correct instance, but otherwise it can lead to unexpected behavior.
Related is also declare_same_as, although this might not be much of a problem, even less after such a refactoring. However, its logic is currently quite complicated, and should be simplified.
I'm also not sure about a good way to solve this. Maybe dyn_size and dyn_size_ext should be hidden away, and only be accessible through functions get_... and set_..., which would also require the batch and ctx.
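A minimal sketch of what such an accessor-based API could look like; all names and the internal structure here are hypothetical, not the current RETURNN API:

```python
from typing import Any, Dict, Optional, Tuple


class Dim:
    """Hypothetical sketch: dyn_size_ext is hidden behind explicit accessors
    keyed by (batch, ctx), instead of living on whichever Dim instance
    get_for_batch_ctx happened to return."""

    def __init__(self, name: str, dimension: Optional[int] = None):
        self.name = name
        self.dimension = dimension  # static size, or None if dynamic
        # (batch, ctx) -> dynamic sizes (dyn_size_ext) for that variant
        self._dyn_size_ext: Dict[Tuple[Any, Any], Any] = {}

    def get_dyn_size_ext(self, *, batch, ctx):
        """Explicit read; the caller must say which batch/ctx variant it wants."""
        return self._dyn_size_ext.get((batch, ctx))

    def set_dyn_size_ext(self, value, *, batch, ctx):
        """Explicit write; no direct assignment to dyn_size / dyn_size_ext."""
        self._dyn_size_ext[(batch, ctx)] = value
```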
Another big problem is the special role of the batch dim (#920), and all the extra logic around BatchInfo, and when Data.batch or Dim.batch should be set, and what it actually should be set to. In principle, we should not need any special logic for the batch dim, and it should be treated just like other dims.
Features which are available for the batch dim (BatchInfo) but not for other dims are the flattening logic and the special beam search logic.
I think the flattening should be done in a way that you could combine multiple dims (any dims, not just the batch dim) and you would get a new flattened dim. Nothing would be specific about the batch dim. There would be meta info attached to the combined dim to be able to recover the individual dims.
The beam should just be another separate dim, not merged into the batch dim. And then there could also be meta information attached to it, basically what we have in SearchBeam now.
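A rough sketch of what generic flattening, and the beam as a plain dim with attached meta info, could look like; all names here (FlattenInfo, flatten_dims, beam_info) are made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Dim:
    """Minimal stand-in for RETURNN's Dim, just enough for this sketch."""
    name: str
    dimension: Optional[int] = None  # static size, None if dynamic
    flatten_info: Optional["FlattenInfo"] = None  # set if this is a flattened dim
    beam_info: Optional[dict] = None  # roughly what SearchBeam holds (e.g. beam size)


@dataclass
class FlattenInfo:
    """Meta info on a flattened dim, enough to recover the individual dims."""
    parts: Tuple[Dim, ...]


def flatten_dims(*dims: Dim) -> Dim:
    """Combine any dims into a new flattened dim; nothing is batch-specific."""
    flat = Dim(name="*".join(d.name for d in dims))
    flat.flatten_info = FlattenInfo(parts=dims)
    return flat


# The beam would just be another dim with meta info, not merged into the batch dim:
beam_dim = Dim(name="beam", dimension=12, beam_info={"beam_size": 12})
```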
Related is also the definition of dim tag equality (#634). This is still not well defined in all cases.
A bit related is also dim tag math, and especially questions of equality in those cases. However, I think this was not too much of a problem so far, except that the equality logic was also broken in its own ways in those cases.
Further, there is also the derived_from_tag and derived_from_op logic, which is yet another heuristic for certain equality matching. Maybe this is not needed when dim tags are used everywhere consistently.
And there are also is_dim_known and undefined, which are also not well defined.
Such changes might break some older code. But on RETURNN side, this can all be fixed. And there should not be much (or any) external code yet using this.
Some examples of issues and resulting PRs caused by the API of Dim, where things were not really well defined:
- #666
- #865
- #1046
- #1054 and #1055
- #1057 and #1058
- #1069 and #1068
- #1102 and #1104
- #1107
- #1112
- #1114
- #1151
- #1152
- #1167 and #1168
- #1246
Related issues:
- #1153
- #920
- #634
- #589
- #391
Note, in case we try to address this, i.e. clean up or refactor some of this: We should also check and run the test cases of returnn-common, as there are some problems which might only occur via those tests.
@Zettelkasten any thoughts or feedback on this?
I wonder whether we should maybe avoid the whole get_for_batch_ctx logic completely. To reiterate why we have it:
- Automatically carry around a version with beam (tiled) of the `dyn_size_ext`.
- Different variant of `dyn_size_ext` within a loop and without. This was introduced for `CumConcatLayer` (https://github.com/rwth-i6/returnn/pull/589) and generalized self-attention (https://github.com/rwth-i6/returnn/issues/391). The important aspect is that this should work seamlessly with recurrent automatic optimization.
When we have the beam dim always separate and explicit, the first point would not be needed. At least with returnn-common, we probably want to have this anyway.
For the second point, maybe we really want to treat them as separate dim tags: one variant inside the loop, and really a separate dim tag outside the loop. But this will need some logic to get from one variant of the dim tag to the other. So it again gets us to something similar to get_for_batch_ctx.
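As a hypothetical sketch of that direction: keep two distinct dim tags, one per-step inside the loop and one accumulated outside, and link them explicitly rather than returning variants of one tag from get_for_batch_ctx. The class names here are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dim:
    """Minimal stand-in for RETURNN's Dim for this sketch."""
    name: str
    dimension: Optional[int] = None


@dataclass
class LoopDimPair:
    """Hypothetical explicit link between the dim defined per loop iteration
    and the accumulated dim outside the loop."""
    inside: Dim   # the dim as seen within one loop iteration
    outside: Dim  # the accumulated dim, as seen outside the loop


# Example: defined per step inside the loop, accumulated counterpart outside.
pair = LoopDimPair(inside=Dim("chunk"), outside=Dim("chunk-accum"))
```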
Maybe the problem is also more that we treat all those variants returned by get_for_batch_ctx as equal (see definition of equality #634)?
Or maybe the problem is that Data will automatically adapt the dim tags in Data._adapt_batch_consistent_dim_tags? This is because we treat them all as equal.
Maybe treating them as equal is also not so much the problem. What exactly is actually the problem?
Sometimes we don't really know which is the right variant to pick in _adapt_batch_consistent_dim_tags, e.g. when there is no batch dim in the tensor. "Batch consistent" does not really make sense then. It should maybe pick some reasonable variant then. Is this always well defined? What matters more is the context (e.g. within a loop or not), which should always be available as information. And if the tensor does not have a batch dim, it should pick whatever dim tag variant is available with the simplest batch.
I think we should go through all the attribs and see which of those are really needed.
One specific function I wonder about is `declare_same_as`, whether we really need it. The user can replace some existing dim tag by another dim tag via `ReinterpretDataLayer`.
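For example, a rough config sketch of that replacement; it assumes ReinterpretDataLayer's `set_dim_tags` option (mapping an axis to a dim tag), with the exact option name quoted from memory:

```python
# Hypothetical sketch: instead of new_time_dim.declare_same_as(old_time_dim),
# reinterpret the layer output so its time axis uses the already existing dim tag.
from returnn.tf.util.data import SpatialDim

old_time_dim = SpatialDim("old-time")

network = {
    "replaced": {
        "class": "reinterpret_data",
        "from": "some_layer",
        # assign the existing dim tag to the time axis (option name from memory)
        "set_dim_tags": {"T": old_time_dim},
    },
}
```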
When making Data/Dim framework independent, and especially usable for eager mode (#1165), it becomes clear that we need to simplify it and speed it up a lot. So solving #1165 with this requirement would probably also solve this issue here.
> One specific function I wonder about is `declare_same_as`, whether we really need it.
So, when looking for current use cases in RETURNN, what we have is:
- In many layers, it will do sth like `out_spatial_dim_.declare_same_as(out_spatial_dim)`, where `out_spatial_dim_` is the automatically calculated dimension based on the input, and `out_spatial_dim` is passed by the user. The common cases are that `out_spatial_dim` is actually yet undefined, and this is the implicit way to define it. Or, it can also be that it was already defined before, but then we expect it to be the same. `out_spatial_dim_` is usually via dim tag math, but it might also be more dynamic and then only later defined in the layer `__init__`. How to replace this? Maybe some code like this:

  ```python
  if not out_spatial_dim:
      out_spatial_dim = Dim(<undefined>)
  if out_spatial_dim.is_not_defined:
      out_spatial_dim.set(...)  # maybe just dyn_size_ext template?
  ```

  That way, we don't need `same_as`, as there are not multiple objects referring to the same dim. I'm not sure if we really need the dim tag math or dim derived-from logic here. In `__init__`, it can calculate and set the actual `dyn_size_ext` values, if needed. I'm not sure if, how and where there should be any error checking later whether the dim is actually correct, if it is already defined.
- In the returnn_common API, it is unnatural to pass an existing `out_spatial_dim` to some function like `pool1d`, and actually not recommended, and instead, you get a new out spatial dim as return value. Currently, if that is expected to be the same as some other existing dim, it uses `declare_same_as`. Not just for implicit validation, but also to mark it the same, if the dim tags are not equal otherwise. We can avoid this use of `declare_same_as` by replacing the new dim by the old existing dim tag via `ReinterpretDataLayer`.
- In some cases, it is an implicit check that two dims are actually the same. It's kind of an assert. E.g. in `RecLayer` or `RecUnstackLayer` for the time dim. And if one is undefined, it would define it this way.
- There is one use case in `CondLayer`, to merge some dynamic dim inside the condition (although this is anyway incomplete). See https://github.com/rwth-i6/returnn_common/commit/ffc5083d04f2cc0f159f73c52cf3421814cd7e80.
- `Data` `same_dim_tags_as`
Related, the get_for_batch_ctx logic, which also uses the same_as logic:
`CumConcatLayer`, and in general when some dim is defined inside a loop, per iteration, and then accessed outside, it gets a new dimension in `dyn_size_ext`. So, do we really need or want the dim tag outside the loop to be the same as inside the loop? Or does it need a reference to it? Note that it was intended to work both when optimized out of the loop or being inside the loop. And the `out_spatial_dim` is a necessary arg. And it will be the same arg when optimized out (unless we somehow transform it then). The optimized-out case matters when the layer is accessed outside, or also when other follow-up layers access it. They would refer to the same dim then. Those follow-up layers might also end up being inside the loop. So, does this imply that we must have equality, and must have `same_as`, and the whole `get_for_batch_ctx` logic? Note that the `batch` aspect here is not needed when we don't merge a beam into it or do other things. Note that the `ctx` aspect is not needed for eager mode, including the PyTorch backend.
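To make the situation concrete, here is a rough config sketch along the lines of the `CumConcatLayer` / generalized-self-attention setup from #589 (layer options from memory, simplified and illustrative): the same `out_spatial_dim` object is passed inside the loop and referenced by a follow-up layer, whether or not the layers get optimized out of the loop.

```python
from returnn.tf.util.data import SpatialDim

new_dim = SpatialDim("cum-concat-time")

network = {
    "loop": {
        "class": "rec", "from": "data",
        "unit": {
            # Accumulates the per-step input over the loop iterations into new_dim.
            "accum": {"class": "cum_concat", "from": "data:source",
                      "out_spatial_dim": new_dim},
            # Follow-up layer referring to the same dim tag; it may run inside
            # the loop or be moved out by the automatic recurrent optimization.
            "red": {"class": "reduce", "mode": "max", "from": "accum",
                    "axis": new_dim},
            "output": {"class": "copy", "from": "red"},
        },
    },
    "output": {"class": "copy", "from": "loop"},
}
```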