`get_out_data_from_opts` covers too much logic

Open albertz opened this issue 4 years ago • 0 comments

The idea was simple: based on the inputs/kwargs, determine the output Data type (without actually computing the tensor). This was mostly about dtype, shape and dim.

Over time, Data was extended with more and more logic, such as handling beam search and special logic for the batch dim. Much of this extra logic is the same for every layer, yet we still need to duplicate the code/logic in every layer.

Examples:

out.beam = SearchBeam.get_combined_beam(...)
(batch is already post-processed in TFNetwork._create_layer)
An explicit out_type or n_out might overwrite or explicitly set some attributes (#542). This can be needed in recurrent constructions such as x: {class: eval, from: "prev:x", eval: "source(0) + 1"}, where you might want to set a custom dtype or similar. Currently only CopyLayer.get_out_data_from_opts and the base LayerBase.get_out_data_from_opts handle this. Layers like LinearLayer do not have their own get_out_data_from_opts because the base logic covers this.

We might want to decouple this logic:

One function which computes dtype, shape, dim and size_placeholder (or maybe with dtype also separated). In most cases, the sizes (size_placeholder) would just be copied. In rarer cases, new sizes could be introduced, as in ConvLayer. I'm not sure if this needs yet another separate logic.
beam is almost always SearchBeam.get_combined_beam of all deps (inputs Data, layers, targets), except for layers like ChoiceLayer
batch is almost always BatchInfo.get_common_batch_info of all deps
One function which handles the logic of custom overwrites by out_type or n_out (#542)

I'm not exactly sure what it would look like. Maybe the function for dtype/shape/sizes could also just return a Data but not care about beam/batch.

Maybe there could then be separate functions LayerBase.get_out_beam and LayerBase.get_out_batch, and only those layers which do something non-standard would override them.
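To make the proposed split a bit more concrete, here is a minimal, self-contained sketch. It does not use the real RETURNN classes: `Beam`, `Data`, `combine_beams`, `get_out_shape_and_dtype` and `get_out_data` are hypothetical stand-ins invented for illustration, and `combine_beams` is only a toy replacement for SearchBeam.get_combined_beam. The batch info is left out; presumably it could follow the same pattern via a get_out_batch hook.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Beam:
    """Hypothetical stand-in for SearchBeam."""
    name: str
    beam_size: int


@dataclass(frozen=True)
class Data:
    """Hypothetical, heavily reduced stand-in for Data."""
    shape: Tuple[Optional[int], ...]
    dtype: str = "float32"
    beam: Optional[Beam] = None


def combine_beams(beams):
    """Toy replacement for SearchBeam.get_combined_beam: largest beam, if any."""
    beams = [b for b in beams if b is not None]
    return max(beams, key=lambda b: b.beam_size) if beams else None


class LayerBase:
    @classmethod
    def get_out_shape_and_dtype(cls, sources, **kwargs):
        """Per-layer part: only shape/dtype. Default: copy from the first source."""
        return Data(shape=sources[0].shape, dtype=sources[0].dtype)

    @classmethod
    def get_out_beam(cls, sources, **kwargs):
        """Shared default: combined beam of all deps."""
        return combine_beams([s.beam for s in sources])

    @classmethod
    def get_out_data(cls, sources, **kwargs):
        """Generic driver, written once instead of once per layer."""
        out = cls.get_out_shape_and_dtype(sources, **kwargs)
        return replace(out, beam=cls.get_out_beam(sources, **kwargs))


class LinearLayer(LayerBase):
    # Only the shape logic is layer-specific; the beam handling is inherited.
    @classmethod
    def get_out_shape_and_dtype(cls, sources, n_out, **kwargs):
        return Data(shape=sources[0].shape[:-1] + (n_out,), dtype=sources[0].dtype)


class ChoiceLayer(LayerBase):
    # A non-standard layer overrides the beam hook as well.
    @classmethod
    def get_out_shape_and_dtype(cls, sources, **kwargs):
        return Data(shape=(), dtype="int32")

    @classmethod
    def get_out_beam(cls, sources, beam_size, **kwargs):
        return Beam(name="choice", beam_size=beam_size)
```

With this layout, `LinearLayer.get_out_data([src], n_out=7)` would pick up the beam of `src` without any beam-related code in `LinearLayer` itself, while `ChoiceLayer` is the only place that knows about creating a new beam.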

This is a bit open for discussion. The main purpose is to simplify the code, to make it more straightforward, and to make it more consistent for edge cases.

E.g. currently, when you specify out_type, some layers just ignore it, some layers at least check it, and some layers use the information to overwrite the output. (#542)
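One possible way to make this consistent would be a single shared function which every layer's result passes through. The sketch below is only an illustration under assumptions: `apply_out_type` and its `overridable` parameter are hypothetical names, the Data attributes are modelled as a plain dict, and the policy (dtype may be overwritten, everything else must match the inferred value) is just one option among several.

```python
from typing import Any, Dict, Optional, Tuple


def apply_out_type(
    inferred: Dict[str, Any],
    out_type: Optional[Dict[str, Any]] = None,
    n_out: Optional[int] = None,
    overridable: Tuple[str, ...] = ("dtype",),
) -> Dict[str, Any]:
    """Hypothetical shared handling of user-specified out_type / n_out.

    Attributes listed in `overridable` are overwritten by the user value.
    All other user-specified attributes are only checked against the
    inferred value, and a mismatch raises instead of being ignored.
    """
    result = dict(inferred)
    requested = dict(out_type or {})
    if n_out is not None:
        # n_out is treated as a shorthand for out_type["dim"].
        if requested.setdefault("dim", n_out) != n_out:
            raise ValueError("n_out and out_type['dim'] disagree")
    for key, value in requested.items():
        if key not in result:
            raise KeyError("unknown attribute %r in out_type" % key)
        if key in overridable:
            result[key] = value
        elif result[key] != value:
            raise ValueError(
                "out_type[%r] = %r does not match inferred %r" % (key, value, result[key])
            )
    return result


inferred = {"shape": (None, 5), "dim": 5, "dtype": "float32"}
custom = apply_out_type(inferred, out_type={"dtype": "int32"})  # dtype is overwritten
checked = apply_out_type(inferred, n_out=5)  # n_out is only verified
```

The point of such a function would be that the behaviour no longer depends on the individual layer: a mismatching `n_out` always raises, and an allowed overwrite is always applied.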

Jun 12 '21 12:06 albertz