Albert Zeyer
> > When you modify the batch dim, you should create a new `BatchInfo` object as well, and assign that to `output`.
> >
> > As I said, the fix is...
> Do you have another layer which modifies the batch axis and could serve as a good example?

Not many layers do that. I just recall `FlattenBatchLayer` right now.
Well, `GatherLayer` on the batch axis may still sometimes be a valid thing someone wants to do. I would leave this PR open.
We should not introduce another separate padding concept. This was already discussed as part of #391, where we agreed to add something like `dyn_mask_ext` (earlier `seq_mask_ext` in that discussion) to...
Btw, before you implement some bigger change like this, it would be good to open an issue first where the implementation details can be discussed.
One way this would already work without these changes (at least conceptually): You could do one pass over the given prefix with search disabled (so `ChoiceLayer` uses the prefix). Then...
Btw, also conceptually, what we want in the future is that such things can be written more directly in the config, in a flexible way, without the need to modify...
> About two decoding passes: Yes, I was aware of that concept, but then I would need to take the graph apart and write a custom search, with two session...
> The two-decoder implementation makes the config unnecessarily complicated.

Why? In my example above, it basically is one additional line of code.

> And by config I mean the network...
This here is a case where we do not necessarily need a new `behavior_version` and instead could just print a deprecation warning.
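
To illustrate what I mean by a deprecation warning instead of a new `behavior_version`: the old option keeps working, but the user is told to migrate. This is only a rough sketch in plain Python using the standard `warnings` module. The helper `resolve_option` and the option names `old_opt` / `new_opt` are hypothetical and are not existing RETURNN API.

```python
import warnings


def resolve_option(config, new_name, old_name, default=None):
    """
    Hypothetical helper (not RETURNN API): read `new_name` from a config dict,
    falling back to the deprecated `old_name` while warning the user.
    """
    if old_name in config:
        # The old option is still accepted, so existing configs do not break.
        warnings.warn(
            f"Config option {old_name!r} is deprecated, use {new_name!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if new_name not in config:
            return config[old_name]
    # The new option takes precedence if both are given.
    return config.get(new_name, default)


if __name__ == "__main__":
    warnings.simplefilter("always")  # make sure the warning is shown in this demo
    print(resolve_option({"old_opt": 3}, "new_opt", "old_opt"))
```

The point of this design is that behavior does not change for anyone, which is why no `behavior_version` bump would be needed in such a case.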