Albert Zeyer
Another open question: One nice thing about `ChoiceLayer` is that it allows defining the decoder cleanly both for recognition with search and for training...
@michelwi @jvhoffbauer you are probably also interested in this? You make heavy use of `custom_score_combine` with custom TF code. The goal here is to come up with some design which...
Yes, I do that a couple of times, and there are also some demos showing how to do that. You can simply flatten multiple dynamic axes into a single axis....
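To illustrate the flattening idea, here is a minimal NumPy sketch, not RETURNN code. It assumes a single padded sequence with two dynamic leading axes and a static feature axis; the helper names `flatten_axes` and `unflatten_axes` and the `sizes` tuple are hypothetical. The true sizes have to be kept alongside the flattened data so the original layout can be restored.

```python
import numpy as np


def flatten_axes(x, sizes):
    """Cut the padding from the two leading axes and merge them into one axis.

    x: padded array of shape [T1_max, T2_max, ...feature dims]
    sizes: (t1, t2), the true lengths of the two dynamic axes (hypothetical convention)
    """
    t1, t2 = sizes
    valid = x[:t1, :t2]  # drop padded frames
    return valid.reshape(t1 * t2, *x.shape[2:])


def unflatten_axes(flat, sizes):
    """Inverse of flatten_axes: split the single axis back into two."""
    t1, t2 = sizes
    return flat.reshape(t1, t2, *flat.shape[1:])


# Example: 4x5 padded grid with 2 features, of which only 3x4 is valid.
x = np.arange(4 * 5 * 2, dtype="float32").reshape(4, 5, 2)
sizes = (3, 4)
flat = flatten_axes(x, sizes)          # single dynamic axis of length 12
restored = unflatten_axes(flat, sizes)  # back to the valid 3x4 region
```

The flattened array has exactly one dynamic axis, which is the layout a single-time-axis storage format expects; the `sizes` would need to be stored as a separate data stream.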
I agree, we should simplify it for the user as far as possible. `HDFDumpLayer` and `HDFDataset` should complement each other in this case. (We should be a bit careful to...
For reference, [here](https://github.com/rwth-i6/returnn/discussions/434) is some discussion relevant for this.
A note: I think our `Dataset` API itself is not really flexible enough. The `Dataset.get_data_shape` function currently implies that there is always exactly one dynamic time axis in the beginning....
Actually, not sure if the old behavior should be completely disallowed (so it needs a new `behavior_version` #508) or just deprecated.
So what do you suggest instead? No default at all, so require it to be set explicitly? Or change the default to something else? To what? Or just let it use...
@JackTemaki (or anyone): Did you actually ever compare directly? Can you really say that e.g. Adam or Nadam is better with 1e-8?
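For context on why the epsilon value can matter at all, here is a minimal sketch of the standard Adam update in plain Python. This is not RETURNN or TensorFlow code; the function names are hypothetical, and the gradient value and the alternative epsilon `1e-4` are arbitrary illustration values, not settings from this discussion. It only shows where epsilon enters the formula and says nothing about which value trains better.

```python
import math


def adam_update(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step for a scalar parameter.

    Returns the step to subtract from the parameter and the new moments.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    return step, m, v


def first_step(grad, eps):
    """Step size of the very first update, starting from zero moments."""
    step, _, _ = adam_update(grad, m=0.0, v=0.0, t=1, eps=eps)
    return step


small_grad = 1e-6  # arbitrary small gradient for illustration
step_small_eps = first_step(small_grad, eps=1e-8)
step_large_eps = first_step(small_grad, eps=1e-4)  # hypothetical alternative
```

On the first step the update reduces to `lr * grad / (abs(grad) + eps)`, so epsilon only changes the step noticeably when the gradient magnitude is comparable to or smaller than epsilon.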
The logic for the logdir and TF event file writer is in `Runner.run`. It’s pretty simple currently. It was a design choice to have a separate logdir per dev/eval run,...