#508 Effectively this means changing `Loss.get_default_target()` (because each loss class can already define its own default). The base function currently has this implementation:

```
@classmethod
def get_default_target(cls, extern_data):
    """
    :param TFNetwork.ExternData extern_data:
    :rtype: str|None
    """
    return extern_data.default_target
```
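For illustration, roughly how a loss subclass can define its own default by overriding this classmethod; a minimal sketch (the class name `MyLoss`, the key `"classes"`, and the import path are placeholders, not from the issue):

```
from returnn.tf.layers.basic import Loss


class MyLoss(Loss):
    """Example loss which prefers a specific target key."""
    class_name = "my_loss"

    @classmethod
    def get_default_target(cls, extern_data):
        """
        :param returnn.tf.network.ExternData extern_data:
        :rtype: str|None
        """
        # Prefer a specific key if the config defines it, otherwise
        # fall back to the base default (extern_data.default_target).
        if "classes" in extern_data.data:
            return "classes"
        return super(MyLoss, cls).get_default_target(extern_data)
```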
We should maybe define more exactly the behavior when the user specifies `out_type` (which is currently a bit inconsistent across layers; related: #541). In many cases, it would...
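For context, this is what a user-specified `out_type` looks like in a network dict; a minimal sketch (layer names and dims made up):

```
network = {
    "hidden": {
        "class": "linear", "activation": "relu", "from": "data", "n_out": 512,
        # Explicit output template: shape excludes the batch dim,
        # the time dim is None (dynamic). The open question is how this
        # should interact with what the layer infers itself.
        "out_type": {"dim": 512, "shape": (None, 512)},
    },
    "output": {"class": "softmax", "loss": "ce", "from": "hidden"},
}
```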
#508 Thus effectively removing the default data key "data". I don't have a strong opinion on this, so I just put this here and leave it open for discussion. This...
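For reference, a minimal sketch of where the default key `"data"` shows up in a typical config (dims made up):

```
extern_data = {
    # "data" is the implicit default input key: layers without an explicit
    # source (or with from="data") read from it.
    "data": {"dim": 40},
    # "classes" is the default target for losses.
    "classes": {"dim": 5000, "sparse": True},
}
# Removing the default would mean every layer/loss has to name its key explicitly.
```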
I think this depends on #530. In the case of `concat_sources=False`, the source names `data:0` etc. are not nice, and I think we can have better ways (this is #530). (I'm...
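To make the naming concrete, a sketch of a subnetwork with two separate sources (layer names and dims made up):

```
network = {
    "enc_a": {"class": "linear", "activation": "tanh", "n_out": 256, "from": "data"},
    "enc_b": {"class": "linear", "activation": "tanh", "n_out": 256, "from": "data"},
    "combine": {
        "class": "subnetwork", "from": ["enc_a", "enc_b"],
        "concat_sources": False,  # keep the two sources separate
        "subnetwork": {
            # With concat_sources=False, the sources show up inside the
            # subnetwork as "data:0", "data:1", ... -- the names in question.
            "add": {"class": "combine", "kind": "add", "from": ["data:0", "data:1"]},
            "output": {"class": "copy", "from": "add"},
        },
    },
    "output": {"class": "softmax", "loss": "ce", "from": "combine"},
}
```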
It would crash at an early step in the first epoch with a message like:

```
...
pretrain epoch 1, step 59, cost:ctc 6.582720208647288, cost:output/output_prob 6.08799995325171, error:ctc 0.9999999632127583, error:decision 0.0, ...
```
Existing configs should work as before, without any change in behavior. The datasets themselves (everything that derives from the class `Dataset`) will stay as is, as well as their API. It...
When the training crashes (e.g. GPU out-of-memory, inf/nan, or similar), it often happens that the process (SGE job, Slurm job) just hangs and does not exit.
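One generic way to work around such hangs (just a sketch, not RETURNN's actual mechanism) is a hard-exit watchdog that fires some time after an unhandled exception:

```
import os
import sys
import threading
import time


def install_crash_watchdog(timeout=30.0):
    """After an unhandled exception, allow `timeout` seconds for normal
    cleanup, then force the process to exit."""
    orig_excepthook = sys.excepthook

    def excepthook(exc_type, exc_value, exc_tb):
        orig_excepthook(exc_type, exc_value, exc_tb)

        def hard_exit():
            time.sleep(timeout)
            # os._exit skips atexit handlers and hanging non-daemon threads
            # (e.g. data-loader threads, or CUDA teardown which never returns).
            os._exit(1)

        threading.Thread(target=hard_exit, daemon=True).start()

    sys.excepthook = excepthook
```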
See the [overview of distributed TensorFlow in general (independent of RETURNN)](https://github.com/rwth-i6/returnn/wiki/Distributed-TensorFlow) for some background. This issue is about the specific implementation in RETURNN. This is somewhat orthogonal to the...
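For background only (plain TensorFlow, not the RETURNN-specific implementation this issue is about), the standard multi-worker setup looks like this:

```
import tensorflow as tf

# Each worker gets its cluster role via the TF_CONFIG env var, e.g.:
#   TF_CONFIG='{"cluster": {"worker": ["host1:2222", "host2:2222"]},
#               "task": {"type": "worker", "index": 0}}'
strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="sgd", loss="mse")
```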