And a test case which tests this together with masked computation. See the test case for a demonstration of what should be possible with this. This is work-in-progress. I'm not sure...
Now that we have the generic `Tensor` and `TensorDict` to describe arbitrary data formats, we can remove the old ambiguous and limited `num_outputs` and `num_inputs` from the dataset and replace...
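As a rough sketch of what that could look like (hedged: `Tensor` and `Dim` follow the `returnn.tensor` API, but how exactly the dataset would expose such templates is an assumption here):

```python
from returnn.tensor import Tensor, Dim, batch_dim

# Hedged sketch: describing dataset streams via Tensor templates with
# explicit dim tags, instead of the old num_inputs/num_outputs ints.
time_dim = Dim(None, name="time")  # dynamic length
feature_dim = Dim(80, name="feature")
classes_dim = Dim(10_025, name="classes")

data_templates = {
    "data": Tensor("data", dims=[batch_dim, time_dim, feature_dim], dtype="float32"),
    "classes": Tensor("classes", dims=[batch_dim, time_dim], dtype="int32",
                      sparse_dim=classes_dim),
}
```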
This becomes relevant for an efficient decoupled weight decay implementation. If it is not decoupled, it is inefficient anyway.
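For reference, a minimal NumPy sketch of decoupled (AdamW-style) weight decay, just to illustrate what "decoupled" means here; this is not the RETURNN implementation:

```python
import numpy as np

def adamw_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update, in place (sketch). Decoupled weight decay scales
    the parameter directly, instead of adding weight_decay * param to the
    gradient as in classic L2 regularization."""
    param *= 1.0 - lr * weight_decay  # decoupled weight decay step
    m[:] = beta1 * m + (1.0 - beta1) * grad         # first moment
    v[:] = beta2 * v + (1.0 - beta2) * grad * grad  # second moment
    m_hat = m / (1.0 - beta1 ** step)  # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    param -= lr * m_hat / (np.sqrt(v_hat) + eps)
```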
The `relative_positional_encoding` implementation in RC uses:

```python
with nn.control_flow_ctx(None):
    ...
```

This is relevant for graph-based backends, once we have control flow logic like `nn.Cond` and `nn.Loop`. I wonder how...
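To illustrate the pattern (a conceptual sketch only; the helper and surrounding setup are hypothetical, not the actual RC code):

```python
from returnn_common import nn

def add_rel_pos_enc(x: nn.Tensor) -> nn.Tensor:
    # Inside some control flow (e.g. an nn.Cond branch or nn.Loop body),
    # temporarily switch to the root control flow context (None), so that
    # this sub-graph is created once, outside the loop/cond.
    with nn.control_flow_ctx(None):
        pos_enc = make_rel_pos_enc(x)  # hypothetical helper
    return x + pos_enc
```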
Currently, when `learning_rate_control_error_measure` (the error key) does not match exactly, some heuristics are applied; see `LearningRateControl.get_error_key`. This is also because the error key can change depending on whether there...
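One way to sidestep the heuristics is to set the error key explicitly in the config (a hedged example; the actual key name depends on your network and dataset names):

```python
# RETURNN config fragment (example values; the key is setup-specific)
learning_rate_control = "newbob_multi_epoch"
learning_rate_control_error_measure = "dev_score"
```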
In the case of net dicts created by returnn_common, the construction heuristics (#1129) to resolve circular loops are never needed. For all recurrent layers (all layers accessed via `"prev:..."`), we...
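For illustration, a minimal hand-written net dict fragment with such a `"prev:..."` recurrence inside a rec-layer unit (hedged: an example in the same style, not actual RC output):

```python
network = {
    "loop": {
        "class": "rec", "from": "data",
        "unit": {
            # the recurrence is explicit via "prev:s"
            "s": {"class": "linear", "from": ["data:source", "prev:s"],
                  "activation": "tanh", "n_out": 128},
            "output": {"class": "copy", "from": "s"},
        },
    },
    "output": {"class": "copy", "from": "loop"},
}
```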
See our multi-GPU training documentation: https://returnn.readthedocs.io/en/latest/advanced/multi_gpu.html In case you do not have very fast direct connections between the GPUs (NVLink, available only on the big professional cards), we always recommend async...
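A hedged config sketch for such a setup (option names as in the linked doc; the values here are just examples):

```python
# RETURNN config fragment: Horovod multi-GPU training with parameter
# (not gradient) synchronization, done only every N steps.
use_horovod = True
horovod_reduce_type = "param"   # sync params instead of grads
horovod_param_sync_step = 100   # sync interval in train steps
horovod_dataset_distribution = "random_seed_offset"
```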
In principle, RETURNN supports arbitrary dtypes, as `Data` can just have any `dtype`. However, many layers do not really allow configuring that. Most layers would just take the same...
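Sketching the common pattern (hypothetical layer code, just to illustrate how the output dtype is usually inherited rather than configurable):

```python
def get_out_data_from_opts(name, sources, **kwargs):
    # Hypothetical sketch of a typical layer's output template: the output
    # just copies the input template, including its dtype; there is no
    # option to choose a different dtype.
    in_data = sources[0].output
    return in_data.copy_template(name="%s_output" % name)
```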
I'm not really sure whether this is a bug, or what we can really do about it. However, I'm opening this now because I noticed again a quite huge effect:

```
output/exp_fs_base/conformer_pre10_d384_h6_blstmf2_specaug_attdrop01_posdrop01_aux48_bhv14/recog_results_per_epoch/040...
```
> A bit more meta: with all our logic for dim tags, which should actually make it easier to avoid reshape problems or other shape-related problems, why do we...