Albert Zeyer

Results 300 issues of Albert Zeyer

And a test case which tests that together with masked-computation. See the test case for a demonstration of what should be possible with this. This is work-in-progress. I'm not sure...

Now that we have the generic `Tensor` and `TensorDict` to describe arbitrary data formats, we can remove the old ambiguous and limited `num_outputs` and `num_inputs` from the dataset and replace...

This becomes relevant for an efficient decoupled weight decay implementation. If it is not decoupled, it's inefficient anyway.

TensorFlow
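
For context on the weight decay snippet above: "decoupled" weight decay (in the AdamW sense) shrinks the weights directly instead of adding a decay term to the gradient, so the decay does not pass through the optimizer's adaptive scaling. The following is a minimal pure-Python sketch of that idea only. It is not RETURNN code, and the function name `adamw_step` and its parameters are hypothetical, chosen for illustration.

```python
import math


def adamw_step(w, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One Adam step with decoupled weight decay on plain lists of floats.

    Hypothetical helper for illustration. `step` is the 1-based update
    count used for bias correction. Returns the new (w, m, v).
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        # The moment estimates only see the loss gradient, never the decay term.
        mi = beta1 * mi + (1.0 - beta1) * gi
        vi = beta2 * vi + (1.0 - beta2) * gi * gi
        m_hat = mi / (1.0 - beta1 ** step)
        v_hat = vi / (1.0 - beta2 ** step)
        # Decoupled decay: shrink the weight directly, independent of the gradient.
        wi = wi * (1.0 - lr * weight_decay)
        # Adaptive gradient update.
        wi = wi - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_w.append(wi)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


# With a zero gradient, only the decay acts on the weights.
w, m, v = adamw_step([1.0, -2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
                     step=1, lr=0.1, weight_decay=0.5)
```

In this sketch the decay is a plain multiplicative shrink per parameter and needs no gradient information, which is what makes the decoupled form cheap to apply.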

The `relative_positional_encoding` implementation in RC uses:
```python
with nn.control_flow_ctx(None):
    ...
```
This is relevant for graph-based backends, once we have control flow logic like `nn.Cond` and `nn.Loop`. I wonder how...

returnn-frontend

Currently when `learning_rate_control_error_measure` (the error key) does not match exactly, it will use some heuristics. See `LearningRateControl.get_error_key`. This is also because the error key can change depending on whether there...

potential-new-behavior
good first issue
difficulty: medium

In the case of net dicts created by returnn_common, the construction heuristics (#1129) to resolve circular loops are never needed. For all recurrent layers (all layers accessed via `"prev:..."`), we...

good first issue
TensorFlow

See our multi-GPU training doc: https://returnn.readthedocs.io/en/latest/advanced/multi_gpu.html In case you do not have very fast direct connections between the GPUs (NVLink, only for the big professional cards), we always recommend async...

good first issue
difficulty: medium
TensorFlow

In principle, RETURNN supports arbitrary dtypes, as `Data` can just have any `dtype`. However, many layers do not really allow configuring that. Most layers would just take the same...

TensorFlow

I'm not really sure whether this is a bug, or what we can really do about it. However, I open this now because I again noticed quite a huge effect: ``` output/exp_fs_base/conformer_pre10_d384_h6_blstmf2_specaug_attdrop01_posdrop01_aux48_bhv14/recog_results_per_epoch/040...

TensorFlow

> A bit more meta: With all our logic for dim tags, which should actually make it easier to avoid any reshape problems or other shaping problems, why do we...