Albert Zeyer issues

Results: 300 issues of Albert Zeyer

RF scaled_dot_product_attention

Add `scaled_dot_product_attention` as a function to RF, and use it in our attention code. (Does this also work with `RelPosSelfAttention`?) In case of PyTorch, wrap `torch.nn.functional.scaled_dot_product_attention`. That should be much...
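As a reference for what such a function computes, here is a minimal pure-Python sketch of unmasked scaled dot-product attention, `softmax(Q K^T / sqrt(d)) V`. It is not the RF or PyTorch implementation; the nested-list layout is an assumption made for illustration, and masking, dropout and batching are left out.

```python
import math


def scaled_dot_product_attention(query, key, value):
    """Reference attention on nested lists (illustrative layout, not the RF API).

    query: [T_q][d], key: [T_k][d], value: [T_k][d_v]
    Returns a list of shape [T_q][d_v].
    """
    scale = 1.0 / math.sqrt(len(query[0]))  # 1/sqrt(d), the usual default scale
    out = []
    for q in query:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        # Numerically stable softmax over the key axis.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return out
```

A backend wrapper would be expected to match such a reference up to numerical tolerance, which makes it a convenient baseline for tests.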

SlowMo (BMUF) support for PyTorch distributed training

This is for the parameter averaging method in distributed training. The SlowMo method adds an additional momentum which is used for the outer loop updates (i.e. after param averaging). *...

PyTorch

MultiGPU
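The outer-loop update described in the SlowMo issue above can be sketched as follows. This is a minimal pure-Python illustration based on the commonly cited SlowMo formulation, not the RETURNN implementation; all function and parameter names (`slowmo_outer_step`, `slowmo_lr`, `beta`) are hypothetical, and parameters are flat lists of floats.

```python
def average_params(worker_params):
    """Plain parameter averaging across workers (each worker: flat list of floats)."""
    n = len(worker_params)
    return [sum(ps) / n for ps in zip(*worker_params)]


def slowmo_outer_step(start_params, worker_params, slow_momentum, lr,
                      slowmo_lr=1.0, beta=0.5):
    """One outer-loop update after the workers finished their local steps.

    start_params:  parameters at the start of the outer iteration
    worker_params: per-worker parameters after the local steps
    slow_momentum: slow momentum buffer from the previous outer iteration
    lr:            inner (base optimizer) learning rate
    """
    avg = average_params(worker_params)
    # Slow momentum accumulates the averaged displacement, rescaled by 1/lr.
    new_momentum = [
        beta * u + (x0 - xa) / lr
        for u, x0, xa in zip(slow_momentum, start_params, avg)
    ]
    # The outer update moves from the start point along the slow momentum.
    new_params = [
        x0 - slowmo_lr * lr * u for x0, u in zip(start_params, new_momentum)
    ]
    return new_params, new_momentum
```

With `beta=0.0` and `slowmo_lr=1.0` the step reduces to plain parameter averaging, which is a useful sanity check for any implementation.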

RF weight dropout and variational noise

Currently we don't have weight dropout in the RF. We should add it. (I thought there was already an issue about it, but I can't find it.) Related: * Weight...

returnn-frontend
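To make the two techniques named in the weight dropout issue above concrete, here is a minimal sketch on a flat list of weights. It assumes the usual definitions (inverted dropout applied to the weights, and additive Gaussian noise on the weights for variational noise); the function names and signatures are hypothetical and say nothing about how RF would expose this.

```python
import random


def weight_dropout(weights, p, rng, training=True):
    """Drop each weight with probability p and rescale the rest by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(weights)  # no-op at evaluation time
    keep = 1.0 - p
    return [w / keep if rng.random() < keep else 0.0 for w in weights]


def variational_noise(weights, std, rng, training=True):
    """Add zero-mean Gaussian noise with standard deviation std to each weight."""
    if not training or std == 0.0:
        return list(weights)  # no-op at evaluation time
    return [w + rng.gauss(0.0, std) for w in weights]
```

Both functions leave the stored weights untouched and return a perturbed copy, so the perturbation is resampled on every training step.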

Tensor deepcopy does not copy raw_tensor

`deepcopy` on `Tensor` will not copy the `raw_tensor`:

```python
def __getstate__(self):
    d = {k: getattr(self, k) for k in self.__slots__}
    d["_raw_tensor"] = None  # do not store the TF tensors...
```
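The effect can be reproduced with a small stand-in class. `MiniTensor` below is hypothetical and only mimics the `__getstate__` pattern from the snippet; its `__setstate__` is an assumption added so that a `__slots__`-only class can be restored. Because `copy.deepcopy` goes through `__reduce_ex__`, it picks up the state from `__getstate__` and the copy ends up without the raw tensor.

```python
import copy


class MiniTensor:
    """Hypothetical stand-in for a tensor wrapper that uses __slots__."""

    __slots__ = ("name", "_raw_tensor")

    def __init__(self, name, raw_tensor=None):
        self.name = name
        self._raw_tensor = raw_tensor

    def __getstate__(self):
        d = {k: getattr(self, k) for k in self.__slots__}
        d["_raw_tensor"] = None  # raw tensor is dropped from the state
        return d

    def __setstate__(self, state):
        # Assumed counterpart: restore every slot from the state dict.
        for k, v in state.items():
            setattr(self, k, v)


x = MiniTensor("x", raw_tensor=[1.0, 2.0])
y = copy.deepcopy(x)  # uses __getstate__ / __setstate__ under the hood
```

After this, `y.name` is still `"x"` but `y._raw_tensor` is `None`, while `x` is unchanged. Defining an explicit `__deepcopy__` would be one way to decouple copying from pickling, if that is the desired behavior.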

RF BatchNorm running var small diff between TF-layers, pure RF and direct PyTorch, biased vs unbiased

There are multiple implementations of batch norm, but here, three different cases are relevant:

1. The pure RF implementation (which is used e.g. when `use_mask=True`)
2. RF TF-layers backend, via...
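A small numeric sketch shows why biased and unbiased variance estimates lead to slightly different running variances. PyTorch's batch norm, for example, normalizes with the biased batch variance but updates `running_var` with the unbiased estimate. The helper names and the momentum value below are illustrative assumptions and not taken from any of the implementations discussed.

```python
def biased_var(xs):
    """Population variance: divide by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def unbiased_var(xs):
    """Sample variance with Bessel's correction: divide by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def update_running(running, new, momentum=0.1):
    """Exponential moving average as commonly used for running statistics."""
    return (1.0 - momentum) * running + momentum * new


batch = [1.0, 2.0, 3.0, 4.0]
running_biased = update_running(1.0, biased_var(batch))
running_unbiased = update_running(1.0, unbiased_var(batch))
```

The two estimates differ by the factor `n / (n - 1)`, so the gap shrinks as the number of elements per feature grows but never vanishes exactly.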

PyTorch debug_add_check_numerics_ops

We could use such code:

```python
import torch
from torch.utils._pytree import tree_all
from torch.utils._python_dispatch import TorchDispatchMode


class NaNInfMode(TorchDispatchMode):
    enabled: bool

    def __init__(self, enabled=True):
        super().__init__()
        self.enabled = enabled

    @staticmethod
    def check_finite(pytree):
        ...
```

PyTorch

RuntimeError: CUDA error: unknown error

It's likely a hardware issue. Similarly, there is also #1465. I just want to report this for future reference. Multi GPU training (but that's likely not relevant), log (`/work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.Hh9Pv7JpsMlW/engine/i6_core.returnn.training.ReturnnTrainingJob.Hh9Pv7JpsMlW.run.7238888.1`): ```...
