Albert Zeyer
Add `scaled_dot_product_attention` as a function to RF, and use it in our attention code. (Does this also work with `RelPosSelfAttention`?) In case of PyTorch, wrap `torch.nn.functional.scaled_dot_product_attention`. That should be much...
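For reference, the computation behind such a function can be sketched without any framework. This is only an illustration of the standard softmax(QKᵀ/√d)·V formula on nested lists, not the RF API. The name and signature are assumptions chosen to mirror the PyTorch function, and the optional arguments of `torch.nn.functional.scaled_dot_product_attention` (`attn_mask`, `dropout_p`, `is_causal`) are left out.

```python
import math


def scaled_dot_product_attention(query, key, value):
    """Reference sketch (hypothetical helper, not the RF function).

    query: [T_q][D], key: [T_k][D], value: [T_k][D_v], all as nested lists.
    Returns a [T_q][D_v] nested list.
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    out = []
    for q in query:
        # Scaled similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        # Numerically stable softmax over the key axis.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return out
```

A fused backend kernel computes the same result without materializing the full attention weight matrix, which is where the expected speed and memory benefit would come from.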
This is for the parameter averaging method in distributed training. The SlowMo method adds an additional momentum which is used for the outer loop updates (i.e. after param averaging). *...
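A minimal sketch of what such an outer update could look like, assuming the formulation from the SlowMo paper as I understand it: the slow momentum buffer accumulates the scaled difference between the parameters before the inner loop and the averaged parameters after it. The function name, argument names, and default values are all hypothetical and only serve the illustration. Parameters are plain lists of floats.

```python
def slowmo_outer_step(prev_params, averaged_params, slow_buf, *, base_lr, slow_lr=1.0, slow_momentum=0.5):
    """One outer-loop update after parameter averaging (hypothetical helper).

    prev_params:     parameters at the start of the inner loop
    averaged_params: parameters after the inner steps and the averaging
    slow_buf:        slow momentum buffer, same shape as the parameters
    """
    new_params, new_buf = [], []
    for x_prev, x_avg, u in zip(prev_params, averaged_params, slow_buf):
        # Momentum over the "pseudo gradient" (x_prev - x_avg) / base_lr.
        u_new = slow_momentum * u + (x_prev - x_avg) / base_lr
        new_buf.append(u_new)
        # Step from the old outer-loop parameters along the slow momentum.
        new_params.append(x_prev - slow_lr * base_lr * u_new)
    return new_params, new_buf
```

With a zero buffer and `slow_lr=1.0`, the first outer step reduces to plain parameter averaging. The momentum only has an effect from the second outer step on.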
Currently we don't have weight dropout in the RF. We should add it. (I thought there was already an issue about it, but I can't find it.) Related: * Weight...
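To make the idea concrete: weight dropout applies the dropout mask to the parameter matrix instead of the activations. The following is a framework-free sketch under that assumption. `weight_dropout` and `linear` are hypothetical helpers on nested lists, not RF functions, and the inverted-dropout scaling by `1 / (1 - drop_prob)` is one common choice, not necessarily what RF would use.

```python
import random


def weight_dropout(weight, drop_prob, *, train=True, rng=random):
    """Randomly zero entries of a weight matrix (hypothetical helper).

    weight: [D_out][D_in] nested list; requires 0 <= drop_prob < 1.
    Kept entries are scaled by 1 / (1 - drop_prob) so the expectation is unchanged.
    """
    if not train or drop_prob == 0.0:
        # No dropout at evaluation time.
        return weight
    keep = 1.0 - drop_prob
    return [[w / keep if rng.random() < keep else 0.0 for w in row] for row in weight]


def linear(x, weight):
    """Plain matrix-vector product; x: [D_in], weight: [D_out][D_in]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weight]


# Usage: a new mask is drawn on every training forward pass.
w = [[1.0, 2.0], [3.0, 4.0]]
y = linear([1.0, 1.0], weight_dropout(w, 0.5, rng=random.Random(0)))
```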
`deepcopy` on `Tensor` will not copy the `raw_tensor`:

```python
def __getstate__(self):
    d = {k: getattr(self, k) for k in self.__slots__}
    d["_raw_tensor"] = None  # do not store the TF tensors...
```
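The effect can be reproduced in isolation, because `copy.deepcopy` goes through `__getstate__` as well. A minimal sketch, assuming a hypothetical stand-in class `TensorLike` with the same `__getstate__` pattern. The `__setstate__` method and the attribute names are assumptions added so that the example runs on its own.

```python
import copy


class TensorLike:
    """Hypothetical stand-in for `Tensor`, only to demonstrate the behavior."""

    __slots__ = ("name", "_raw_tensor")

    def __init__(self, name, raw_tensor=None):
        self.name = name
        self._raw_tensor = raw_tensor

    def __getstate__(self):
        d = {k: getattr(self, k) for k in self.__slots__}
        d["_raw_tensor"] = None  # the raw tensor is dropped from the state
        return d

    def __setstate__(self, state):
        # Assumed counterpart; needed because a class with __slots__ has no __dict__.
        for k, v in state.items():
            setattr(self, k, v)


x = TensorLike("x", raw_tensor=[1.0, 2.0])
y = copy.deepcopy(x)
print(y.name, y._raw_tensor)  # the copy keeps the name but has lost the raw tensor
```

Because pickling and copying share `__getstate__`, keeping the raw tensor on copy would need a separate `__deepcopy__` or `__copy__` implementation.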
There are multiple implementations of batch norm, but here, three different cases are relevant:

1. The pure RF implementation (which is used e.g. when `use_mask=True`)
2. RF TF-layers backend, via...
We could use such code:

```python
import torch
from torch.utils._pytree import tree_all
from torch.utils._python_dispatch import TorchDispatchMode


class NaNInfMode(TorchDispatchMode):
    enabled: bool

    def __init__(self, enabled=True):
        super().__init__()
        self.enabled = enabled

    @staticmethod
    def check_finite(pytree):...
```
It's likely a hardware issue. Similarly, there is also #1465. I just want to report this for future reference. Multi GPU training (but that's likely not relevant), log (`/work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.Hh9Pv7JpsMlW/engine/i6_core.returnn.training.ReturnnTrainingJob.Hh9Pv7JpsMlW.run.7238888.1`): ```...
I think some tests fail (at least locally for me). Current version is TF 2.16, but whenever someone gets to the issue here, maybe check for the latest stable TF...
I just wanted to track this here: There seems to be some WER degradation in some setup by @Marvin84 occurring in TensorFlow 2.14 and not in earlier versions (although this...
### Describe the bug

I have problems using my ASM1166-based SATA controller. It is connected via M.2 to RP PCI (but I have tried other variations with ASM1166-based SATA...