returnn

The RWTH extensible training framework for universal recurrent neural networks

Results 254 returnn issues

``` ... ep 42 train, step 2112, ctc_4 1.526, ctc_8 1.055, ctc 0.874, consistency 0.461, aed_ce 0.307, aed_fer 0.050, grad_norm:p2 2.938, num_seqs 45, max_size:time 263912, max_size:out-spatial 105, mem_usage:cuda 64.7GB, 0.676...

I have seen this twice now, at the end of a successful training run: ``` ... Uname: uname_result(system='Linux', node='w23g0002.hpc.itc.rwth-aachen.de', release='4.18.0-553.22.1.el8_10.x86_64', version='#1 SMP Wed Sep 25 09:20:43 UTC 2024', machine='x86_64') Load: (0.17, 0.24,...

Currently, in multiple places (where exactly?) we have the assumption, for some tensor `x: Tensor` that `max(x.dims[i].dyn_size_ext.raw_tensor) == x.raw_tensor.shape[i]`. We want to support the case where `max(x.dims[i].dyn_size_ext.raw_tensor) < x.raw_tensor.shape[i]`. Specifically,...
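To make the relaxed condition concrete, here is a minimal NumPy sketch (not RETURNN code) of a padded batch whose longest sequence is shorter than the padded axis. The helper names `sequence_mask` and `masked_mean` are hypothetical; `raw` stands in for `x.raw_tensor` and `seq_lens` for `x.dims[i].dyn_size_ext.raw_tensor`. The point is that any reduction should build its mask from the sequence lengths rather than assume the axis size equals the maximum length.

```python
import numpy as np


def sequence_mask(seq_lens, padded_len):
    """Boolean mask of shape [batch, padded_len]; True marks valid frames."""
    seq_lens = np.asarray(seq_lens)
    if seq_lens.max() > padded_len:
        raise ValueError("a sequence length exceeds the padded axis size")
    # Compare every position index against each sequence length.
    return np.arange(padded_len)[None, :] < seq_lens[:, None]


def masked_mean(raw, seq_lens):
    """Mean over valid frames only, ignoring any padding on axis 1."""
    mask = sequence_mask(seq_lens, raw.shape[1])
    return (raw * mask).sum() / mask.sum()


# Two sequences; axis 1 is padded to 5 although the longest sequence has 3 frames.
raw = np.array([[1.0, 2.0, 3.0, 9.0, 9.0],
                [4.0, 5.0, 9.0, 9.0, 9.0]])
seq_lens = np.array([3, 2])

assert seq_lens.max() < raw.shape[1]  # the case the issue wants to support
print(masked_mean(raw, seq_lens))  # 3.0
```

Code that instead slices or reshapes using `raw.shape[1]` as if it were the maximum length would silently include the padding values (the `9.0` entries here).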

TPU
JAX

When you enable `calculate_exp_loss`, it does not calculate the exp loss for all losses, but currently only for those where `as_error=False`. This decision was somewhat arbitrary. The exp loss often...
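The behaviour described can be sketched as follows. This is an illustration only, assuming that "exp loss" means the exponential of the frame-averaged loss (which would be the perplexity for a cross-entropy loss). The `LossEntry` class, the `report` function, and the `:exp` key suffix are hypothetical and are not RETURNN's API.

```python
import math
from dataclasses import dataclass


@dataclass
class LossEntry:
    """Hypothetical accumulator for one named loss."""
    name: str
    summed_value: float  # loss summed over all frames
    num_frames: int
    as_error: bool = False  # True for error-rate style metrics


def report(entries, calculate_exp_loss=True):
    """Return averaged losses, adding exp(avg) only where as_error is False."""
    out = {}
    for entry in entries:
        avg = entry.summed_value / entry.num_frames
        out[entry.name] = avg
        if calculate_exp_loss and not entry.as_error:
            out[entry.name + ":exp"] = math.exp(avg)
    return out


entries = [
    LossEntry("aed_ce", summed_value=math.log(2.0) * 10, num_frames=10),
    LossEntry("aed_fer", summed_value=0.5, num_frames=10, as_error=True),
]
result = report(entries)
print(sorted(result))  # ['aed_ce', 'aed_ce:exp', 'aed_fer']
```

In this sketch the frame error rate gets no exponentiated value, which mirrors the `as_error=False` filter the issue calls somewhat arbitrary.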

#1621 describes an issue where the worker processes hit a Gloo timeout when the master process takes longer than 30 minutes for the eval step. This was fixed...

bug

I just want to raise this here. This also includes the README. Many parts are still TF-specific, and many of those do not even mention it. PyTorch-relevant documentation...

Again a crash. It ultimately failed with this (after retrying a few times): ``` OSError: [Errno 28] No space left on device ``` Log: ``` FileCache: Copy file /rwthfs/rz/cluster/home/az668407/setups/2025-08-aed-large/work/i6_core/datasets/huggingface/TransformAndMapHuggingFaceDatasetJob.FxPUVJtw1EeN/output/dataset/train/data-00405-of-00848.arrow...
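One generic way to fail early in this situation is to compare the free space on the destination with the file size before copying. The sketch below uses only the Python standard library. The helpers `has_room_for` and `copy_if_room` are hypothetical and do not reflect how RETURNN's `FileCache` actually works.

```python
import errno
import os
import shutil


def has_room_for(src_path, dst_dir, reserve_bytes=0):
    """True if dst_dir has enough free space for src_path plus a reserve."""
    needed = os.path.getsize(src_path) + reserve_bytes
    return shutil.disk_usage(dst_dir).free >= needed


def copy_if_room(src_path, dst_dir, reserve_bytes=0):
    """Copy src_path into dst_dir, raising ENOSPC up front if space is short."""
    if not has_room_for(src_path, dst_dir, reserve_bytes):
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC), dst_dir)
    return shutil.copy2(src_path, dst_dir)
```

The check is inherently racy: another process can fill the disk between the check and the copy, so the copy itself still needs to handle `ENOSPC`.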

CI run log [tf-tests (3.8, 2.10.0, TEST=TFNetworkLayer)](https://github.com/rwth-i6/returnn/actions/runs/18276719792/job/52030410528?pr=1774#logs). ``` Python env: python is /opt/hostedtoolcache/Python/3.8.18/x64/bin/python Python 3.8.18 NumPy: 1.24.4 TensorFlow: v2.10.0-rc3-6-g359c3cdfc5f 2.10.0 /home/runner/.local/lib/python3.8/site-packages/tensorflow/__init__.py ``` Relevant log: ``` ___________________________ test_ConvLayer_empty_out ___________________________ Traceback (most...

@Icemole reports a case where he uses a PostprocessingDataset inside a MultiProcDataset. He finds that each MultiProcWorker uses more than one thread for its computation, resulting in a CPU overcommit...
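A common generic mitigation for this kind of overcommit is to cap the thread pools of the numerical libraries in each worker process. The sketch below is not taken from the issue (whose resolution is cut off above); the helpers `threads_per_worker` and `limited_thread_env` are hypothetical. The environment variables themselves are the standard ones read by OpenMP, OpenBLAS, and MKL.

```python
import os

# Environment variables commonly honoured by OpenMP, OpenBLAS, and MKL.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def threads_per_worker(num_workers, total_cpus=None):
    """Split the available CPUs evenly across workers, at least 1 thread each."""
    if total_cpus is None:
        total_cpus = os.cpu_count() or 1
    return max(1, total_cpus // num_workers)


def limited_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with the thread-pool sizes capped."""
    env = dict(os.environ if base_env is None else base_env)
    for var in THREAD_ENV_VARS:
        env[var] = str(num_threads)
    return env


# Example: 4 workers on an 8-core machine get 2 threads each.
env = limited_thread_env(threads_per_worker(4, total_cpus=8), base_env={})
print(env["OMP_NUM_THREADS"])  # 2
```

These variables only take effect if they are set before the worker process loads the numerical library, so the environment would have to be passed when the worker is spawned rather than changed afterwards.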