
The RWTH extensible training framework for universal recurrent neural networks

Results: 204 returnn issues, sorted by recently updated

Training was running fine for 29 subepochs but then crashed with a CPU OOM. While I sometimes see CPU OOMs in my setup, that is usually only after longer trainings. So it...

```
...
ep 1 train, step 294, ctc_4 4.553, ctc_8 4.531, ctc 4.510, num_seqs 11, max_size:time 201384, max_size:out-spatial 149, mem_usage:cuda:0 5.9GB, 0.411 sec/step
ep 1 train, step 294, ctc_4 4.516,...
```

Closes #1575. This probably didn't need a PR, but I'm unsure whether there was a good reason for the v4 verbosity or not. Feel free to merge immediately if there wasn't.

I find log verbosity 3 a reasonable level for "daily" work/trainings, but I'm missing the model structure in the log, because it is only printed at the v4 level. Is...
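
For reference, a minimal config sketch, assuming the standard `log_verbosity` option in a RETURNN config:

```python
# RETURNN config snippet: log_verbosity controls how much is printed.
# 3 is a common "daily" level; the model structure currently only
# shows up at verbosity 4 and above.
log_verbosity = 4
```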

Yesterday I started a training with DistributeFilesDataset and file caching, which crashed today and consistently crashes after restarting, with what I think is an `OSError: AF_UNIX path too long` in the...
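
For context, a minimal standalone demonstration of the underlying limit (plain Python, not RETURNN code): AF_UNIX socket paths are capped at roughly 107 bytes on Linux, and binding a longer path raises exactly this error:

```python
import socket

# Unix domain socket paths are limited (~107 bytes on Linux);
# binding a longer path raises "OSError: AF_UNIX path too long".
path = "/tmp/" + "x" * 200
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    sock.bind(path)
except OSError as exc:
    print(exc)
finally:
    sock.close()
```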

In my current language model training I sometimes get "nan" gradients, which break the training. Surprisingly, just restarting the training from the last checkpoint is often enough to resume...
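
A minimal mitigation sketch in plain PyTorch (not RETURNN's own logic): skip the optimizer step whenever a gradient is non-finite, which is roughly what a restart from the last checkpoint achieves for a single bad step:

```python
import torch

def step_if_finite(model: torch.nn.Module, optimizer: torch.optim.Optimizer) -> bool:
    """Apply the optimizer step only if all gradients are finite."""
    finite = all(
        p.grad is None or torch.isfinite(p.grad).all()
        for p in model.parameters()
    )
    if finite:
        optimizer.step()
    optimizer.zero_grad()
    return finite
```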

For large datasets, a blocklist can be more compact than an allowlist in some cases, e.g. if you want to exclude 1k segments out of 1M.
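
A hypothetical sketch of what this could look like (the file names and helper are assumptions, not an existing RETURNN option):

```python
def load_segments(segments_file: str, blocklist_file: str) -> list[str]:
    """Keep all segments except those listed in the blocklist."""
    with open(blocklist_file) as f:
        blocked = {line.strip() for line in f if line.strip()}
    with open(segments_file) as f:
        return [seg for seg in (line.strip() for line in f)
                if seg and seg not in blocked]
```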

I had this bug:

```python
log_prob = ...  # [B,T+1,D]
targets = ...   # [B,T] -> D
loss = rf.cross_entropy(target=targets, estimated=log_prob, ...)
loss.mark_as_loss(...)
```

What you get here is *no*...
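
One way to catch this earlier, as a hedged sketch (it assumes RETURNN Tensor objects exposing a `.dims` tuple; the helper itself is hypothetical):

```python
def check_dims_subset(targets, log_prob):
    """Assert every dim of `targets` also occurs in `log_prob`.

    A [B,T+1,D] log_prob vs [B,T] targets mismatch then fails
    loudly instead of broadcasting silently in the loss.
    """
    missing = [d for d in targets.dims if d not in log_prob.dims]
    assert not missing, f"target dims {missing} not in log_prob dims {log_prob.dims}"
```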

In some Python file, we still have this, which basically survived for many years:

```
__author__ = "Patrick Doetsch"
__copyright__ = "Copyright 2015"
__credits__ = ["Patrick Doetsch", "Paul Voigtlaender"]
__license__...
```