returnn
The RWTH extensible training framework for universal recurrent neural networks
Training was running fine for 29 subepochs but then crashed with a CPU OOM. While I sometimes see CPU OOMs in my setup, that is usually only after longer trainings. So it...
```
...
ep 1 train, step 294, ctc_4 4.553, ctc_8 4.531, ctc 4.510, num_seqs 11, max_size:time 201384, max_size:out-spatial 149, mem_usage:cuda:0 5.9GB, 0.411 sec/step
ep 1 train, step 294, ctc_4 4.516,...
```
Closes #1575. This probably didn't need a PR, but I'm unsure whether there was a good reason for the v4 verbosity or not. Feel free to merge immediately if there wasn't.
I find log verbosity 3 a reasonable level for "daily" work/trainings, but I'm missing the model structure in the log, because it is only printed at the v4 level. Is...
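For context, a minimal sketch of how verbosity is usually set in a RETURNN config (a Python file); the exact behavior per level is as described in the snippet above, and the comments here are illustrative:

```python
# RETURNN config fragment (configs are plain Python).
# log_verbosity controls how much is printed during training:
#   3: reasonable for daily trainings
#   4: additionally prints the model structure, among other things
log_verbosity = 4
```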
Yesterday I started a training with DistributeFilesDataset and file caching. It crashed today, and it consistently crashes after restarting, with what I think is `OSError: AF_UNIX path too long` in the...
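For background, Linux limits AF_UNIX socket paths (`sun_path`) to about 107 bytes, so deeply nested cache or temp directories can push a socket path over the limit. A standalone sketch (unrelated to the dataset code itself) that reproduces the same error:

```python
import os
import socket
import tempfile

# Build a path well over the ~107-byte sun_path limit.
long_dir = tempfile.mkdtemp(prefix="x" * 50)
sock_path = os.path.join(long_dir, "y" * 100 + ".sock")

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    sock.bind(sock_path)  # fails: path exceeds the sun_path limit
except OSError as exc:
    print(exc)  # "AF_UNIX path too long"
finally:
    sock.close()
    os.rmdir(long_dir)
```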
In my current language model training I sometimes get "nan" gradients, which break the training. Surprisingly, just restarting the training from the last checkpoint is often enough to resume...
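One common mitigation for this class of problem (not necessarily what RETURNN does internally) is to check the gradients for finiteness and skip the update step for that batch, which has a similar effect to restarting from a checkpoint. A minimal sketch, assuming a plain PyTorch training loop:

```python
import torch

def step_with_nan_guard(model: torch.nn.Module,
                        optimizer: torch.optim.Optimizer,
                        loss: torch.Tensor):
    """Backprop, but skip the optimizer update if any gradient is non-finite."""
    optimizer.zero_grad()
    loss.backward()
    grads_finite = all(
        torch.isfinite(p.grad).all()
        for p in model.parameters()
        if p.grad is not None
    )
    if grads_finite:
        optimizer.step()
    else:
        # Drop this batch's update; the nan often does not reappear
        # on the next batch.
        print("non-finite gradients, skipping optimizer step")
```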
For large datasets, a blocklist can be much more compact than an allowlist in some cases, e.g. if you want to exclude 1k segments out of 1M.
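A minimal sketch of the idea (the file name and helper are illustrative, not the actual RETURNN API):

```python
# With 1k exclusions out of 1M segments, listing the excluded segments
# is far more compact than enumerating the ~999k allowed ones.
with open("excluded_segments.txt") as f:
    blocklist = {line.strip() for line in f}

def keep_segment(seg_name: str) -> bool:
    return seg_name not in blocklist
```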
I had this bug:
```python
log_prob = ...  # [B,T+1,D]
targets = ...   # [B,T] -> D
loss = rf.cross_entropy(target=targets, estimated=log_prob, ...)
loss.mark_as_loss(...)
```
What you get here is *no*...
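A cheap guard against this class of bug, sketched in plain PyTorch rather than the dim-tagged RETURNN-frontend tensors above (those would need the analogous check on their time dims): assert the non-class dims of the log-probabilities match the target shape before computing the loss, so a [B,T+1,D] vs. [B,T] mismatch fails loudly instead of slipping through.

```python
import torch
import torch.nn.functional as F

def checked_cross_entropy(log_prob: torch.Tensor,
                          targets: torch.Tensor) -> torch.Tensor:
    """Cross entropy over the last dim of log_prob, with an explicit
    shape check on the remaining (batch/time) dims."""
    assert log_prob.shape[:-1] == targets.shape, (
        f"batch/time dims mismatch: "
        f"{tuple(log_prob.shape[:-1])} vs {tuple(targets.shape)}"
    )
    # nll_loss expects log-probabilities; flatten batch/time for the call.
    return F.nll_loss(
        log_prob.flatten(0, -2), targets.flatten(), reduction="none"
    ).reshape(targets.shape)
```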
In some Python file, we still have this, which basically survived for many years:
```
__author__ = "Patrick Doetsch"
__copyright__ = "Copyright 2015"
__credits__ = ["Patrick Doetsch", "Paul Voigtlaender"]
__license__...
```