Moritz Gunz
What is the syntax for the comments that turn off the linter for a specific line?
@albertz Do you think this needs a test around the config processing?
@albertz Can you give another review for this PR?
We need all of them. That's why I feel bad cloning everything.
I am encountering these issues a lot, and currently deal with them by automatically restarting the training job (via sisyphus, by deleting the error condition). This is suboptimal because you lose...
I do use `use_train_proc_manager`, but if it only works for single-GPU training, that is probably why it has no effect for me. I am observing this mainly on A5000 and...
It is a function that sits in the RETURNN config and is assigned in the optimizer as `"weight_decay_custom_include_check": _weight_decay_blacklist,`. So it probably becomes part of the optimizer state?
The diff is still huge and contains changes that are already in master. Maybe merge master again instead of rebasing.
The global option `dataset_distribution=shard` would supersede the `distrib_shard_files` option from `DistributeFilesDataset` (which does exactly the same), right?
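For context, file-level sharding across workers usually means that each worker reads a disjoint subset of the file list. The following is only a minimal sketch of that idea, not RETURNN's actual implementation: the helper `shard_files` and the round-robin assignment are assumptions made for illustration, and the file names are placeholders.

```python
from typing import List, Sequence


def shard_files(files: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the files assigned to worker `rank` out of `world_size` workers.

    Hypothetical helper: round-robin assignment, so every file goes to
    exactly one worker and shard sizes differ by at most one.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Take every world_size-th file, starting at this worker's rank.
    return list(files[rank::world_size])


if __name__ == "__main__":
    all_files = ["a.hdf", "b.hdf", "c.hdf", "d.hdf", "e.hdf"]  # placeholder names
    for r in range(2):
        print(r, shard_files(all_files, rank=r, world_size=2))
```

If both options end up doing this, the per-dataset one would indeed be redundant once the global one is set.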