
The RWTH extensible training framework for universal recurrent neural networks

Results 204 returnn issues

(As initially discussed in #1120.) How to handle pretraining? The currently suggested APIs (`get_model` and co) might need to be changed, because we do not want to call `get_model` every...

This issue is to track any aspects and issues on PyTorch (#1120) ONNX export.

* [x] Working script for conversion (`export_to_onnx.py`)
* [x] Fix issue with convolution
* [x] Rename...

PyTorch

Hi, is there a way to convert pretrained `returnn` networks to `ONNX`, or at least to save the network in `tensorflow's saved model` format? Best, Musharraf

TensorFlow

`autograd.detect_anomaly` detects inf/nan in the backward pass. I want to have the same in the forward pass. With the possibility to whitelist a few special operations, modules or code blocks,...
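One way such a forward-pass check could be approximated is with PyTorch forward hooks. The following is only a minimal sketch, not the approach chosen in the issue: the names `install_forward_anomaly_hooks`, `NonFiniteForwardError` and the `whitelist` parameter are hypothetical, and the whitelist here works on module types only (it does not cover individual operations or code blocks).

```python
import torch
from torch import nn


class NonFiniteForwardError(RuntimeError):
    """Raised when a module output contains inf or nan (hypothetical name)."""


def install_forward_anomaly_hooks(model: nn.Module, whitelist=()):
    """Register forward hooks that raise on non-finite module outputs.

    whitelist: module types to skip (assumption for illustration; skipping a
        container type does not skip its children).
    Returns the hook handles; call .remove() on each to uninstall.
    """
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, tuple(whitelist)):
            continue

        def hook(mod, inputs, output, name=name):
            # Modules may return a single tensor or a tuple/list of values.
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for out in outputs:
                if (
                    torch.is_tensor(out)
                    and out.is_floating_point()
                    and not torch.isfinite(out).all()
                ):
                    raise NonFiniteForwardError(
                        f"non-finite output in module {name!r} ({type(mod).__name__})"
                    )

        handles.append(module.register_forward_hook(hook))
    return handles
```

Child hooks fire before their parents' hooks, so the error names the innermost hooked module that produced the bad value. Each check forces a device synchronization on GPU, so a check like this is better suited to debugging runs than to regular training.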

We should collect some statistics (maybe optional and configurable; maybe only every N steps if it is too costly otherwise) of:

* weights
* activations
* gradients of weights
* gradients of activations...

```
RETURNN starting up, version 1.20231230.164342+git.f353135e, date/time 2023-12-31-13-21-05 (UTC+0000), pid 2003528, cwd /work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.lmbYlKeoU6kT/work, Python /work/tools/users/zeyer/py-envs/py3.11-torch2.1/bin/python3.11
RETURNN command line options: ['/u/zeyer/setups/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.lmbYlKeoU6kT/output/returnn.config']
...
Torch: Hostname cn-236, pid 2003531, using GPU 3....
```

```
[2023-12-31 11:33:54,580] INFO: Start Job: Job Task: run
...
RETURNN starting up, version 1.20231230.164342+git.f353135e, date/time 2023-12-31-11-34-07 (UTC+0000), pid 1868636, cwd /work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.EImqFihsdh2B/work, Python /work/tools/users/zeyer/py-envs/py3.11-torch2.1/bin/python3.11
RETURNN command line options: ['/u/zeyer/setups/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.EImqFihsdh2B/output/returnn.config']
Hostname: cn-237...
```

From the log (`/work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.XPpeLPG9camH/log.run.1`), filtered for the CUDA memory usage reports:

```
Memory usage (cuda): alloc cur 427.8MB alloc peak 427.8MB reserved cur 446.0MB reserved peak 446.0MB
Memory usage (cuda): alloc cur...
```

We had one particular problem when converting a Conformer acoustic model from TensorFlow to ONNX: the sequence lengths calculated after convolution came out wrong on the ONNX side. The issue...
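For orientation, the sequence length after a 1-D convolution is commonly computed with the formula below, which follows the convention documented for `torch.nn.Conv1d`. This is a sketch of the standard arithmetic only, not the code from the issue, and the helper name `conv_out_len` is hypothetical. The truncated text above does not say what went wrong in the export, so nothing here should be read as the diagnosis.

```python
def conv_out_len(
    in_len: int,
    kernel_size: int,
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
) -> int:
    """Output length of a 1-D convolution for one input sequence length.

    padding is the number of frames added on each side.
    """
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input frames.
    effective_kernel = dilation * (kernel_size - 1) + 1
    # Floor division: partial windows at the end are dropped.
    out = (in_len + 2 * padding - effective_kernel) // stride + 1
    # Sequences shorter than the kernel yield no output frames.
    return max(out, 0)


def conv_out_lens(seq_lens, **conv_kwargs):
    """Apply conv_out_len to every sequence length in a batch."""
    return [conv_out_len(n, **conv_kwargs) for n in seq_lens]
```

For example, `conv_out_lens([100, 57, 2], kernel_size=3, stride=2)` gives one output length per sequence. Note that the rounding is integer floor division; a length computation that goes through floating-point values, or that rounds up instead of down, can differ by one frame.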

I noticed that the `DistributedDataParallel` module has the option `mixed_precision` which is for mixed precision training. We don't use that, even if the user specifies `torch_amp` to use mixed precision....

MultiGPU