returnn
The RWTH extensible training framework for universal recurrent neural networks
(As initially discussed in #1120.) How to handle pretraining? The current suggested APIs (`get_model` and co) might need to be changed, because we do not want to call `get_model` every...
This issue is to track any aspects and issues on PyTorch (#1120) ONNX export. * [x] Working script for conversion (`export_to_onnx.py`) * [x] Fix issue with convolution * [x] Rename...
Hi, is there a way to convert pretrained `returnn` networks to `ONNX`, or at least save the network to `tensorflow's saved model` format? Best, Musharraf
`autograd.detect_anomaly` detects inf/nan in the backward pass. I want to have the same in the forward pass. With the possibility to whitelist a few special operations, modules or code blocks,...
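One way such a forward-pass check could look is sketched below, using PyTorch forward hooks. This is only an illustration of the idea in the issue, not RETURNN's implementation: the names `detect_forward_anomaly`, `NonFiniteForwardError` and the `whitelist` parameter are hypothetical. It also works at module granularity only, so individual operations or code blocks inside a module are not covered.

```python
import contextlib

import torch
from torch import nn


class NonFiniteForwardError(RuntimeError):
    """Hypothetical error type: a module output contained inf or nan."""


def _iter_tensors(obj):
    """Yield all tensors found in a (possibly nested) module output."""
    if isinstance(obj, torch.Tensor):
        yield obj
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _iter_tensors(item)
    elif isinstance(obj, dict):
        for item in obj.values():
            yield from _iter_tensors(item)


@contextlib.contextmanager
def detect_forward_anomaly(model: nn.Module, whitelist=()):
    """Check the output of every submodule for inf/nan while the context is active.

    `whitelist` is a tuple of module classes whose own outputs are not checked.
    A parent module's output is still checked, so non-finite values that
    propagate outwards are reported at the parent.
    """

    def make_hook(name):
        def hook(module, inputs, output):
            for tensor in _iter_tensors(output):
                if tensor.is_floating_point() and not torch.isfinite(tensor).all():
                    raise NonFiniteForwardError(
                        f"non-finite output in module {name!r} ({type(module).__name__})"
                    )

        return hook

    handles = []
    for name, module in model.named_modules():
        if isinstance(module, tuple(whitelist)):
            continue  # whitelisted module type: skip the check
        handles.append(module.register_forward_hook(make_hook(name)))
    try:
        yield model
    finally:
        # Always remove the hooks again, also when the check raised.
        for handle in handles:
            handle.remove()
```

Used as `with detect_forward_anomaly(model, whitelist=(SomeModule,)): model(x)`, the first module producing a non-finite output raises with its qualified name. The `torch.isfinite(...).all()` test in a Python `if` forces a device synchronisation per module, so this is a debugging aid rather than something to leave enabled.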
We should collect some statistics (maybe optionally, configurable) (maybe only every N steps if too costly otherwise). Of: * weights * activations * gradients of weights * gradients of activations...
``` RETURNN starting up, version 1.20231230.164342+git.f353135e, date/time 2023-12-31-13-21-05 (UTC+0000), pid 2003528, cwd /work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.lmbYlKeoU6kT/work, Python /work/tools/users/zeyer/py-envs/py3.11-torch2.1/bin/python3.11 RETURNN command line options: ['/u/zeyer/setups/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.lmbYlKeoU6kT/output/returnn.config'] ... Torch: Hostname cn-236, pid 2003531, using GPU 3....
``` [2023-12-31 11:33:54,580] INFO: Start Job: Job Task: run ... RETURNN starting up, version 1.20231230.164342+git.f353135e, date/time 2023-12-31-11-34-07 (UTC+0000), pid 1868636, cwd /work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.EImqFihsdh2B/work, Python /work/tools/users/zeyer/py-envs/py3.11-torch2.1/bin/python3.11 RETURNN command line options: ['/u/zeyer/setups/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.EImqFihsdh2B/output/returnn.config'] Hostname: cn-237...
From log (`/work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.XPpeLPG9camH/log.run.1`), filtered the CUDA mem usage reports: ``` Memory usage (cuda): alloc cur 427.8MB alloc peak 427.8MB reserved cur 446.0MB reserved peak 446.0MB Memory usage (cuda): alloc cur...
We had one particular problem when converting a Conformer acoustic model from TensorFlow to ONNX: Calculating the sequence lengths after convolution resulted in wrong calculations on ONNX side. The issue...
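For reference, the standard output-length arithmetic for a 1D convolution can be written in a few lines of plain Python. The excerpt above is truncated and does not say which step went wrong on the ONNX side, so this is only a sketch of the general formula, not the fix from the issue; the function name `conv_out_len` and its parameters are hypothetical.

```python
def conv_out_len(in_len, kernel_size, stride=1, padding="valid", dilation=1):
    """Number of output frames of a 1D convolution over `in_len` input frames.

    padding: "valid" (no padding), "same" (TensorFlow-style, output is
    ceil(in_len / stride)), or an int giving symmetric zero padding per side
    (PyTorch-style).
    """
    if padding == "same":
        # Ceiling division using integers only.
        return -(-in_len // stride)
    pad = 0 if padding == "valid" else int(padding)
    effective_kernel = (kernel_size - 1) * dilation + 1
    padded = in_len + 2 * pad
    if padded < effective_kernel:
        return 0  # sequence too short for a single output frame
    return (padded - effective_kernel) // stride + 1


# Two stacked stride-2 convolutions (kernel size 3, no padding), applied to
# a batch of sequence lengths. The layer setup here is an assumed example.
seq_lens = [100, 57, 3]
after_first = [conv_out_len(n, kernel_size=3, stride=2) for n in seq_lens]
after_second = [conv_out_len(n, kernel_size=3, stride=2) for n in after_first]
print(after_first)   # [49, 28, 1]
print(after_second)  # [24, 13, 0]
```

Keeping the computation in integer arithmetic (floor division rather than a float division followed by `ceil` or a cast) avoids rounding differences between runtimes, which is one plausible source of mismatched lengths after export.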
I noticed that the `DistributedDataParallel` module has the option `mixed_precision` which is for mixed precision training. We don't use that, even if the user specifies `torch_amp` to use mixed precision....