returnn
The RWTH extensible training framework for universal recurrent neural networks
ESPnet basically does it like this:
- Sort the whole dataset. (The dataset could maybe be directly stored in a sorted way. This would speed up the random access later.) ...
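A minimal sketch of that sort-then-bucket idea; the function name, the frame budget, and the batching heuristic here are made up for illustration and are not ESPnet's or RETURNN's actual code:

```python
from typing import List


def make_sorted_batches(seq_lens: List[int], max_frames_per_batch: int = 20000) -> List[List[int]]:
    """
    Sort all sequences by length once, then greedily group neighboring
    sequences into batches so that each batch stays under a frame budget
    and padding overhead stays small.
    """
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i])
    batches: List[List[int]] = []
    cur: List[int] = []
    cur_max = 0
    for idx in order:
        new_max = max(cur_max, seq_lens[idx])
        # Cost of a batch is (num seqs) * (longest seq), due to padding.
        if cur and new_max * (len(cur) + 1) > max_frames_per_batch:
            batches.append(cur)
            cur, cur_max = [], 0
            new_max = seq_lens[idx]
        cur.append(idx)
        cur_max = new_max
    if cur:
        batches.append(cur)
    return batches
```

Because the dataset is sorted once globally, batching only ever touches neighboring (similar-length) sequences, which is what would make storing the dataset in sorted order attractive for random access.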
This PR adds `dyn_dim_min_sizes` and `dyn_dim_max_sizes` to the command line options of `tools/torch_export_to_onnx.py`. By default, extern data with a time dimension size in the range [2, 25] is generated for export, ...
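The exact CLI syntax of the new options is not shown here. As an illustration of the underlying mechanism, a plain `torch.onnx.export` call that traces with a dummy input whose time dimension is drawn from such a range, while marking batch and time axes as dynamic, might look like this (the model and tensor names are placeholders):

```python
import random
import torch


class DummyModel(torch.nn.Module):  # placeholder for the real network
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(40, 10)

    def forward(self, x):  # x: [batch, time, feature]
        return self.linear(x)


model = DummyModel()
# Pick a dummy time dimension size from the configured range, e.g. [2, 25].
time_len = random.randint(2, 25)
dummy_input = torch.randn(1, time_len, 40)

torch.onnx.export(
    model,
    (dummy_input,),
    "model.onnx",
    input_names=["data"],
    output_names=["output"],
    # Mark batch and time axes as dynamic so the exported graph
    # does not hard-code the dummy sizes used during tracing.
    dynamic_axes={"data": {0: "batch", 1: "time"}, "output": {0: "batch", 1: "time"}},
)
```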
I have a `MetaDataset` which contains two `HDFDataset`s, and I want to apply a sequence list filter file. The `MetaDataset` has an option `seq_list_file`, but the docstring says >You only...
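For context, a rough sketch of the kind of config this question is about; the file names and data keys are made up, and the exact option set should be checked against the RETURNN dataset documentation:

```python
train = {
    "class": "MetaDataset",
    "datasets": {
        "features": {"class": "HDFDataset", "files": ["features.hdf"]},
        "targets": {"class": "HDFDataset", "files": ["targets.hdf"]},
    },
    # Map the combined data keys to (sub-dataset name, key in that sub-dataset).
    "data_map": {
        "data": ("features", "data"),
        "classes": ("targets", "data"),
    },
    "seq_order_control_dataset": "features",
    # The option in question: restrict the seqs to those listed in this file.
    "seq_list_file": "seq_list.txt",
}
```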
For single-GPU training, without PyTorch DataLoader multiprocessing and without MultiProcDataset, the memory usage of the dataset is maybe not too much of a problem. However, it is not uncommon...
```
RETURNN starting up, version 1.20240117.113304+git.54097989, date/time 2024-01-17-23-15-11 (UTC+0000), pid 1130069, cwd /work/asr4/zeyer/setups-data/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.wmezXtjsvAck/work, Python /work/tools/users/zeyer/py-envs/py3.11-torch2.1/bin/python3.11
RETURNN command line options: ['/u/zeyer/setups/combined/2021-05-31/work/i6_core/returnn/training/ReturnnTrainingJob.wmezXtjsvAck/output/returnn.config']
Hostname: cn-284
Installed native_signal_handler.so.
PyTorch: 2.1.0+cu121 (7bcf7da3a268b435777fe87c7794c382f444e86d) ( in...
```
For debugging, for dumping, but also as an alternative to `torch.compile` support for the direct PyTorch backend (#1491), it could be useful to have another backend which outputs PyTorch code, instead...
For the first few steps, it could run without tracing/scripting, but then it could enable it and from then on use the Torch graph directly (very similar to TF computation...
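A rough sketch of that idea using plain `torch.jit.trace`; the wrapper class, the step threshold, and the single-tensor interface are placeholders, and the actual RF/graph handling would be more involved:

```python
import torch


class EagerThenTraced:
    """Run the module eagerly for the first few steps, then switch to a traced graph."""

    def __init__(self, module: torch.nn.Module, trace_after_step: int = 3):
        self.module = module
        self.trace_after_step = trace_after_step
        self.step = 0
        self.traced = None

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        if self.traced is not None:
            # From now on, reuse the recorded Torch graph directly,
            # similar to reusing a TF computation graph.
            return self.traced(x)
        out = self.module(x)  # eager execution for the first steps
        self.step += 1
        if self.step >= self.trace_after_step:
            self.traced = torch.jit.trace(self.module, (x,))
        return out
```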
Here I want to collect some things to be done to speed up eager-mode execution. Most of it did not really matter in graph-mode execution when those extra things are...
I'm not really sure whether that is possible, because we have our own `Tensor` class which wraps around `torch.Tensor`, and similarly all the PyTorch functions are wrapped inside RF....
It trains fine for a while, and then often I get a CPU OOM, which looks like:
```
[2024-01-04 11:41:05,662] INFO: Start Job: Job Task: run
...
RETURNN starting up,...
```