Albert Zeyer
The `PostprocessingDataset` from #1596 is merged now. It should allow doing all of the examples discussed here. One exception: it should not have state, but that's not really...
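For reference, a minimal sketch of how such a wrapping dataset is used in a RETURNN config. This assumes the option names `dataset` and `map_seq` as in the `PostprocessingDataset` interface; the inner dataset dict and the extra `**kwargs` are placeholders, not an authoritative spec:

```python
# Sketch of a RETURNN config fragment (assumption: option names follow the
# PostprocessingDataset interface; the wrapped dataset dict is a placeholder).

def _postprocess_seq(tensor_dict, **kwargs):
    # Modify one sequence (given as a dict/TensorDict of tensors), e.g. for
    # on-the-fly data augmentation or for collecting statistics, and return it.
    return tensor_dict

train = {
    "class": "PostprocessingDataset",
    "dataset": {"class": "..."},  # the wrapped dataset (placeholder)
    "map_seq": _postprocess_seq,  # applied per sequence, stateless
}
```

The stateless per-sequence mapping is what makes this composable: the wrapped dataset stays responsible for ordering and epochs, while the postprocessing function only transforms individual sequences.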
Also see #1487 about collecting statistics in general.
I reported that upstream here: https://github.com/pytorch/pytorch/issues/116923
Yes. @Gerstenberger has some experience doing this. See for example https://github.com/rwth-i6/returnn/issues/1236#issuecomment-1339201812:

> yes, i use `compile_tf_graph.py` [from RETURNN `tools/`] for this and then call the tf2onnx tool on the resulting...
Thanks for the detailed write-up. Maybe you can add this to our documentation or the RETURNN wiki?

> Python Script - Expand Me
>
> ...
>
> What we did...
> It would be easier to actually integrate it into the `i6_core` repository at some point together with the `tensorflow-to-onnx` job we use so we don't have to manage the...
Some links:

- https://pytorch.org/docs/stable/generated/torch.jit.trace.html
- https://pytorch.org/docs/stable/generated/torch.jit.ignore.html#torch.jit.ignore
- https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
- https://github.com/rwth-i6/returnn/wiki/PyTorch-optimizations
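To illustrate the mixing-tracing-and-scripting link above: a small sketch of how a scripted function with data-dependent control flow can be preserved inside a trace. The module and function names here are made up for the example:

```python
import torch

@torch.jit.script
def scripted_tail(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent control flow would be baked to one branch by tracing,
    # so this part is scripted instead.
    if x.sum() > 0:
        return x * 2
    return x - 1

class Mixed(torch.nn.Module):
    def forward(self, x):
        y = torch.relu(x)        # trace-friendly, no control flow
        return scripted_tail(y)  # the scripted call survives inside the trace

# Tracing the module keeps the scripted branch intact.
traced = torch.jit.trace(Mixed(), torch.randn(4))
```

Both branches remain reachable in the traced module, which is the point of mixing the two modes.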
I just found out about `torch.export`, which references `functorch.experimental.control_flow` for control flow:

- https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html
- https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html
After all the recent optimizations, looking now at the profiling (`demo-rf-pt-benchmark.py`, using Py-Spy with `--native`, on GPU), I don't see any obvious low-level bottleneck anymore. Sorted by total time: Sorted...
Now for Librispeech, one subepoch (1/20 of an epoch) takes 16:40 min on an Nvidia 1080 GPU with a Conformer ([setup/config](https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/experiments/exp2023_04_25_rf/conformer_import_moh_att_2023_06_30.py)), which is about the same as we see with TensorFlow for exactly the...