Albert Zeyer

938 comments of Albert Zeyer

Update: I think the draft is mostly ready now. Please check whether it suits all possible use cases (multi-GPU training, TPU training, having multiple dataset workers, etc., whatever you...

> > Yes, but you are again discussing minor implementation details.
>
> I wanted to start with the easier parts before I assume wrong things.

So far...

One small remaining question: Should this new dataset pipeline (i.e. when you set `dataset_pipeline`) use [distributed TensorFlow](https://github.com/rwth-i6/returnn/wiki/Distributed-TensorFlow) by default (i.e. have one dedicated worker for the dataset, and one worker...

> I'm not sure I understand if you guys are using the word "distributed" in the same sense it's used in TF. Distributed across GPUs within a single machine, or...

Just as a note: I started implementing this. Beginning with only the bare minimum. The first goal is to get single-GPU training to work. I will soon push some first...

> Yup, that was my point. Too much to my taste to call the suckers the same word. Keeping a parameter server on a CPU in a multi-GPU host is...

The simple case (no distributed TF, no dedicated dataset loader workers, no Horovod, i.e. no multi-GPU training) should work now, at least with the default pipeline. You can just set...
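For reference, the simple case might look like the following in a RETURNN config (a RETURNN config is a Python file). This is a hypothetical minimal sketch, assuming that setting `dataset_pipeline = True` selects the new pipeline with its defaults, while defining a `dataset_pipeline` function (as in the snippet further down) would customize it:

```python
# Hypothetical minimal RETURNN config fragment.
# Assumption: `dataset_pipeline = True` enables the new dataset pipeline
# with default behavior; a callable instead of True would define a
# custom pipeline.
use_tensorflow = True    # use the TF engine
dataset_pipeline = True  # enable the new dataset pipeline with its defaults
```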

I'm trying with such an implementation now for dynamic batch sizes via `bucket_by_sequence_length`:

```python
def dataset_pipeline(context):
  """
  :param InputContext context:
  :rtype: tensorflow.data.Dataset
  """
  import tensorflow as tf
  dataset = context.get_returnn_dataset()
  ...
```
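The snippet above is cut off, but the core idea behind `bucket_by_sequence_length` can be illustrated independently of TensorFlow. The following is a plain-Python sketch (not RETURNN's or `tf.data`'s actual implementation; `bucket_by_length` is a hypothetical helper): sequences are routed into length buckets, and each bucket is batched separately, so sequences of similar length end up in the same batch and padding overhead stays small.

```python
# Illustrative sketch of the bucketing idea behind tf.data's
# bucket_by_sequence_length (NOT the actual RETURNN/TF code).
import bisect

def bucket_by_length(sequences, bucket_boundaries, bucket_batch_sizes):
  """
  :param list sequences: sequences (e.g. lists of tokens)
  :param list[int] bucket_boundaries: ascending length boundaries, len N
  :param list[int] bucket_batch_sizes: batch size per bucket, len N + 1
  :return: list of batches, each a list of similar-length sequences
  """
  assert len(bucket_batch_sizes) == len(bucket_boundaries) + 1
  buckets = [[] for _ in bucket_batch_sizes]  # pending sequences per bucket
  batches = []
  for seq in sequences:
    # sequences with len < boundary[i] go to bucket i, longer ones further
    idx = bisect.bisect_right(bucket_boundaries, len(seq))
    buckets[idx].append(seq)
    if len(buckets[idx]) == bucket_batch_sizes[idx]:  # bucket full -> emit
      batches.append(buckets[idx])
      buckets[idx] = []
  batches.extend(b for b in buckets if b)  # emit remaining partial batches
  return batches

seqs = [[0] * n for n in [3, 10, 4, 12, 2, 11]]
batches = bucket_by_length(seqs, bucket_boundaries=[8], bucket_batch_sizes=[2, 2])
# short sequences (len < 8) batch together, long ones together:
# batches of lengths [3, 4], [10, 12], then partial [2] and [11]
```

The dynamic-batch-size aspect comes from `bucket_batch_sizes`: shorter buckets can use larger batch sizes, so the total number of padded time steps per batch stays roughly constant.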

Small status report: I think this is mostly done. This issue was really only about the API design anyway, and that seems good (no objections so far from anyone). The...

I wonder whether this can be slow and suboptimal in some cases. E.g. in `DotLayer`, you definitely would not want to do that when the axis is present in one...