Patrick Wilken

24 comments by Patrick Wilken

I'm implementing a dataset wrapper. The code is still too ugly for a pull request, so first a comment: 😄 The general interface is clear: ``` from torch.utils.data import IterableDataset...
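The wrapper idea above can be sketched roughly like this; a minimal sketch, assuming the RETURNN dataset yields one dict of arrays per sequence (the class name `ReturnnDatasetWrapper` and the toy source are hypothetical, not the actual implementation):

```python
from torch.utils.data import IterableDataset


class ReturnnDatasetWrapper(IterableDataset):
    """Exposes an iterable-style source dataset to PyTorch's DataLoader."""

    def __init__(self, source):
        # `source` is assumed to yield one dict per sequence,
        # e.g. {"data": ..., "classes": ...} as RETURNN datasets do.
        self._source = source

    def __iter__(self):
        # IterableDataset only requires __iter__, which fits
        # the iterable-style RETURNN dataset interface.
        for seq in self._source:
            yield seq


# Hypothetical toy source standing in for a real RETURNN dataset:
toy_source = [{"data": [1, 2, 3]}, {"data": [4, 5]}]
wrapper = ReturnnDatasetWrapper(toy_source)
sequences = [seq["data"] for seq in wrapper]
```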

@albertz, it looks to me like having a separate loss function object in the training loop is not strictly necessary. The available loss functions output normal Tensors, so you could have...
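To illustrate the point: the standard `torch.nn` losses return plain Tensors, so the loss can be computed directly inside the training loop without a separate abstraction. A minimal sketch with made-up shapes:

```python
import torch

model = torch.nn.Linear(4, 3)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 4)
targets = torch.randint(0, 3, (8,))

# No dedicated loss layer in the model: the loss is just
# another Tensor computed in the loop.
logits = model(inputs)
loss = loss_fn(logits, targets)  # plain scalar torch.Tensor
optimizer.zero_grad()
loss.backward()
optimizer.step()
```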

The interface of RETURNN datasets is iterable-style. I would wrap that first; a map-style `Dataset` instead of an `IterableDataset` wouldn't really fit here. Maybe later a map-style wrapper for HDFDataset or...

I implemented and pushed dev set evaluation and learning rate scheduling. I just reused the existing `LearningRateControl` code, so all different scheduling options should be supported. For now I only...
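I'm not reproducing the `LearningRateControl` logic here, but the basic "reduce the LR when the dev score stops improving" scheduling it covers looks roughly like the following PyTorch-native equivalent (the dev scores are made up for illustration):

```python
import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# Newbob-style behavior: halve the LR as soon as the dev
# score fails to improve over the best seen so far.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=0)

dev_scores = [1.0, 0.8, 0.9, 0.9]  # hypothetical dev losses per epoch
for score in dev_scores:
    # In the real training loop this runs after dev-set evaluation.
    scheduler.step(score)

final_lr = optimizer.param_groups[0]["lr"]
```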

I just noticed that another thing still missing is keeping the optimizer state: both between epochs (currently we recreate it in every epoch) and saving the...
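For the checkpointing part, keeping the optimizer state would mean saving and restoring its `state_dict` alongside the model's. A sketch (the checkpoint path is made up; this is not the actual engine code):

```python
import os
import tempfile

import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One update so the optimizer has per-parameter state (Adam moments).
loss = model(torch.randn(4, 2)).sum()
loss.backward()
optimizer.step()

ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    ckpt_path)

# Next epoch / next run: restore both, instead of recreating the
# optimizer from scratch and losing its accumulated state.
restored_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
restored_opt.load_state_dict(torch.load(ckpt_path)["optimizer"])
```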

The `batching` (and, by the way, also `cache_size`) global parameter is useful when you use a path to an HDF file (not an HDFDataset config dict) as `train`. For example...
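A hypothetical RETURNN config fragment for that case (the file path is made up): when `train` is only a path, there is no dataset dict to carry per-dataset options, so the global parameters are the only place to set them.

```python
# `train` is just an HDF file path, not an HDFDataset config dict,
# so the global parameters apply to the implicitly created dataset:
train = "data/train.hdf"
batching = "random"   # sequence ordering for the implicit HDFDataset
cache_size = "0"      # disable the HDFDataset cache
```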

> No, this is not similar for windowing. Windowing changes the dimension, so it must be consistent.

Ok, makes sense then to have it global.

> Maybe we could even...

> You can easily have different options for train/dev/eval.

Then I don't understand what you mean. Like having `train_seq_ordering`, `eval_seq_ordering` etc. as global parameters? It's true that sequence ordering can...
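For contrast, with explicit dataset dicts each dataset already carries its own option, so no extra global parameters would be needed. A hypothetical config sketch (file paths made up):

```python
# Per-dataset options instead of globals like `train_seq_ordering`:
train = {"class": "HDFDataset", "files": ["data/train.hdf"],
         "seq_ordering": "laplace:.1000"}
dev = {"class": "HDFDataset", "files": ["data/dev.hdf"],
       "seq_ordering": "sorted"}
```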

Yep, #552 sounds very related and could improve my padding approach. I agree that my implementation is not "modular". About two decoding passes: yes, I was aware of that concept,...

Ok, let me explain here why I opened #714 rather than using the options you proposed. The two-decoder implementation makes the config unnecessarily complicated. And by config I mean the...