Patrick Wilken

24 comments by Patrick Wilken

I'm implementing a dataset wrapper. The code is still too ugly for a pull request, so first a comment: 😄 The general interface is clear: ``` from torch.utils.data import IterableDataset...
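The wrapper idea above can be sketched roughly like this; a minimal sketch, assuming the RETURNN dataset yields one dict of arrays per sequence (the class name `ReturnnDatasetWrapper` and the toy source are hypothetical, not the actual implementation):

```python
from torch.utils.data import IterableDataset


class ReturnnDatasetWrapper(IterableDataset):
    """Exposes an iterable-style source dataset to PyTorch's DataLoader."""

    def __init__(self, source):
        # `source` is assumed to yield one dict per sequence,
        # e.g. {"data": ..., "classes": ...} as RETURNN datasets do.
        self._source = source

    def __iter__(self):
        # IterableDataset only requires __iter__, which fits
        # the iterable-style RETURNN dataset interface.
        for seq in self._source:
            yield seq


# Hypothetical toy source standing in for a real RETURNN dataset:
toy_source = [{"data": [1, 2, 3]}, {"data": [4, 5]}]
wrapper = ReturnnDatasetWrapper(toy_source)
sequences = [seq["data"] for seq in wrapper]
```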

@albertz, it looks to me like having a separate loss function object in the training loop is not strictly necessary. The available loss functions output normal Tensors, so you could have...
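To illustrate the point: the standard `torch.nn` losses return plain Tensors, so the loss can be computed directly inside the training loop without a separate abstraction. A minimal sketch with made-up shapes:

```python
import torch

model = torch.nn.Linear(4, 3)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 4)
targets = torch.randint(0, 3, (8,))

# No dedicated loss layer in the model: the loss is just
# another Tensor computed in the loop.
logits = model(inputs)
loss = loss_fn(logits, targets)  # plain scalar torch.Tensor
optimizer.zero_grad()
loss.backward()
optimizer.step()
```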

The interface of RETURNN datasets is iterable-style. I would wrap that first; a map-style `Dataset` instead of an `IterableDataset` wouldn't really fit here. Maybe later a map-style wrapper for HDFDataset or...

I implemented and pushed dev set evaluation and learning rate scheduling. I just reused the existing `LearningRateControl` code, so all different scheduling options should be supported. For now I only...
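I'm not reproducing the `LearningRateControl` logic here, but the basic "reduce the LR when the dev score stops improving" scheduling it covers looks roughly like the following PyTorch-native equivalent (the dev scores are made up for illustration):

```python
import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# Newbob-style behavior: halve the LR as soon as the dev
# score fails to improve over the best seen so far.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=0)

dev_scores = [1.0, 0.8, 0.9, 0.9]  # hypothetical dev losses per epoch
for score in dev_scores:
    # In the real training loop this runs after dev-set evaluation.
    scheduler.step(score)

final_lr = optimizer.param_groups[0]["lr"]
```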

I just noticed that another thing still missing is keeping the optimizer state: both between epochs (currently we recreate it in every epoch) and saving the...
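For the checkpointing part, keeping the optimizer state would mean saving and restoring its `state_dict` alongside the model's. A sketch (the checkpoint path is made up; this is not the actual engine code):

```python
import os
import tempfile

import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One update so the optimizer has per-parameter state (Adam moments).
loss = model(torch.randn(4, 2)).sum()
loss.backward()
optimizer.step()

ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    ckpt_path)

# Next epoch / next run: restore both, instead of recreating the
# optimizer from scratch and losing its accumulated state.
restored_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
restored_opt.load_state_dict(torch.load(ckpt_path)["optimizer"])
```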

The `batching` (and, by the way, also `cache_size`) global parameter is useful when you use a path to an HDF file (not an HDFDataset config dict) as `train`. For example...
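A hypothetical RETURNN config fragment for that case (the file path is made up): when `train` is only a path, there is no dataset dict to carry per-dataset options, so the global parameters are the only place to set them.

```python
# `train` is just an HDF file path, not an HDFDataset config dict,
# so the global parameters apply to the implicitly created dataset:
train = "data/train.hdf"
batching = "random"   # sequence ordering for the implicit HDFDataset
cache_size = "0"      # disable the HDFDataset cache
```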

> No, this is not similar for windowing. Windowing changes the dimension, so it must be consistent.

Ok, makes sense then to have it global.

> Maybe we could even...

> You can easily have different options for train/dev/eval.

Then I don't understand what you mean. Like having `train_seq_ordering`, `eval_seq_ordering` etc. as global parameters? It's true that sequence ordering can...
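For contrast, with explicit dataset dicts each dataset already carries its own option, so no extra global parameters would be needed. A hypothetical config sketch (file paths made up):

```python
# Per-dataset options instead of globals like `train_seq_ordering`:
train = {"class": "HDFDataset", "files": ["data/train.hdf"],
         "seq_ordering": "laplace:.1000"}
dev = {"class": "HDFDataset", "files": ["data/dev.hdf"],
       "seq_ordering": "sorted"}
```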

Yep, #552 sounds very related and could improve my padding approach. I agree that my implementation is not "modular". About two decoding passes: yes, I was aware of that concept,...

Ok, let me explain here why I opened #714 rather than using the options you proposed. The two-decoder implementation makes the config unnecessarily complicated. And by config I mean the...