Albert Zeyer
How would the user specify such a post-processing function per dataset? It could be another argument for the dataset itself, so the user specifies it like:

```python
train = {
    ...,
}
```
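Purely as a sketch of that idea (the `post_process` option name is my assumption here, not an existing RETURNN dataset argument), it could look like this:

```python
def _train_post_process(tensor_dict):
    # purely illustrative: modify/augment a single sequence and return it
    return tensor_dict

train = {
    "class": "HDFDataset",  # any existing dataset class, just as an example
    "files": ["train.hdf"],
    # hypothetical new per-dataset option (name is an assumption, not an existing RETURNN arg):
    "post_process": _train_post_process,
}
```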
The post-processing function is not per task but per dataset. At least that is what I wrote above. Or do you want to have it per task? But I guess...
> in the engine class you know for what the dataset is used and from which name in the config it comes from (I hope)

No, you don't. E.g. we...
One aspect I just realized: where exactly would this be executed? As this is now outside the dataset, `MultiProcDataset` cannot really make use of it, so it cannot be parallelized...
Another aspect came up (@Judyxujj): we were interested in implementing mixup in this post-processing function. But this is not really possible with the current design (see the sketch below). This additionally needs: *...
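To make that concrete, here is a rough sketch (my own, not a design decided in this thread) of mixup as a transformation over a stream of sequences. The point is that it needs a buffer of earlier sequences, i.e. state across sequences, which a purely per-sequence function does not have. Plain dicts of numpy arrays with a `"data"` key stand in for the real data structures here; mixing of targets / loss handling is left out:

```python
from typing import Dict, Iterator, List
import random
import numpy


def mixup_stream(
    stream: Iterator[Dict[str, numpy.ndarray]],
    *,
    buffer_size: int = 100,
    mix_prob: float = 0.5,
    lambda_min: float = 0.1,
    lambda_max: float = 0.4,
) -> Iterator[Dict[str, numpy.ndarray]]:
    """Sketch: mixup as a stream transform, keeping a buffer of earlier feature sequences."""
    buffer: List[numpy.ndarray] = []
    for seq in stream:
        feats = seq["data"]  # [time, feature_dim], assumed float features
        if buffer and random.random() < mix_prob:
            other = random.choice(buffer)
            lam = random.uniform(lambda_min, lambda_max)
            # mix only the overlapping part of the two sequences
            t = min(feats.shape[0], other.shape[0])
            mixed = feats.copy()
            mixed[:t] = (1.0 - lam) * feats[:t] + lam * other[:t]
            seq = dict(seq, data=mixed)
        buffer.append(feats)
        if len(buffer) > buffer_size:
            buffer.pop(0)
        yield seq
```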
> That would interact favourably with multiprocessing,

No, not really. Only the `DataLoader` multiproc would apply here, which is usually just a single proc. But we want to have multiple...
> I was under the assumption that in RETURNN+PT the data loader `num_workers` is basically a replacement for `MultiProcDataset`. I.e. in the cases where I want to use more than...
Btw, after some discussion yesterday with @curufinwe, I think a pragmatic, simple solution for now is really to implement this as a new, separate dataset, such a `PostProcessingDataset`. This directly...
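As a sketch of how such a wrapper dataset could be specified in a config (class and argument names here are assumptions, just following the naming used in this thread and the usual wrapping pattern of e.g. `MetaDataset`):

```python
def _post_process_seq(tensor_dict):
    # per-sequence transformation, e.g. feature normalization or augmentation
    return tensor_dict

train = {
    "class": "PostProcessingDataset",  # as proposed here; not an existing class at this point
    "dataset": {
        # the wrapped inner dataset, any existing dataset definition
        "class": "HDFDataset",
        "files": ["train.hdf"],
    },
    "map_seq": _post_process_seq,  # argument name is an assumption
}
```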
For some other examples of similar processing datasets, see `VariableDataset`, `MetaDataset`, `AnythingDataset`, `ConcatSeqsDataset`. Btw, in the main post, I extended the list of example post-processing functions a bit. One...
> However, this should be implemented in a streaming way, i.e. it gets in a sequence of `TensorDict`, and should output a new sequence of `TensorDict`.

The question is a...
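For illustration, a minimal sketch of that streaming interface as I read the quote (only a sketch; the exact signature is exactly what is being discussed here, and I assume `TensorDict` is importable from `returnn.tensor`):

```python
from typing import Iterator
from returnn.tensor import TensorDict


def map_seq_stream(stream: Iterator[TensorDict]) -> Iterator[TensorDict]:
    """Streaming post-processing sketch: consume a stream of TensorDicts,
    yield a new stream. It can modify, drop, buffer or insert sequences."""
    for tensor_dict in stream:
        # ... arbitrary per-sequence or cross-sequence logic here ...
        yield tensor_dict
```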