Moritz Gunz

Results: 133 comments by Moritz Gunz

1. I agree this should be deterministic from run to run as long as the parameters stay the same.
2. I think even `map_seq_stream` is technically not a problem, we...
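To illustrate the determinism point: as long as all randomness in the stream mapping comes from an RNG seeded only by fixed parameters and the epoch, repeated runs give identical output. A minimal, generic sketch of that idea (this is not the actual `PostprocessingDataset` callback signature; `base_seed` and the shuffle buffer are made up for illustration):

```python
import numpy as np


def map_seq_stream(seqs, *, epoch: int, base_seed: int = 1337):
    """Hypothetical stream mapping whose only randomness is an epoch-seeded RNG,
    so two runs with the same parameters yield the same output stream."""
    rng = np.random.RandomState(base_seed + epoch)
    buffer = []
    for seq in seqs:
        buffer.append(seq)
        if len(buffer) >= 10:  # small shuffle buffer, deterministic given the seed
            rng.shuffle(buffer)
            yield from buffer
            buffer.clear()
    rng.shuffle(buffer)  # flush the remainder
    yield from buffer
```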

Another data point: I have a setup where I use MultiProcDataset + DistributeFilesDataset around postprocessing datasets that postprocess data from an HDFDataset. Since DFD prefetches one subepoch of data, this ends up...
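For context, the nesting described here looks roughly like the sketch below. The option names are written from memory and may differ from the actual RETURNN dataset options; `hdf_files` and the `map_seq_stream` function are placeholders.

```python
hdf_files = ["data-1.hdf", "data-2.hdf"]  # placeholder file list


def get_sub_epoch_dataset(files):
    # One sub-epoch dataset: a postprocessing dataset wrapping an HDFDataset.
    return {
        "class": "PostprocessingDataset",
        "dataset": {"class": "HDFDataset", "files": files},
        "map_seq_stream": map_seq_stream,  # placeholder postprocessing function, e.g. as in the sketch further up
    }


train = {
    "class": "MultiProcDataset",  # worker processes on top
    "num_workers": 4,
    "buffer_size": 10,
    "dataset": {
        "class": "DistributeFilesDataset",  # distributes the HDF files over subepochs
        "files": hdf_files,
        "get_sub_epoch_dataset": get_sub_epoch_dataset,
        "partition_epoch": 20,
    },
}
```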

We just had another case at AppTek where the memory consumption of the workers became a bottleneck in combination w/ DistributeFilesDataset. I think this is due to an implementation in the...

Wrt. implementation, I'm currently thinking about the following: 1. Spawn a number of worker processes. Each worker process gets a (separate) connection to the main proc, and a (separate) Q...
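A rough sketch of that idea, assuming plain `multiprocessing` with a `Pipe` per worker for feeding work items and a bounded per-worker `Queue` for results. All names here are illustrative, not the actual MultiProcDataset code:

```python
import multiprocessing as mp


def _process_item(item):
    # Hypothetical stand-in for the actual per-item work
    # (e.g. loading and postprocessing the sequences of one file).
    return [f"{item}:seq{i}" for i in range(3)]


def _worker_main(conn, out_queue):
    # Each worker has its own connection for receiving work items and its own
    # queue for sending results back, so no item is processed twice.
    while True:
        item = conn.recv()
        if item is None:  # sentinel: no more work for this worker
            break
        for seq in _process_item(item):
            out_queue.put(seq)
    out_queue.put(None)  # end-of-stream marker for this worker


def main():
    num_workers = 2
    work_items = ["file-a", "file-b", "file-c", "file-d"]
    ctx = mp.get_context("spawn")
    conns, queues, procs = [], [], []
    for _ in range(num_workers):
        parent_conn, child_conn = ctx.Pipe()
        q = ctx.Queue(maxsize=8)  # bounded queue keeps per-worker memory in check
        p = ctx.Process(target=_worker_main, args=(child_conn, q), daemon=True)
        p.start()
        conns.append(parent_conn)
        queues.append(q)
        procs.append(p)

    # The main proc distributes the work items round-robin over the workers.
    for i, item in enumerate(work_items):
        conns[i % num_workers].send(item)
    for conn in conns:
        conn.send(None)

    # Consume results round-robin from the per-worker queues.
    done = [False] * num_workers
    idx = 0
    while not all(done):
        if not done[idx]:
            seq = queues[idx].get()
            if seq is None:
                done[idx] = True
            else:
                print(seq)
        idx = (idx + 1) % num_workers

    for p in procs:
        p.join()


if __name__ == "__main__":
    main()
```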

> Ah sorry, this is for feeding the workers, which is different in MultiProcDataset, where they all use their own sub dataset. Yes! The point here is to avoid duplicating...

> How exactly? This is basically https://github.com/rwth-i6/returnn/issues/1762. We don't really have a solution for that yet. In this PR (https://github.com/rwth-i6/returnn/pull/1765) I've used (conceptually) `rng_seed_for_worker=self.get_random_seed_for_epoch(epoch=epoch * num_workers + worker_idx)`. I think...
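Conceptually, that seed derivation maps each (epoch, worker) pair to its own "virtual epoch", so workers get distinct but reproducible seeds. A small sketch, where `get_random_seed_for_epoch` is only a stand-in for the actual RETURNN helper (the real one also mixes in things like the random seed offset):

```python
import numpy as np


def get_random_seed_for_epoch(*, epoch: int, base_seed: int = 42) -> int:
    # Stand-in for RETURNN's seed-per-epoch logic, just for illustration.
    return (base_seed + epoch) % (2**31)


def worker_rng(epoch: int, worker_idx: int, num_workers: int) -> np.random.RandomState:
    # Each (epoch, worker) pair maps to a unique "virtual epoch", so workers
    # never share a seed, and the same run reproduces the same seeds.
    seed = get_random_seed_for_epoch(epoch=epoch * num_workers + worker_idx)
    return np.random.RandomState(seed)
```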

There is a gradient checkpointing API in PT (https://pytorch.org/docs/stable/checkpoint.html). It even saves/restores the RNG state, so we could do Dropout in there. I'm not sure the RNG state there can...
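For reference, a minimal usage sketch of that PT API (`torch.utils.checkpoint.checkpoint`): with `preserve_rng_state=True` (the default), the RNG state is saved and restored, so the dropout mask used during the backward recomputation matches the one from the forward pass. The module and shapes below are made up for illustration.

```python
import torch
from torch.utils.checkpoint import checkpoint


class Block(torch.nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)
        self.dropout = torch.nn.Dropout(p=0.1)

    def forward(self, x):
        return self.dropout(torch.relu(self.linear(x)))


block = Block()
x = torch.randn(8, 16, requires_grad=True)

# Activations inside `block` are not stored; they are recomputed in backward.
# preserve_rng_state=True makes the recomputed dropout mask match the forward pass.
y = checkpoint(block, x, use_reentrant=False, preserve_rng_state=True)
y.sum().backward()
```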

It seems to me that the API PT exposes for gradient checkpointing could be used as the RF frontend API, and for the associated TF-backend implementation as well?

> Yea that is what I referred to when we talked about it. But I need to check it more how it is done there. Specifically, I'm still not exactly...

> I would also assume, only the main thread is also mostly active, and the other threads are more idle. E.g. if this is some thread by Numpy or so,...