Piotr Żelasko

523 comments by Piotr Żelasko

You can do something like that today: add a `num_workers=N` argument to `OnTheFlyFeatures`, and it will spawn a thread pool to do the reads. IIRC it gave me some speedup in the...
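To illustrate the general idea of doing reads through a thread pool, here is a minimal sketch using only the standard library. It is not how `OnTheFlyFeatures` is implemented internally; `read_all` and its parameters are hypothetical names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all(paths, num_workers=2):
    """Read every file in `paths` using a pool of `num_workers` threads.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map() returns results in the same order as the inputs,
        # even though the reads themselves may finish out of order.
        return list(pool.map(lambda p: Path(p).read_bytes(), paths))
```

Threads help here because file reads are mostly I/O-bound, so the workers spend most of their time waiting rather than competing for the interpreter.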

You might want to experiment with the num_workers value for OnTheFlyFeatures, just bear in mind that the total number of threads spawned is num_workers (DataLoader) * num_workers (OnTheFlyFeatures), so don't...
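The thread-count arithmetic above can be sketched as a small helper. This is only an illustration of the multiplication; `total_reader_threads` is a hypothetical name, and the treatment of zero DataLoader workers as a single (main) process is an assumption based on how PyTorch loads data in the main process in that case.

```python
def total_reader_threads(dataloader_workers: int, on_the_fly_workers: int) -> int:
    """Estimate how many reader threads exist across all DataLoader workers.

    Hypothetical helper: each DataLoader worker process is assumed to hold
    its own thread pool of size `on_the_fly_workers`.
    """
    # With 0 DataLoader workers, loading happens in the main process,
    # so there is still one process holding one pool.
    processes = max(dataloader_workers, 1)
    return processes * on_the_fly_workers


# Example: 4 DataLoader workers, each with a pool of 2 threads.
print(total_reader_threads(4, 2))
```

Comparing the result against the number of available CPU cores is one quick way to check that a configuration is not heavily oversubscribed.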

Also LMK if it helps, it will be a useful data point for me.

If you're using speed perturbation, you might want to retry, as I've just merged an optimization. Also, how about 4-8 dataloader workers and 2 on-the-fly workers?

I don't have a good idea why the process would die when spawning too many threads. I never ran into this issue myself. Maybe some native dependency of ours is...

... one thing that caught my attention though -- if you're working with only OPUS data, the program should never go to `torchaudio_load`, it should go directly to `read_opus_ffmpeg`. Can...

> ... but the very strange thing, to me is, how can the other workers be making progress at this time? I think I have figured it out. The other...

I think I know: it's reading MUSAN wav files to mix them with GigaSpeech. If one of the cuts doesn't have precomputed features, the mixing is done in the audio domain.

I was also contemplating that and it makes sense to me. For corpora that have the "real" context available (like SWBD or AMI) we can build the cuts to use...

Thanks for raising this issue. We should port the `fault_tolerant` argument to `CutSet.compute_and_store_features` to handle these things properly. I'll try to do it but not sure when I'll find the time,...