Piotr Żelasko

Results: 523 comments by Piotr Żelasko

Right. I triggered the tests; if they're green, LGTM.

Please follow these instructions: pre-commit will auto-apply the relevant changes, but you'll need to commit them yourself. https://github.com/lhotse-speech/lhotse#development-installation

You can quickly fix that with the same effect by replacing `infinite_mux(*cuts, ...)` with `mux(*[c.repeat() for c in cuts], ...)`. The issue comes from the fact that `infinite_mux` samples sources...
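To illustrate the shape of that replacement, here is a minimal pure-Python sketch of weighted multiplexing over sources that have each been made infinite. The `mux` and `repeat` helpers below are hypothetical stand-ins written for this example; they are not lhotse's implementation, and the weights and seed are assumptions.

```python
import itertools
import random
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def mux(*sources: Iterable[T], weights: Sequence[float], seed: int = 0) -> Iterator[T]:
    """Hypothetical weighted multiplexer (not lhotse's code).

    At each step, pick one still-active source at random (by weight)
    and yield its next item. A source is dropped once it is exhausted.
    """
    rng = random.Random(seed)
    iterators = [iter(s) for s in sources]
    active = list(range(len(iterators)))
    while active:
        idx = rng.choices(active, weights=[weights[i] for i in active], k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            active.remove(idx)


def repeat(source: Sequence[T]) -> Iterator[T]:
    """Hypothetical stand-in for an infinitely repeating source."""
    return itertools.cycle(source)


# Two finite toy sources; repeating each one makes the mixed stream infinite.
sources = [["a1", "a2"], ["b1", "b2", "b3"]]
stream = mux(*[repeat(s) for s in sources], weights=[0.5, 0.5])

# The stream never ends, so take a bounded slice of it.
first_ten = list(itertools.islice(stream, 10))
```

Because every source repeats on its own, no source can run out, and the multiplexer itself needs no special "infinite" mode.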

It looks like it's ffmpeg-torchaudio bindings related. CC @mthrok could you suggest anything?

I am not sure if it is possible to read only a single channel out of a WAVE file. I think the samples from each channel are stored next to...
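For uncompressed PCM WAVE data, the samples of all channels are interleaved frame by frame, so one simple way to get a single channel is to read whole frames and keep every n-th sample. The sketch below uses only the Python standard library and assumes 16-bit PCM; the toy sample values and the 8000 Hz rate are made up for the example.

```python
import io
import struct
import wave

# Build a tiny 16-bit stereo WAV in memory (toy values, assumed for illustration).
left = [0, 1, 2, 3]
right = [100, 101, 102, 103]
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    w.setframerate(8000)
    # Each frame holds one sample per channel: L0, R0, L1, R1, ...
    interleaved = [s for pair in zip(left, right) for s in pair]
    w.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))

# Read every frame back, then stride over the samples to pick one channel.
buf.seek(0)
with wave.open(buf, "rb") as r:
    n_channels = r.getnchannels()
    raw = r.readframes(r.getnframes())

samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
channel_0 = list(samples[0::n_channels])
channel_1 = list(samples[1::n_channels])
```

This still reads the bytes of every channel from disk; it only discards the unwanted ones after decoding.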

> Regarding the processing time of compute_and_store_features_batch, I think that it is mainly due to disk IO. So we can create a special kind of SimpleCutSampler that prefetches audio into memory using...

Thanks for a very detailed comment Dan!! I will distill this into specification items for myself later. I'll leave some comments below just to keep you informed on which features...

Another question: should we support older PyTorch versions for this format? WebDataset was largely ported to core PyTorch (https://github.com/pytorch/data), and some of these things, like sharding filters or even tar...

This is super helpful to organize my thinking for this effort, thanks! The iterable-based mechanism you described is exactly what the PyTorch team is building with torchdata (see https://github.com/pytorch/data). I...