Piotr Żelasko comments

Results: 523 comments

#1203 Update `compute_supervisions_frame_mask()` to skip negative end indices

Right. I triggered the tests; if they're green, LGTM.

#1203 Update `compute_supervisions_frame_mask()` to skip negative end indices

Please follow these instructions, pre-commit will auto-apply relevant changes, but you'll need to commit them yourself. https://github.com/lhotse-speech/lhotse#development-installation

Duplicate Manifest ID / Mux

You can quickly fix that with the same effect by replacing `infinite_mux(*cuts, ...)` with `mux(*[c.repeat() for c in cuts], ...)`. The issue comes from the fact that `infinite_mux` samples sources...
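A minimal pure-Python sketch of the idea behind this workaround: weighted random multiplexing over sources that have each been made infinite *before* muxing. `mux_iterators` is a hypothetical helper, not lhotse's `mux` or `infinite_mux` implementation, and `itertools.cycle` stands in for `CutSet.repeat()`. It only shows the shape of "mux over repeated sources" and says nothing about why `infinite_mux` misbehaves here.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def mux_iterators(
    sources: Sequence[Iterable[T]],
    weights: Optional[Sequence[float]] = None,
    seed: int = 0,
) -> Iterator[T]:
    """Hypothetical helper (not lhotse's implementation): at every step, pick one
    source at random, proportionally to its weight, and yield its next item."""
    rng = random.Random(seed)
    iterators: List[Iterator[T]] = [iter(s) for s in sources]
    w: List[float] = list(weights) if weights is not None else [1.0] * len(iterators)
    while iterators:
        idx = rng.choices(range(len(iterators)), weights=w, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # A finite source ran out: drop it (and its weight) and keep sampling the rest.
            del iterators[idx]
            del w[idx]


# itertools.cycle plays the role of CutSet.repeat(): each source becomes infinite
# before muxing, so the muxed stream never runs dry.
source_a = ["a1", "a2"]
source_b = ["b1", "b2", "b3"]
stream = mux_iterators([itertools.cycle(source_a), itertools.cycle(source_b)], seed=0)
first_20 = list(itertools.islice(stream, 20))
```

With finite sources the same helper simply stops once every source is exhausted, while preserving the order of items within each source.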

LibsndfileCompatibleAudioInfo leads to %CPU more than 100%

It looks like it's related to the ffmpeg-torchaudio bindings. CC @mthrok, could you suggest anything?

The Feature load method does not support that 'channel_id'

@desh2608 can you check this one out?

Slow loading of MonoCut created from multi-channel Recording

I am not sure if it is possible to read only a single channel out of a WAVE file. I think the samples from each channel are stored next to...
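A small stdlib-only sketch of why this is awkward for uncompressed PCM WAV: the samples are interleaved frame by frame (one sample per channel in each frame), so with Python's `wave` module, extracting one channel means reading whole frames and discarding the other channels' samples. `make_stereo_wav` and `read_channel` are hypothetical helpers that only handle 16-bit PCM; this is not how lhotse reads audio.

```python
import array
import io
import sys
import wave
from typing import List


def make_stereo_wav(left: List[int], right: List[int], sample_rate: int = 16000) -> bytes:
    """Hypothetical helper: encode two equal-length int16 channels as a 16-bit PCM WAV."""
    interleaved = array.array("h")
    for l_sample, r_sample in zip(left, right):
        interleaved.extend((l_sample, r_sample))  # one frame = (left, right)
    if sys.byteorder == "big":
        interleaved.byteswap()  # WAV PCM samples are little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(interleaved.tobytes())
    return buf.getvalue()


def read_channel(wav_bytes: bytes, channel: int) -> List[int]:
    """Hypothetical helper: return one channel of a 16-bit PCM WAV.

    The frames are read whole (every channel's samples), and the unwanted
    channels are discarded afterwards by strided slicing.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        n_channels = r.getnchannels()
        if r.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = r.readframes(r.getnframes())  # bytes for all channels
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples[channel::n_channels].tolist()


wav = make_stereo_wav([1, 2, 3], [-1, -2, -3])
left = read_channel(wav, 0)
right = read_channel(wav, 1)
```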

Slow loading of MonoCut created from multi-channel Recording

> Regarding the processing time of compute_and_store_features_batch, I think it is mainly due to disk IO. So we could create a special kind of SimpleCutSampler that prefetches audio into memory using...
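A generic sketch of the prefetching idea quoted above, assuming a single background thread is enough to overlap disk IO with computation. `prefetch` and its `load` callback are hypothetical and are not a lhotse API; in this context `load` would be some callable that reads one cut's audio into memory.

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch(items: Iterable[T], load: Callable[[T], R], buffer_size: int = 8) -> Iterator[R]:
    """Hypothetical helper: yield load(item) for each item, in order, while a
    background thread loads up to `buffer_size` items ahead."""
    q: "queue.Queue[Tuple[str, Any]]" = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        try:
            for item in items:
                q.put(("ok", load(item)))  # blocks when the buffer is full (backpressure)
        except Exception as exc:
            q.put(("error", exc))  # forward the failure to the consumer
        finally:
            q.put(("done", None))

    # Daemon thread: if the consumer stops early, the worker may stay blocked on a
    # full queue; a real implementation would need an explicit stop signal.
    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


squares = list(prefetch(range(10), lambda x: x * x, buffer_size=2))
```

A single worker keeps the output order identical to the input order; using several workers would require reordering, which this sketch deliberately avoids.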

Draft specification of a new, extensible sequential storage format

Thanks for a very detailed comment, Dan!! I will distill this into specification items for myself later. I'll leave some comments below just to keep you informed about which features...

Draft specification of a new, extensible sequential storage format

Another question: should we support older PyTorch versions for this format? WebDataset was largely ported to core PyTorch (https://github.com/pytorch/data), and some of these things, like sharding filters or even tar...
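To make the term concrete: a sharding filter lets each of N data-loading workers (or distributed ranks) see a disjoint subset of one iterable stream. The round-robin `sharding_filter` below is a hypothetical sketch, not the torchdata or WebDataset implementation. With PyTorch, `shard_id` and `num_shards` would typically be derived from `torch.utils.data.get_worker_info()` (its `id` and `num_workers` fields).

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sharding_filter(items: Iterable[T], num_shards: int, shard_id: int) -> Iterator[T]:
    """Hypothetical round-robin sharding: shard `shard_id` keeps items
    shard_id, shard_id + num_shards, shard_id + 2 * num_shards, ..."""
    if num_shards < 1 or not 0 <= shard_id < num_shards:
        raise ValueError(
            f"need 0 <= shard_id < num_shards, got shard_id={shard_id}, num_shards={num_shards}"
        )
    # islice(iterable, start, stop, step) streams lazily, so this also works on
    # iterables that are too large to materialize.
    return itertools.islice(items, shard_id, None, num_shards)


shards = [list(sharding_filter(range(10), num_shards=3, shard_id=k)) for k in range(3)]
```

Because it only uses `itertools`, a filter like this does not depend on any particular PyTorch version, which is one way to keep such a feature available on older releases.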

Draft specification of a new, extensible sequential storage format

This is super helpful for organizing my thinking on this effort, thanks! The iterable-based mechanism you described is exactly what the PyTorch team is building with torchdata (see https://github.com/pytorch/data). I...