Piotr Żelasko comments

Results: 523 comments of Piotr Żelasko

Tutorial review: Using Lhotse with PyTorch Lightning

Regarding the progress bar issues, I'm sure that such a comprehensive framework as PT Lightning has some way to support datasets that have unknown length and wouldn't force you to...

Use torchaudio to read sph files first

Did you check that it works with `shorten`-encoded SPH? Some older LDC distributions have SPH files that couldn't be opened with anything other than sph2pipe or shorten.

Use torchaudio to read sph files first

No, I don't think it's in the tests. You'd have to try something old like maybe Callhome, not sure but maybe also SWBD. I don't remember which corpora this happened...

Add faster-whisper (ctranslate2) as option for Whisper annotation workflow

I quickly compared the results between old and new whisper implementations on a 60s clip from AMI. In that clip, I noticed that faster-whisper tends to skip short, isolated, and...

Silero VAD for cleaning the dataset from silence

Thanks! I'll review it tomorrow, but before I do -- based on your description, it looks like a similar outcome may be achieved by running the `activity_detection` workflow and then...

Silero VAD for cleaning the dataset from silence

As a note regarding mono vs multi channel: I think it makes sense to load and process each channel separately with VAD, and assign the resulting supervision to the right...
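A minimal sketch of the per-channel idea, in plain Python. It does not use Silero or the Lhotse API: `toy_energy_vad` is a hypothetical energy-threshold stand-in for a real VAD model, and `Segment` is a hypothetical stand-in for a supervision that records which channel it came from.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for a supervision: a time span tied to one channel."""
    start: float      # seconds
    duration: float   # seconds
    channel: int


def toy_energy_vad(
    samples: Sequence[float],
    sampling_rate: int,
    frame_size: int = 4,
    threshold: float = 0.1,
) -> List[Tuple[float, float]]:
    """Stand-in for a real VAD: return (start, end) times of high-energy regions."""
    regions = []
    region_start = None
    num_frames = len(samples) // frame_size
    for i in range(num_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        energy = sum(x * x for x in frame) / frame_size
        frame_time = i * frame_size / sampling_rate
        if energy > threshold and region_start is None:
            region_start = frame_time          # a speech region opens
        elif energy <= threshold and region_start is not None:
            regions.append((region_start, frame_time))  # the region closes
            region_start = None
    if region_start is not None:
        # Speech ran until the last full frame.
        regions.append((region_start, num_frames * frame_size / sampling_rate))
    return regions


def detect_per_channel(
    channels: Sequence[Sequence[float]], sampling_rate: int
) -> List[Segment]:
    """Run the VAD on each channel separately and tag results with the channel index."""
    segments = []
    for channel_idx, samples in enumerate(channels):
        for start, end in toy_energy_vad(samples, sampling_rate):
            segments.append(Segment(start=start, duration=end - start, channel=channel_idx))
    return segments


# Toy two-channel signal at 4 samples per second (one frame = one second).
channels = [
    [0.0] * 4 + [1.0] * 8 + [0.0] * 4,   # channel 0 is active from 1 s to 3 s
    [1.0] * 4 + [0.0] * 12,              # channel 1 is active from 0 s to 1 s
]
segments = detect_per_channel(channels, sampling_rate=4)
```

Processing channels independently keeps activity on one channel from leaking into segments attributed to another, which a downmix to mono would not preserve.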

Silero VAD for cleaning the dataset from silence

I think this workflow can be recreated with the existing operations as follows:

```python
# pseudo-code workflow
recordings = RecordingSet(...)  # N recordings
supervisions = activity_detection(recordings)  # M supervisions
cuts...
```

Silero VAD for cleaning the dataset from silence

I think I'm starting to understand what you are trying to achieve. Can you confirm the problem boils down to the following description: ``Given a cut with N supervisions modify...

Silero VAD for cleaning the dataset from silence

I'm still not sure. It looks like your example may be implemented with `.truncate()/.split()` to remove the detected non-speech segments and `.append()` to combine whatever cuts remained. The issue that...
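The bookkeeping behind "cut out the non-speech segments, then append what remains" can be sketched with plain interval arithmetic. This is only an illustration of the idea, not Lhotse's `.truncate()`, `.split()`, or `.append()`; the helper names `keep_speech` and `appended_offsets` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def keep_speech(cut_duration: float, non_speech: List[Span]) -> List[Span]:
    """Return the spans of a cut that remain after removing non-speech segments."""
    kept = []
    cursor = 0.0
    for start, end in sorted(non_speech):
        if start > cursor:
            kept.append((cursor, start))   # speech between the previous gap and this one
        cursor = max(cursor, end)          # skip over the non-speech segment
    if cursor < cut_duration:
        kept.append((cursor, cut_duration))
    return kept


def appended_offsets(kept: List[Span]) -> Tuple[List[float], float]:
    """Return where each kept span starts once spans are placed back to back,
    together with the total duration of the combined result."""
    offsets = []
    total = 0.0
    for start, end in kept:
        offsets.append(total)
        total += end - start
    return offsets, total


# A 10 s cut with detected non-speech at 2-3 s and 6-8 s.
kept = keep_speech(10.0, [(2.0, 3.0), (6.0, 8.0)])
offsets, total = appended_offsets(kept)
```

Here `kept` holds the three remaining speech spans, and `offsets` gives the position each would occupy after concatenation, which is the mapping any supervision timestamps would also need to follow.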

Silero VAD for cleaning the dataset from silence

I appreciate the discussion but the design you're suggesting is too complex and not necessary. You can already achieve sequential loading of various audio chunks using cuts. If you need...