Piotr Żelasko comments

Results: 523 comments of Piotr Żelasko

how much shared memory and disk memory do i need to process the S subset of wenetspeech dataset?

Looks like not every training example has features extracted. Make sure you passed the path to the right cut set (with features). You can also check ‘lhotse cut describe ’...
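A quick way to confirm the same thing from a script is to count how many cuts in a manifest carry a features entry. The sketch below is plain Python rather than Lhotse's own API. It assumes the cut set is stored as JSON Lines with one cut dict per line and an optional `features` key; `count_cuts_with_features` and the two example records are hypothetical.

```python
import json
from typing import Iterable, Tuple


def count_cuts_with_features(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (cuts_with_features, total_cuts) for JSONL manifest lines."""
    with_feats = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cut = json.loads(line)
        total += 1
        # Assumption: a cut without extracted features has no "features"
        # key, or has it set to null.
        if cut.get("features") is not None:
            with_feats += 1
    return with_feats, total


# Hypothetical manifest lines; in practice, iterate over the opened file.
example = [
    '{"id": "cut-0", "start": 0.0, "duration": 5.0, "features": {"num_frames": 500}}',
    '{"id": "cut-1", "start": 0.0, "duration": 3.0}',
]
print(count_cuts_with_features(example))
```

If the first number is smaller than the second, the manifest passed to training is probably not the one written after feature extraction.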

how much shared memory and disk memory do i need to process the S subset of wenetspeech dataset?

> looks like features num is much smaller than cuts count? is that something wrong? and why did it happen? I combine two sets to get the cut_train set and I found...

how much shared memory and disk memory do i need to process the S subset of wenetspeech dataset?

You either need to use keep_overlapping=False or filter out the cuts that have overlapping speech (whichever makes sense for your use case).
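The second option, filtering out cuts with overlapping speech, can be sketched roughly as follows. This is plain Python and not Lhotse's API: each cut is represented only by a list of `(start, duration)` supervision segments, and `has_overlapping_speech` is a hypothetical helper. With real cuts the same predicate could be applied to the start and duration of each supervision.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, duration) in seconds


def has_overlapping_speech(segments: List[Segment]) -> bool:
    """Return True if any two segments overlap in time."""
    latest_end = float("-inf")
    # After sorting by start time, a segment overlaps an earlier one
    # exactly when it starts before the latest end seen so far.
    for start, duration in sorted(segments):
        if start < latest_end:
            return True
        latest_end = max(latest_end, start + duration)
    return False


# Hypothetical data: cut id -> supervision segments.
cuts: Dict[str, List[Segment]] = {
    "cut-a": [(0.0, 2.0), (2.5, 1.0)],  # no overlap
    "cut-b": [(0.0, 3.0), (2.0, 2.0)],  # second segment starts inside the first
}
clean = {cid: segs for cid, segs in cuts.items() if not has_overlapping_speech(segs)}
print(sorted(clean))
```

Segments that merely touch (one ends exactly where the next begins) are treated as non-overlapping here; tighten the comparison if your use case needs a margin.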

how much shared memory and disk memory do i need to process the S subset of wenetspeech dataset?

Some tips:
- splitting cut/recording/supervision set into smaller parts can be done with `parts = cuts.split(num_parts)`, e.g.:
```
In [4]: cuts
Out[4]: CutSet(len=1519) [underlying data type: ]
In [8]: cuts.split(2)...
```
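To illustrate the idea of splitting a manifest into parts so each can be processed separately, here is a minimal plain-Python sketch. It is not Lhotse's implementation of `split`, and the way the remainder is distributed across parts is an assumption made for this example; `split_into_parts` is a hypothetical helper operating on an ordinary list.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(items: Sequence[T], num_parts: int) -> List[List[T]]:
    """Split items into num_parts contiguous chunks of near-equal size."""
    if num_parts < 1 or num_parts > len(items):
        raise ValueError("num_parts must be between 1 and len(items)")
    base, extra = divmod(len(items), num_parts)
    parts: List[List[T]] = []
    start = 0
    for i in range(num_parts):
        # Assumption: the first `extra` parts each take one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


# Stand-in for a manifest with 1519 entries, as in the snippet above.
parts = split_into_parts(list(range(1519)), 2)
print([len(p) for p in parts])
```

Each part can then be written to its own file and processed independently, which keeps peak memory and temporary disk usage bounded by the size of one part.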

how much shared memory and disk memory do i need to process the S subset of wenetspeech dataset?

I didn’t get your question, please elaborate.

how much shared memory and disk memory do i need to process the S subset of wenetspeech dataset?

Yes, you can compute the features inside the PyTorch dataset class. See OnTheFlyFeatures or K2SpeechRecognitionDataset for some examples. You can also look up k2-fsa/icefall repo for recipes that support this.

[WIP] spokewoz recipe

In addition to what Desh stated: `plot/play_audio` does not actually support multi-channel data (yet). The reason it was slow for you is because matplotlib plot received a 2-d array `(num_channels,...

Can't save a cutset with features in memory

It looks like you might have some data loaded in memory. Can you share more about the context of your usage? Are you using webdataset/shar or functions such as move_to_memory?...

Can't save a cutset with features in memory

I think the issue might be that you also have audio data that is in memory and can't be stored in JSONL. You can either do `cuts = cuts.drop_recordings()` before...

Can't save a cutset with features in memory

In the example posted above copy_feats had worked so I can't really replicate your case. The only other thing I can think of right now is that you might have...