Piotr Żelasko

523 comments of Piotr Żelasko

Interesting. How many GPUs? Can you also try increasing the buffer size to 50k? Otherwise the batch duration may be too low to notice a difference. I observed a 10%...

Aside from that, your max duration seems low for an A100. Try adding `quadratic_duration=15` to the sampler and you'll probably be able to increase max duration by 100-200 (but I'd expect...
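A small sketch of what `quadratic_duration` does, as I understand it: instead of charging each cut its raw duration against the sampler's `max_duration` budget, a quadratic penalty is added so that long cuts cost disproportionately more. The formula below (`duration + duration**2 / quadratic_duration`) is my assumption of the penalty shape; check Lhotse's `TimeConstraint` implementation for the exact definition.

```python
from typing import Optional


def effective_duration(
    duration: float, quadratic_duration: Optional[float] = None
) -> float:
    """Duration 'charged' against the sampler's max_duration budget
    (assumed penalty shape; see lhotse's TimeConstraint for the real one)."""
    if quadratic_duration is None:
        return duration
    # Quadratic penalty: long cuts cost disproportionately more, which models
    # the quadratic memory cost of attention over long sequences.
    return duration + duration ** 2 / quadratic_duration


# With quadratic_duration=15, a 5 s cut is charged ~6.67 s,
# but a 30 s cut is charged 90 s:
print(effective_duration(5.0, quadratic_duration=15))
print(effective_duration(30.0, quadratic_duration=15))
```

Because the rare long utterances no longer dominate the budget linearly, you can raise `max_duration` without OOM-ing on batches full of long cuts.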

Mmm, it seems you are using webdataset or Lhotse Shar, so when the batch size or buffer size grows, the initialization of the dataloader (on the first step of iteration) takes longer as...

Just pushed a version that is better tested and supports both map-style and iterable-style datasets.

I've tested this change more thoroughly and I'm now confident it helps with the training speed. When training NeMo FastConformer RNNT+CTC ASR on a ~20k hours dataset with 16 GPUs...

Also, could you add an entry in the dataset table in docs/corpus.rst?

It seems the tests are failing on importing `num2words`. Can you make it into a local import guarded by `is_module_available`? (Please search the lhotse sources for `is_module_available` to see an example...
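The pattern being asked for looks roughly like this: keep the optional dependency out of module-level imports so that importing lhotse never fails when it is missing, and only import it inside the function that needs it. The real helper lives in lhotse's utils; a minimal stand-in built on `importlib` is included here so the sketch is self-contained, and `spell_out_numbers` is a hypothetical function, not lhotse API.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Minimal stand-in for lhotse's helper: True if `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def spell_out_numbers(text: str) -> str:
    """Hypothetical text-normalization step that needs the optional dep."""
    if not is_module_available("num2words"):
        raise ImportError(
            "num2words is not installed; run 'pip install num2words' to use it."
        )
    # Local import: only executed when the feature is actually used.
    from num2words import num2words

    return " ".join(
        num2words(int(tok)) if tok.isdigit() else tok for tok in text.split()
    )
```

This way the test suite (and any user who never calls the feature) doesn't need `num2words` installed at all.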

It will be more robust to split your manifest into parts and process each part separately. You can launch the script multiple times, passing the GPU ID as an argument.
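A sketch of that workflow, assuming a JSONL manifest: shard it into one piece per GPU, then launch your processing script once per shard with the GPU index as an argument. The file naming and round-robin assignment below are illustrative choices, not Lhotse behavior (Lhotse also ships its own `split`/`split_lazy` utilities for this).

```python
from pathlib import Path
from typing import List


def split_jsonl(manifest: Path, out_dir: Path, num_parts: int) -> List[Path]:
    """Split a JSONL manifest into num_parts files, one per worker/GPU."""
    lines = manifest.read_text().splitlines()
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for i in range(num_parts):
        part = out_dir / f"{manifest.stem}.{i}{manifest.suffix}"
        # Round-robin assignment keeps the parts roughly equal in size.
        part.write_text("\n".join(lines[i::num_parts]) + "\n")
        parts.append(part)
    return parts
```

Each launched process then reads only its own `cuts.<gpu_id>.jsonl`, so a crash on one GPU loses at most one shard's worth of work.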

You'd define a transform class for that and add the relevant methods to recording/cut. You can see this PR for an end-to-end example: https://github.com/lhotse-speech/lhotse/pull/382/files#diff-add451896faa625c1820580ab6ad64bef75e2886d551efc0f5705100ea62b28a These transforms are intended mostly for...
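A shape-only sketch of the transform-class pattern: a hypothetical `Gain` transform shown on plain Python lists so the example stands alone. The real Lhotse transforms operate on numpy arrays, are serializable into the manifest, and need the extra wiring into `Recording`/`Cut` shown in the linked PR.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Gain:
    """Hypothetical transform: scale audio samples by a constant factor."""

    factor: float

    def __call__(self, samples: List[float], sampling_rate: int) -> List[float]:
        # Apply the transform to raw audio samples.
        return [s * self.factor for s in samples]

    def to_dict(self) -> dict:
        # Manifests store transforms as plain dicts so they round-trip via JSON.
        return {"name": type(self).__name__, "kwargs": asdict(self)}
```

The dataclass + `to_dict` combination is what lets a transform be recorded lazily in the manifest and re-applied when the audio is actually loaded.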

There are two methods, `split` and `split_lazy`, defined on each manifest type: https://github.com/lhotse-speech/lhotse/blob/4f014b13202c724d484e0471343053a261487b8a/lhotse/cut/set.py#L821-L882 They are also accessible from the CLI: https://github.com/lhotse-speech/lhotse/blob/4f014b13202c724d484e0471343053a261487b8a/lhotse/bin/modes/manipulation.py#L130-L215