Piotr Żelasko
Uhhh sorry, this doc is too vague. I meant that using `executor_type=ProcessPoolExecutor` is incompatible with `DataLoader(num_workers > 0)`, because DataLoader's worker sub-processes cannot spawn their own sub-process pools (I think...
Thread pools are supposed to work just fine (though there's an upper bound on how much speedup you can gain with them).
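To sketch the distinction: inside a DataLoader worker you can create a thread pool, but not another process pool. The names below are illustrative, not Lhotse's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def load_one(item):
    # Stand-in for an I/O-bound task, e.g. reading an audio chunk from disk.
    return item * 2

def worker_collate(batch):
    # Inside a DataLoader worker process, creating a *thread* pool is safe;
    # a nested ProcessPoolExecutor generally is not, because daemonic
    # worker processes are not allowed to spawn children.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(load_one, batch))

print(worker_collate([1, 2, 3]))  # -> [2, 4, 6]
```

`pool.map` preserves input order, so the result matches the batch order even if tasks finish out of order.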
It's possible to leverage lazy cuts in Lhotse to reduce the memory overhead. You can use the following:
- creating cuts from recordings and supervisions: [`lhotse.cut.create_cut_set_lazy`](https://github.com/lhotse-speech/lhotse/blob/f1b66b8a8db2ea93e87dcb9db3991f6dd473b89d/lhotse/cut.py#L5063) (make sure to read...
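The core idea behind the lazy variant is to stream manifest items one at a time instead of materializing the whole set in memory. A minimal stdlib sketch of that pattern (not Lhotse's actual implementation) could look like:

```python
import json

def lazy_read_jsonl(path):
    # Yield one manifest item at a time instead of loading the whole
    # file into memory -- memory use stays flat regardless of how many
    # cuts the manifest contains.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage: iterate without ever holding the full manifest in RAM.
# for cut in lazy_read_jsonl("cuts.jsonl"):
#     process(cut)
```

This is why the lazy API writes/reads JSONL: a line-delimited format can be consumed incrementally, which a single monolithic JSON document cannot.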
That'd be the most likely explanation. There's a `num_jobs` argument that can help speed this up: https://github.com/lhotse-speech/lhotse/blob/f1b66b8a8db2ea93e87dcb9db3991f6dd473b89d/lhotse/kaldi.py#L60
> BTW - I don't think increasing the number of jobs would help in that scenario, because the speed will be disk-I/O limited. If Kaldi's `utils/data/get_reco2dur.sh` or `steps/make_mfcc.sh` gets faster...
> what is taking much more time w.r.t. Kaldi is the feature extraction after the import; I guess it's re-opening the same file each time a feature is...
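If repeated `open()` calls on the same file are indeed the bottleneck, caching the open handles is one possible mitigation. A minimal sketch using the stdlib's `lru_cache` (illustrative only, not what Lhotse actually does):

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def get_handle(path):
    # Cache open file handles keyed by path, so repeated feature reads
    # from the same file don't pay the open() cost every time.
    return open(path, "rb")

def read_span(path, offset, size):
    # Read `size` bytes starting at `offset`, reusing a cached handle.
    f = get_handle(path)
    f.seek(offset)
    return f.read(size)
```

One caveat with this approach: cached handles are not thread-safe (concurrent `seek`/`read` on a shared handle can interleave), so a per-thread cache or a lock would be needed in a multithreaded reader.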
Interesting project! I'm open to replacing it, but it might be tough for me to find the time right now. If you guys need it, please make a PR.
Is this issue resolved, or is there anything else we can do?
You can pull the latest master, install Fangjun's native kaldiio library, set a higher number of jobs for the import script, and let us know if it's better then.
Icefall's LibriSpeech recipe uses Lhotse in a way that is suitable for small and medium-sized datasets -- it reads the whole manifest into memory and does various operations on...