Piotr Żelasko

Results 523 comments of Piotr Żelasko

To fix the `len` issue, you need to set `trainer.max_steps` to how long you want to train, and `limit_train_batches` to some value (e.g. 1k steps).
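
For illustration, a minimal sketch of that configuration with a PyTorch Lightning `Trainer` could look like the following; the concrete numbers are placeholders, not values from this thread:

```python
import lightning.pytorch as pl  # or `import pytorch_lightning as pl` on older versions

trainer = pl.Trainer(
    # Stop after this many optimizer steps instead of relying on the dataset length.
    max_steps=100_000,
    # Treat this many batches as one "epoch" (useful with iterable/unsized datasets).
    limit_train_batches=1_000,
)
```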

Thanks for letting me know. This will help narrow it down. Once I figure out what caused the issue, I'll let you know here.

> You suggested that I should shard my dataset. Is this generally advisable to shard datasets for any training set-up, or specifically important because of the concurrent_bucketing setting being set...

> Do you know whether disabling concurrent_bucketing as you suggested above would have caused the below error? It happens approx. every 10000 steps (1 pseudo-epoch), so I have to restart...

There's not a lot of detail, but if I had to guess, this could be a CPU OOM. You can verify by monitoring memory usage with a tool like htop or nmon...
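
If a command-line monitor is inconvenient, a rough alternative is to log RAM usage from inside the training loop. This is just an illustrative check using psutil, not something from the original discussion:

```python
import psutil

def log_ram_usage(prefix: str = "") -> None:
    # Report overall system memory; a steady climb across steps points to a CPU-side leak.
    mem = psutil.virtual_memory()
    print(f"{prefix}RAM used: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB ({mem.percent}%)")
```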

Very surprising. Anyhow, glad you figured it out.

Is the dynamic bucketing sampler used for the validation set as well in the deadlocked run?
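
For context, the question is whether the validation DataLoader is built with Lhotse's `DynamicBucketingSampler` (as in the sketch below) or with a plain sampler; `dev_cuts` and the duration value are placeholders for illustration:

```python
from lhotse.dataset import DynamicBucketingSampler, SimpleCutSampler

# Bucketing sampler on the dev set (the setup being asked about):
dev_sampler = DynamicBucketingSampler(dev_cuts, max_duration=200.0, shuffle=False)

# A plain, non-bucketing alternative for validation:
# dev_sampler = SimpleCutSampler(dev_cuts, max_duration=200.0, shuffle=False)
```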

Could you try with https://github.com/lhotse-speech/lhotse/pull/1355?

Makes sense... could you make a PR with the fix? Also, can you run `lhotse.validate()` on the input cut and see if it finds anything wrong with it?
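
For reference, a minimal way to run that check might look like the sketch below; the manifest path is a placeholder for the problematic data:

```python
from lhotse import CutSet, validate

cuts = CutSet.from_file("cuts.jsonl.gz")  # placeholder path to the input cut manifest
for cut in cuts:
    # Runs Lhotse's built-in consistency checks on each cut.
    validate(cut)
```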