Piotr Żelasko
You seem to be using a very outdated example; I see now that I missed a few places in the docs that needed updating. Samplers don't support `len()` because of dynamic batch...
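The toy batcher below shows why `len()` can't be answered up front when batches are built against a duration budget. It is a minimal sketch, not lhotse's sampler code. `dynamic_batches` is a hypothetical helper, and its `max_duration` argument only loosely mirrors the idea of a per-batch duration limit. With the same number of clips, the batch count changes with the clip durations, so the only reliable count comes from iterating.

```python
from typing import Iterable, Iterator, List


def dynamic_batches(durations: Iterable[float], max_duration: float) -> Iterator[List[float]]:
    """Yield batches whose total duration stays within max_duration.

    Hypothetical toy helper: the number of batches depends on the durations
    themselves, so it cannot be known without walking through the data.
    """
    batch: List[float] = []
    total = 0.0
    for dur in durations:
        # Close the current batch if adding this item would exceed the budget.
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(dur)
        total += dur
    if batch:
        yield batch


short_clips = [2.0] * 10  # ten 2-second clips
long_clips = [9.0] * 10   # ten 9-second clips

# Same number of clips, different number of batches.
print(sum(1 for _ in dynamic_batches(short_clips, max_duration=10.0)))  # 2
print(sum(1 for _ in dynamic_batches(long_clips, max_duration=10.0)))   # 10
```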
Something might have changed on the CommonVoice side. If the issue persists, it may be best to download the data directly from their site.
Unfortunately, yes. Restoring the sampler's state is quite tricky to do quickly, and I don’t recommend using this technique with large data. Instead, it’s easier to discard the...
No, CPU RAM usage should be bounded by the `buffer_size` setting in the sampler.
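Why a buffer size bounds memory can be seen in a generic streaming shuffle buffer. This is a minimal sketch of the general technique, not lhotse's implementation, and `buffered_shuffle` is a hypothetical name. It assumes the sampler's `buffer_size` plays an analogous role: at most `buffer_size` items are held at once, regardless of how large the stream is.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most buffer_size items.

    Hypothetical helper illustrating bounded-memory shuffling.
    """
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it; the buffer
            # never grows past buffer_size items.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer


stream = buffered_shuffle(range(20), buffer_size=5, seed=42)
print(sorted(stream) == list(range(20)))  # True
```

The trade-off is that a smaller buffer uses less RAM but gives a weaker shuffle, since an item can only move a limited distance from its original position.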
Are you using HDF5 files? We have a workaround in the ASR dataset class, but IIRC it only slows down the memory leak. You can try to use Lhotse Shar...
`DynamicBucketingSampler(..., seed=)`
Yes, remove it from the code if you rely on seed changes.
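The effect of a sampler seed can be illustrated with a seeded shuffle. This is a minimal sketch under an assumption: that the shuffling order is derived from the seed combined with the current epoch. `epoch_order` is a hypothetical helper, not lhotse's API. A fixed seed reproduces the same order for a given epoch, while changing the seed usually changes it.

```python
import random
from typing import List


def epoch_order(n: int, seed: int, epoch: int) -> List[int]:
    """Return a reproducible permutation of range(n) for a seed and epoch.

    Hypothetical helper: mixing the epoch into the seed gives each epoch a
    different order while keeping the whole run reproducible.
    """
    rng = random.Random(seed + epoch)
    order = list(range(n))
    rng.shuffle(order)
    return order


run_a = epoch_order(10, seed=0, epoch=1)
run_b = epoch_order(10, seed=0, epoch=1)
assert run_a == run_b  # same seed and epoch -> same order
print(epoch_order(10, seed=1, epoch=1))  # usually a different permutation
```

Summing the seed and the epoch is the simplest choice, but it has a blind spot: `(seed=0, epoch=1)` and `(seed=1, epoch=0)` produce identical orders.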
Hmm, this doesn’t look directly related to lhotse. You’re hitting the issue inside encodec. I suggest checking the inputs to `pad1d` for the problematic example and trying...
Should be as simple as `cuts.trim_to_supervisions().compute_and_store_features()`
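To clarify what the first half of that one-liner does conceptually, here is a toy stand-in that turns one long cut into one short cut per supervision segment. This is a minimal sketch with hypothetical dataclasses and a hypothetical `trim_to_segments` function, not lhotse's implementation; real lhotse cuts carry many more fields. Features computed afterwards would then cover only the supervised regions. The real `compute_and_store_features` call is assumed to also need at least a feature extractor and a storage location, which the one-liner above leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float     # seconds, relative to the recording
    duration: float  # seconds
    text: str


@dataclass
class LongCut:
    recording_id: str
    duration: float
    supervisions: List[Segment]


@dataclass
class ShortCut:
    recording_id: str
    start: float
    duration: float
    text: str


def trim_to_segments(cut: LongCut) -> List[ShortCut]:
    """Hypothetical stand-in: emit one short cut per supervision segment."""
    return [
        ShortCut(cut.recording_id, seg.start, seg.duration, seg.text)
        for seg in cut.supervisions
    ]


long_cut = LongCut("rec1", 30.0, [Segment(0.5, 4.0, "hello"), Segment(10.0, 6.5, "world")])
for c in trim_to_segments(long_cut):
    print(c.start, c.duration, c.text)
# 0.5 4.0 hello
# 10.0 6.5 world
```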
Thank you for your help in fixing this. I’ll merge the fix as soon as the PR is ready; parsing the rows into dicts and referring to them by column...
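Parsing rows into dicts and accessing fields by column name can be sketched with Python's standard `csv.DictReader`. This is a minimal sketch, not the code from the PR. The TSV excerpt and the column names (`client_id`, `path`, `sentence`) are illustrative assumptions loosely modeled on a CommonVoice-style file, and `read_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical TSV excerpt; real files may have more columns in a different order.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\n"
    "abc\tclip_1.mp3\tHello world\n"
    "def\tclip_2.mp3\tGood morning\n"
)


def read_rows(fileobj):
    """Parse a tab-separated file into dicts keyed by the header row."""
    # QUOTE_NONE keeps literal quote characters inside sentences intact.
    reader = csv.DictReader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_rows(io.StringIO(SAMPLE_TSV))
for row in rows:
    # Name-based access survives added or reordered columns,
    # unlike positional indexing such as row[1].
    print(row["path"], row["sentence"])
# clip_1.mp3 Hello world
# clip_2.mp3 Good morning
```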