Piotr Żelasko

Results: 523 comments by Piotr Żelasko

Thanks. Could you submit a PR with a fix?

Just to clarify, did you re-compute the features after applying `fix_manifests`? The issue here seems to be that somehow the cut start time does not correspond to features start time,...

I think your code looks correct. I suggest manually fixing the cut/supervision start and duration to match those of the attached features, e.g.:

```python
def adjust_timestamps(cut):
    cut.supervisions[0].start = cut.features.start
    cut.supervisions[0].duration =...
```
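To make the idea concrete without depending on a real manifest, here is a minimal sketch using toy stand-in classes (`ToySpan` and `ToyCut` are hypothetical and are not Lhotse types). It first checks whether the supervision timing disagrees with the features, then aligns them. The snippet above is cut off before the duration assignment, so copying the features' duration here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySpan:
    """Hypothetical stand-in for anything that has a start and a duration."""
    start: float
    duration: float


@dataclass
class ToyCut:
    """Hypothetical stand-in for a cut with attached features and supervisions."""
    features: ToySpan
    supervisions: List[ToySpan] = field(default_factory=list)


def timing_mismatch(cut: ToyCut, tol: float = 1e-3) -> bool:
    """Return True if the first supervision's timing differs from the features'."""
    sup = cut.supervisions[0]
    return (
        abs(sup.start - cut.features.start) > tol
        or abs(sup.duration - cut.features.duration) > tol
    )


def adjust_timestamps(cut: ToyCut) -> ToyCut:
    """Copy the features' timing onto the first supervision."""
    sup = cut.supervisions[0]
    sup.start = cut.features.start
    # Assumption: the duration is aligned to the features as well.
    sup.duration = cut.features.duration
    return cut


toy_cut = ToyCut(features=ToySpan(0.5, 3.0), supervisions=[ToySpan(0.0, 3.2)])
mismatch_before = timing_mismatch(toy_cut)
adjust_timestamps(toy_cut)
mismatch_after = timing_mismatch(toy_cut)
```

Running a check like `timing_mismatch` over the whole cut set before fixing anything can also show how many cuts are affected, which helps tell a one-off bad cut from a systematic problem.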

> @pzelasko Using your suggestion, I encountered this error:

Are you 100% sure you got the right cut/cutset? I copy-pasted and ran it, and it seems to work OK... see ```python...

> I am trying to use a bucketing sampler with a smaller dataset (~10h), and bigger batches (~30min). I want to keep number of buckets low, while enjoying balanced batches...

> Do you mean tuning to avoid OOM?

Yeah. What you describe, though, is a good argument for bringing the option back, one we hadn't thought of before. Would you...

I see. I think your PR makes sense to help resolve this. You may also want to use `keep_excessive_supervisions=False` for such cases.

You can do that by composing a group-by with an append, e.g.:

```python
from lhotse.cut.set import CutSet, append_cuts
cuts = CutSet.from_file(...)
speakers = cuts.speakers
long_cuts_per_speaker = {s: append_cuts(cuts.filter(lambda c: c.supervisions[0].speaker ==...
```
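The grouping step on its own can be sketched in plain Python. This is only an illustration of the group-by-speaker pattern, assuming each cut has a single speaker; `ToySpeakerCut` is a hypothetical stand-in and not a Lhotse class, and summing durations stands in for the actual appending of cuts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToySpeakerCut:
    """Hypothetical stand-in for a cut with exactly one speaker."""
    id: str
    speaker: str
    duration: float


def group_by_speaker(cuts: List[ToySpeakerCut]) -> Dict[str, List[ToySpeakerCut]]:
    """Collect cuts per speaker, preserving their original order."""
    groups = defaultdict(list)
    for cut in cuts:
        groups[cut.speaker].append(cut)
    return dict(groups)


def total_duration_per_speaker(cuts: List[ToySpeakerCut]) -> Dict[str, float]:
    """Duration each speaker's appended cut would have (no gaps assumed)."""
    return {
        speaker: sum(c.duration for c in group)
        for speaker, group in group_by_speaker(cuts).items()
    }


toy_cuts = [
    ToySpeakerCut("a", "spk1", 2.0),
    ToySpeakerCut("b", "spk2", 1.5),
    ToySpeakerCut("c", "spk1", 3.0),
]
groups = group_by_speaker(toy_cuts)
totals = total_duration_per_speaker(toy_cuts)
```

A single pass with a dictionary avoids re-filtering the whole set once per speaker, which may matter for sets with many speakers.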

Yes, you can also look at the `CutConcatenate` transform, which now supports a `max_duration` argument.
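For intuition about what a duration cap does when concatenating, here is a toy greedy packer over plain durations. It is not the library's implementation, and the real transform may order or group cuts differently; `pack_durations` and its `gap` parameter are hypothetical names used only for this sketch.

```python
from typing import List


def pack_durations(
    durations: List[float], max_duration: float, gap: float = 0.0
) -> List[List[float]]:
    """Greedily group consecutive durations so each group stays within max_duration.

    A single item longer than max_duration is kept as its own group.
    """
    groups: List[List[float]] = []
    current: List[float] = []
    current_total = 0.0
    for d in durations:
        # A gap is only inserted between items, not before the first one.
        extra = d if not current else d + gap
        if current and current_total + extra > max_duration:
            groups.append(current)
            current, current_total = [d], d
        else:
            current.append(d)
            current_total += extra
    if current:
        groups.append(current)
    return groups


packed = pack_durations([4.0, 5.0, 3.0, 9.0, 2.0], max_duration=10.0)
```

With the cap at 10 seconds, only the first two items fit together; every later item would push its group over the limit and so starts a new one.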