Piotr Żelasko
Try this one https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/cut_transforms/extra_padding.py by passing it to the Dataset as a cut transform.
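For reference, the "pass it to the Dataset as a cut transform" pattern looks roughly like this. This is a toy stand-in, not lhotse's real API: `PadCuts`, `ToyDataset`, and the dict-based cut representation are all illustrative assumptions; a transform is simply a callable mapped over the sampled cuts before collation.

```python
# Minimal sketch of the cut-transform pattern (NOT lhotse's actual classes):
# a transform is a callable that maps a batch of cuts to a new batch, and
# the dataset applies each transform in order inside __getitem__.

class PadCuts:
    """Toy stand-in for a padding transform such as ExtraPadding."""
    def __init__(self, extra: float):
        self.extra = extra  # hypothetical amount of padding added per cut

    def __call__(self, cuts):
        # Cuts are represented here as plain dicts with a duration field.
        return [{**c, "duration": c["duration"] + self.extra} for c in cuts]

class ToyDataset:
    """Toy dataset that applies cut transforms to each sampled batch."""
    def __init__(self, cut_transforms=None):
        self.cut_transforms = cut_transforms or []

    def __getitem__(self, cuts):
        for tfm in self.cut_transforms:
            cuts = tfm(cuts)
        return cuts  # a real dataset would collate features here

dataset = ToyDataset(cut_transforms=[PadCuts(extra=0.5)])
batch = dataset[[{"id": "cut-1", "duration": 2.0}]]
```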
Hmm, I suppose you can call it on the full CutSet and then call fill_supervisions. Maybe I can add it as a CutSet method later for convenience.
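The proposed convenience (pad every cut, then call fill_supervisions so each supervision spans the padded cut) could be sketched like this. All names and fields here are illustrative, not lhotse's real data model; `pad_and_fill` is a hypothetical helper.

```python
from dataclasses import dataclass, replace

# Toy sketch: pad each cut to a target duration, then extend its
# supervision to cover the full (padded) cut, which is roughly what
# fill_supervisions does. Not lhotse's actual classes.

@dataclass(frozen=True)
class Supervision:
    start: float
    duration: float

@dataclass(frozen=True)
class Cut:
    duration: float
    supervision: Supervision

def pad_and_fill(cuts, target_duration: float):
    """Hypothetical CutSet-style method combining padding + fill_supervisions."""
    out = []
    for cut in cuts:
        padded = replace(cut, duration=max(cut.duration, target_duration))
        filled = replace(
            padded,
            supervision=Supervision(start=0.0, duration=padded.duration),
        )
        out.append(filled)
    return out

cuts = [Cut(duration=3.0, supervision=Supervision(start=0.5, duration=2.0))]
padded = pad_and_fill(cuts, target_duration=5.0)
```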
Sounds good!
Preliminary check: it looks like we could gain a lot by decreasing the lilcom tick size further (currently, we use -5 as default). I stored mini_librispeech `dev-clean-2` features on disk...
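To illustrate the trade-off behind the tick size: lilcom-style quantization snaps values to multiples of a tick, so a smaller tick means finer resolution but less compressible output. The sketch below models only the quantization step (with tick = 2**tick_power, taking -5 as the default mentioned above), not lilcom's actual codec.

```python
import random

# Back-of-the-envelope model of tick quantization: values are snapped to
# the nearest multiple of 2**tick_power, so the worst-case error is half
# a tick. A more negative power -> finer resolution, bigger files.

def quantize(values, tick_power: int):
    tick = 2.0 ** tick_power
    return [round(v / tick) * tick for v in values]

random.seed(0)
feats = [random.uniform(-10.0, 10.0) for _ in range(1000)]

coarse = quantize(feats, tick_power=-5)  # the reported default
fine = quantize(feats, tick_power=-8)    # smaller tick, finer resolution

err_coarse = max(abs(a - b) for a, b in zip(feats, coarse))
err_fine = max(abs(a - b) for a, b in zip(feats, fine))
```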
Yeah, this is another direction that I'm concurrently pursuing. Still, there are scenarios when the I/O is costly enough (and storage cheap enough) that it's preferable to pre-compute the features,...
As for lilcom compression being slower -- I didn't measure it exactly, but reading precomputed features is way faster than on-the-fly reading of audio chunks from OPUS. With HDF5 + lilcom,...
... I might have a solution in #343 but not tested on GigaSpeech yet.
Sounds interesting! I'm not sure I got the point about blocking-style data reading due to metadata -- we can (fairly easily, I guess) split the metadata manifests into pieces...
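Splitting the metadata manifests into pieces could look roughly like this: shard the jsonl lines so each data-loading worker touches only its own piece. This is a generic sketch (`split_manifest` is a hypothetical helper, and actual file I/O is elided in favor of an in-memory list).

```python
import json

# Sketch: round-robin the lines of a jsonl manifest into N shards so no
# single reader has to load the whole manifest.

def split_manifest(lines, num_shards: int):
    """Distribute manifest lines into `num_shards` roughly equal shards."""
    shards = [[] for _ in range(num_shards)]
    for i, line in enumerate(lines):
        shards[i % num_shards].append(line)
    return shards

manifest = [json.dumps({"id": f"utt-{i}"}) for i in range(10)]
shards = split_manifest(manifest, num_shards=3)
```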
Cool, I understand now. You got our workflow right -- I think the only difference is that our metadata pipeline would read a jsonl file (either pre-load it entirely, or...
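The two jsonl-reading strategies mentioned (pre-load entirely vs. stream) differ only in memory behavior; a lazy reader keeps memory constant regardless of manifest size. A minimal sketch, using an in-memory buffer in place of a real file:

```python
import io
import json

# Eager: parse every line up front into a list.
def load_eager(f):
    return [json.loads(line) for line in f]

# Lazy: yield one parsed record at a time; memory use stays constant.
def load_lazy(f):
    for line in f:
        yield json.loads(line)

data = "\n".join(json.dumps({"id": i}) for i in range(5)) + "\n"

eager = load_eager(io.StringIO(data))
lazy_first = next(load_lazy(io.StringIO(data)))
```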