Piotr Żelasko
In general, it shouldn't happen in Lhotse, at least not for LibriSpeech data. Let me check with your fix and maybe we'll be able to get to the bottom of...
Yeah it seems to me that Lhotse could be off by one frame; but I think it still makes sense to adjust the transformer code to handle the scenario when...
I got a new best result with bucketing + concatenation; this is the average over the last 5 epochs: ```python 2021-03-02 09:00:17,479 INFO [mmi_att_transformer_decode.py:300] %WER 7.90% [4151 /...
These results come after the recent fixes we made in both Lhotse and the transformer code.
BTW it occurred to me that we might close the gap between bucketing and no bucketing with multi-GPU training (once fixed), as each GPU will likely sample a bucket of...
I haven’t yet — I’ll submit a PR soon.
As Dan said, there will typically be segmentation information (in Lhotse represented as `SupervisionSet`). After you create the `CutSet`, call `.trim_to_supervisions()` on it to have each cut represent a single...
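To make the idea of trimming concrete, here is a minimal plain-Python sketch of what "one cut per supervision" means. It does not use Lhotse itself: `Segment`, `SimpleCut` and `trim_to_segments` are hypothetical stand-ins invented for illustration, and the real `.trim_to_supervisions()` handles more cases (overlapping supervisions, features, and so on).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a supervision; start is relative to the enclosing cut."""
    start: float
    duration: float
    text: str


@dataclass
class SimpleCut:
    """Hypothetical stand-in for a cut; start is relative to the recording."""
    recording_id: str
    start: float
    duration: float
    segments: List[Segment]


def trim_to_segments(cut: SimpleCut) -> List[SimpleCut]:
    """Split one long cut into one short cut per segment (illustration only)."""
    return [
        SimpleCut(
            recording_id=cut.recording_id,
            # The new cut begins where the segment begins in the recording.
            start=cut.start + seg.start,
            duration=seg.duration,
            # Inside its own cut, the segment now starts at offset 0.
            segments=[Segment(0.0, seg.duration, seg.text)],
        )
        for seg in cut.segments
    ]


long_cut = SimpleCut(
    "rec-1", 0.0, 30.0,
    [Segment(1.5, 4.0, "hello"), Segment(10.0, 6.5, "world")],
)
short_cuts = trim_to_segments(long_cut)
for c in short_cuts:
    print(c.recording_id, c.start, c.duration, c.segments[0].text)
```

Each resulting cut covers exactly one segment, which is the shape you usually want for training on segmented data.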
Can you find and print the cut with that ID (`cut_set['218b7cc8-594f-4114-b2e7-67863db7f0ce']`)? I wonder if this is a rounding error or something bigger.
The cuts look correct to me. The `start` field has different semantics in a cut and in a supervision: for a cut, it is relative to the start of the recording; for...
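A small sketch of how the two time references relate, assuming (as I understand Lhotse's convention) that a supervision attached to a cut stores its `start` relative to the beginning of that cut. The helper name `supervision_start_in_recording` is hypothetical, not a Lhotse function.

```python
def supervision_start_in_recording(cut_start: float, supervision_start: float) -> float:
    """Convert a cut-relative supervision start into recording time (seconds).

    cut_start: where the cut begins, measured from the start of the recording.
    supervision_start: where the supervision begins, measured from the start
        of the cut (assumed convention; may be negative if the supervision
        begins before the cut does).
    """
    return cut_start + supervision_start


# A cut that begins 12 s into the recording, with a supervision at offset 0:
aligned = supervision_start_in_recording(12.0, 0.0)

# The same cut, with a supervision that began half a second before the cut:
early = supervision_start_in_recording(12.0, -0.5)

print(aligned, early)
```

So a supervision `start` of 0 inside a cut whose `start` is 12.0 is not a mismatch: both point at the same instant in the recording.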