Piotr Żelasko
That's weird. Something went wrong when uploading. I'm pushing the missing files, you can expect them to be there in the next hour.
> @glynpu
> > @pzelasko counts from 1, not from 0. So you should use epoch-{7,8,9}.pt

We'll probably need to make the indexing consistent; different parts of the code base count...
This is a pretty feature-rich and efficient implementation of sub-word tokenizers (with training methods too) https://github.com/huggingface/tokenizers
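The core of training such a sub-word tokenizer is the BPE merge-learning loop. A toy illustration of that loop (this is *not* the `tokenizers` API, just a sketch of what the trainer does under the hood):

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules from a list of words (toy sketch)."""
    # Represent each word as a tuple of symbols (initially characters).
    vocab = Counter()
    for word in corpus:
        vocab[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        merged = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges
```

For example, `train_bpe(["low", "lower", "lowest"], 2)` first merges `('l', 'o')` and then `('lo', 'w')`. The real library does the same thing much faster (Rust backend) and adds normalization, pre-tokenization, and special-token handling on top.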
Looks cool! My two cents: it's probably worth starting with an RNNLM and eventually trying some autoregressive transformers like GPT-2 (small/medium size).
> Also, I find the alignment information contained in the supervision is too simple

Can you describe the issue more? I'm not sure I understand what's missing there. We could...
BTW I wonder if we should support piping these programs together, Kaldi-style. Click easily allows doing that with [file type arguments](https://click.palletsprojects.com/en/8.0.x/arguments/#file-arguments). We could do that by writing/reading JSONL-serialized manifests in...
... there is also some code for line-by-line [incremental JSONL writing in Lhotse](https://github.com/lhotse-speech/lhotse/blob/master/lhotse/serialization.py#L189) that could be extended to support this.
Fair enough. The idea is to allow something like:

```
snowfall net compute-post - | snowfall net compute-ali -
```

but I just realized that with the current way things...
Agreed. But for the record, the full quote is actually: > "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all...
I'd actually suggest returning these graphs directly from the Lhotse DataLoader to have a clear separation between data preparation and the rest of the training loop. Assuming they can be...