Piotr Żelasko

523 comments by Piotr Żelasko

That's weird. Something went wrong when uploading. I'm pushing the missing files, you can expect them to be there in the next hour.

> @glynpu
> > @pzelasko counts from 1, not from 0. So you should use epoch-{7,8,9}.pt

We'll probably need to make the indexing consistent; different parts of the code base count...

This is a pretty feature-rich and efficient implementation of sub-word tokenizers (with training methods too): https://github.com/huggingface/tokenizers

Looks cool! My two cents: it’s probably worth starting with an RNNLM and eventually trying some autoregressive transformers like GPT-2 (small/medium size).

> Also, I find the alignment information contained in the supervision is too simple

Can you describe the issue more? I'm not sure I understand what's missing there. We could...

BTW I wonder if we should support piping these programs together, Kaldi-style. Click easily allows doing that with [file type arguments](https://click.palletsprojects.com/en/8.0.x/arguments/#file-arguments). We could do that by writing/reading JSONL-serialized manifests in...

... there is also some code for line-by-line [incremental JSONL writing in Lhotse](https://github.com/lhotse-speech/lhotse/blob/master/lhotse/serialization.py#L189) that could be extended to support this.
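To make the piping idea concrete, here is a minimal sketch of line-by-line JSONL reading and writing where `-` stands for stdin/stdout. It uses only the standard library; the helper names `open_stream`, `read_jsonl`, and `write_jsonl` are hypothetical and are not part of Lhotse, snowfall, or Click. With Click, a file-type argument would presumably take over the role of `open_stream`.

```python
import io
import json
import sys
from typing import IO, Iterable, Iterator


def open_stream(path: str, mode: str = "r") -> IO[str]:
    """Hypothetical helper: map "-" to stdin/stdout, otherwise open the file."""
    if path == "-":
        return sys.stdin if "r" in mode else sys.stdout
    return open(path, mode, encoding="utf-8")


def read_jsonl(stream: IO[str]) -> Iterator[dict]:
    """Lazily yield one manifest item per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_jsonl(items: Iterable[dict], stream: IO[str]) -> int:
    """Write one JSON object per line and return the number of items written.

    Flushing after every item lets a downstream process in a pipe
    start consuming before the upstream one has finished.
    """
    count = 0
    for item in items:
        stream.write(json.dumps(item, ensure_ascii=False) + "\n")
        stream.flush()
        count += 1
    return count


# Round trip through an in-memory buffer standing in for a pipe.
# The item fields below are made up for illustration only.
demo_buffer = io.StringIO()
write_jsonl([{"id": "utt-1", "duration": 1.5}], demo_buffer)
demo_buffer.seek(0)
demo_items = list(read_jsonl(demo_buffer))
```

Because both functions work on any text stream, each stage of a pipeline could read with `read_jsonl(open_stream("-"))` and write with `write_jsonl(..., open_stream("-", "w"))` without holding the whole manifest in memory.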

Fair enough. The idea is to allow something like:
```
snowfall net compute-post - | snowfall net compute-ali -
```
but I just realized that with the current way things...

Agreed. But for the record, the full quote is actually:

> "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all...

I'd actually suggest returning these graphs directly from the Lhotse DataLoader to have a clear separation between data preparation and the rest of the training loop. Assuming they can be...