Parcollet Titouan
It's not slow ... it is slow AS FUCK. But I believe this is expected from Python-only RNN-T decoders.
Hello @egaznep, I am not sure I understand the issue here. Could you provide a code snippet showing the error explicitly? The function length_to_mask() is expected to provide masks containing...
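To make the expected behavior concrete, here is a minimal pure-Python sketch of what a length-to-mask helper computes (an illustration of the concept, not SpeechBrain's actual implementation): one boolean row per sequence, with True marking valid positions and False marking padding.

```python
# Illustrative sketch only; not SpeechBrain's real length_to_mask code.
def length_to_mask(lengths, max_len=None):
    """Return one boolean row per sequence.

    lengths : list of absolute (int) sequence lengths.
    max_len : pad target; defaults to the longest sequence in the batch.
    """
    if max_len is None:
        max_len = max(lengths)
    # True where the position is inside the sequence, False where it is padding.
    return [[pos < length for pos in range(max_len)] for length in lengths]

mask = length_to_mask([2, 4, 3])
# Row 0 has its first 2 positions valid, the remaining 2 are padding.
```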
Hello, thanks! SpeechBrain padding is relative to the batch, not the dataset. The max length in wav_lens is the max length of the batch.
@Gastron, correct me if I am wrong, but as far as I know, the DDP sampler is per-process, hence the padding should be relative to the batch of each process. @egaznep...
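A tiny sketch of the per-batch convention discussed above (an assumption for illustration: each relative length is the signal's length divided by the longest signal in that batch, so the same utterance gets different relative lengths in different batches):

```python
# Sketch of per-batch relative lengths (illustrative convention, not
# SpeechBrain's actual dataio code).
def relative_lengths(abs_lengths):
    """Divide each absolute length by the longest one in the batch."""
    batch_max = max(abs_lengths)
    return [length / batch_max for length in abs_lengths]

# The same 16000-sample utterance, seen in two different batches:
batch_a = relative_lengths([16000, 32000])
batch_b = relative_lengths([16000, 64000])
# Its relative length differs per batch, because the batch max differs.
```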
Hi, it's important that the total batch size corresponds to roughly 1.6h of audio. You can adjust this by changing the gradient accumulation factor.
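As a back-of-the-envelope sketch, the accumulation factor can be derived from how much audio one batch holds. The numbers below (seconds of audio per batch, GPU count) are hypothetical, only the 1.6h target comes from the comment above:

```python
import math

# Target: roughly 1.6 hours of audio per optimizer step.
TARGET_SECONDS = 1.6 * 3600

def grad_accum_factor(seconds_per_batch, n_gpus=1):
    """Smallest accumulation factor whose effective batch reaches the target.

    seconds_per_batch : hypothetical amount of audio in one batch, per GPU.
    n_gpus            : hypothetical number of data-parallel processes.
    """
    return math.ceil(TARGET_SECONDS / (seconds_per_batch * n_gpus))

# e.g. batches holding ~240 s of audio on 4 GPUs:
factor = grad_accum_factor(240, n_gpus=4)
```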
@Adel-Moumen, I see that the gradient accumulation factor is missing from this recipe. Could you add it? (No need for a PR imho, push directly to develop.) @GasserElbanna have a look...
fp16 or bf16 would make the training much faster if you have a compatible GPU.
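For illustration, a hypothetical hyperparameter fragment showing where such a setting would live; the exact key name and accepted values depend on the SpeechBrain version you run, so treat this as a sketch rather than the documented interface:

```yaml
# Hypothetical hparams fragment; verify the key name against your
# SpeechBrain version before using it.
precision: bf16   # bf16 on Ampere+ GPUs; fp16 on older hardware
```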
It does allow stop and restart, because you are altering the object, i.e. the checkpointer keeps track of it! The only problem is indeed that you store the whole...
You mean depend on another Huggingface library?
If you could give me a neat example of an integration of PEFT, I could be convinced.