speechbrain
SpecAugment should not mask in padding
When given a batch, the current SpecAugment implementation does not take the different lengths of the samples into account, so a time mask can end up applied to padded areas. The time-mask positions should instead be sampled per batch element, based on the length of each sample.
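The per-element sampling could look something like the sketch below, assuming relative lengths in [0, 1] as SpeechBrain uses elsewhere. `sample_time_masks` is a hypothetical helper for illustration, not the library's actual API:

```python
import torch

def sample_time_masks(feats, lengths, mask_width=20, replace_value=0.0):
    """Apply one time mask per batch element, constrained to the
    unpadded region.

    feats: (batch, time, features) tensor.
    lengths: relative lengths in [0, 1], one per batch element.
    Illustrative sketch only, not the SpecAugment implementation.
    """
    batch, time = feats.shape[0], feats.shape[1]
    abs_lens = (lengths * time).long()  # absolute number of valid frames
    for i in range(batch):
        valid = abs_lens[i].item()
        width = min(mask_width, valid)  # never wider than the valid region
        # Sample the start so the whole mask fits inside the valid frames
        start = torch.randint(0, valid - width + 1, (1,)).item()
        feats[i, start:start + width] = replace_value
    return feats
```

The loop over the batch keeps the logic obvious; a vectorized version (sampling all starts at once and building a boolean mask) would serve the run-time concern raised below.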
Good catch. Here, I think we should also look at the implementation run time, because occasionally sampling the mask into the padded area is probably not devastating. But I think this can be implemented precisely and fast at the same time.
We have to refactor the augmentation part in a major version of SpeechBrain. We also have to make other changes, such as:
1. Return the lengths when using speed change.
2. Resample noise and RIR when the sampling frequency is different from 16000 Hz.
3. Create an augmentation pipeline function that allows us to dynamically specify the sequence of transformations to perform directly in the YAML file. I already have a preliminary version of that here: https://github.com/speechbrain/speechbrain/pull/975/files#diff-a94c1e4a684b9f25a02c454f630a641676074b878efa0d9331fe17806baf1bd2
This requires an interface change, so we can do these modifications when releasing SpeechBrain 0.6 (i.e., the next major release).
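The pipeline idea in point 3 boils down to chaining transforms that each consume and produce both features and lengths, so that length changes (e.g. from speed perturbation, point 1) propagate through the chain. A minimal sketch, with hypothetical names that are not the actual SpeechBrain 0.6 interface:

```python
import torch

class AugmentPipeline(torch.nn.Module):
    """Minimal sketch of a configurable augmentation pipeline.

    Applies a user-specified sequence of transforms in order. Each
    transform takes (feats, lengths) and returns (feats, lengths), so
    transforms that alter duration can update the lengths for the
    transforms that follow. Hypothetical illustration only.
    """
    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms

    def forward(self, feats, lengths):
        for transform in self.transforms:
            feats, lengths = transform(feats, lengths)
        return feats, lengths
```

With a YAML-driven config, the `transforms` list would be built by the hyperparameter loader, so the sequence can be changed without touching code.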
A related thing I noticed with the current SpecAugment implementation and padding: if `replace_with_zero` is set to false, the mask is filled with the mean of the whole batch, including the padded areas. This is not a problem if you normalize the features before augmentation and then replace with zero (which I think is what is done most of the time), but if the augmentation is refactored anyway, this should perhaps be changed too.
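A length-aware replacement value would use a masked mean over only the valid frames. A small sketch of that idea, again with relative lengths in [0, 1] and a hypothetical helper name:

```python
import torch

def masked_mean(feats, lengths):
    """Mean over only the valid (unpadded) frames of each batch element.

    feats: (batch, time, features) tensor.
    lengths: relative lengths in [0, 1], one per batch element.
    Hypothetical helper for illustration, not the library's code.
    """
    batch, time = feats.shape[0], feats.shape[1]
    abs_lens = (lengths * time).long()
    # Boolean mask: True for frames before each sample's length
    mask = torch.arange(time).unsqueeze(0) < abs_lens.unsqueeze(1)
    mask = mask.unsqueeze(-1)  # broadcast over the feature dimension
    total = (feats * mask).sum()
    count = mask.sum() * feats.shape[-1]  # number of valid elements
    return total / count
```

Compared with `feats.mean()`, this excludes padded frames from the statistic, so the fill value is not biased toward the padding.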
Hello,
Any news regarding this issue, please?
Thanks.