Samuele Cornell
@lminer did you use VoxCeleb?
I have actually observed the same. Also, according to https://arxiv.org/abs/2202.00733, the use of speaker ID info does not, in fact, really help.
I recall that we decided to include generic, multi-purpose architectures (e.g. ConvTasNet, DPRNN) in the toolkit, and keep the more specific ones just in the egs (e.g. the WHAMR stacked Bi-LSTM...
It could be interesting to try, even though simulated RIRs are usually much more realistic than DSP-based artificial reverbs (at least the open-source ones; commercial ones are another story). I think...
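To be clear about what I mean by using simulated RIRs: regardless of whether the impulse response comes from an image-method simulator or a DSP reverb, applying it to a dry signal is just a convolution. A minimal numpy sketch (the truncation to the dry length is a common convention for training data, not the only option):

```python
import numpy as np

def apply_rir(dry, rir):
    """Reverberate a dry signal by convolving it with a room impulse response.

    Full convolution, then truncated to the dry signal's length so the
    reverberant and dry signals stay time-aligned for training targets.
    """
    wet = np.convolve(dry, rir)[: len(dry)]
    return wet
```

The same function works for any RIR source, so swapping simulated RIRs for artificial reverbs is only a data-generation change.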
> But the first option is definitely possible and a preload_wavs flag or something like that could completely do the job.

Also maybe caching the wavs as they are read.
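A minimal sketch of what I mean by caching-as-read (the class and flag names here are hypothetical, not actual toolkit API): wrap whatever loading function reads a wav, and memoize it, with optional eager preloading to mimic a preload_wavs-style flag.

```python
class CachingLoader:
    """Cache items the first time they are read (hypothetical sketch).

    load_fn would be e.g. a function reading a wav file from disk.
    """

    def __init__(self, load_fn, preload_keys=None):
        self.load_fn = load_fn
        self.cache = {}
        # Optional eager preload, like a preload_wavs=True flag would do.
        for key in preload_keys or []:
            self.cache[key] = load_fn(key)

    def __call__(self, key):
        # First access reads from disk and caches; later accesses are free.
        if key not in self.cache:
            self.cache[key] = self.load_fn(key)
        return self.cache[key]
```

The trade-off is memory: caching as-read only pays for the wavs actually touched in an epoch, while eager preloading pays everything up front.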
What do you mean by temporal masking? Can something like this be useful (basically dropout over only the last dimension)?

```python
class StepDrop(nn.Module):
    def __init__(self, p=0.5):
        self.p...
```
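To make the idea concrete, here is a numpy sketch of that "dropout over only the last dimension" (the function name and details are just illustrative, not the actual module): each step along the last dimension is either fully kept or fully zeroed, shared across all other dimensions, with the usual 1/(1-p) rescaling.

```python
import numpy as np

def step_drop(x, p=0.5, training=True, rng=None):
    """Drop whole steps: zero entire slices along the last dim with prob p."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    # One Bernoulli draw per step of the last dimension, broadcast over
    # all leading dimensions, rescaled by 1/(1-p) as in standard dropout.
    mask = (rng.random(x.shape[-1]) >= p).astype(x.dtype) / (1.0 - p)
    return x * mask
```

With the last dimension being time, this zeroes entire time steps, which is why it could stand in for temporal masking.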
What is the difference w.r.t. just resampling? It should be pretty much the same, no?
For the multi-frame MWF as defined in https://arxiv.org/abs/1911.07953, I think there won't be a problem, as the other frames are basically treated as additional microphone channels. I think it should be...
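A sketch of the frame-stacking step I'm referring to (function name and layout are my own, assuming a (channels, freq, time) complex STFT): the current frame plus delayed copies of past frames are concatenated along the channel axis, so downstream filtering sees them exactly like extra microphones.

```python
import numpy as np

def stack_frames(X, n_taps=3):
    """Stack the current and past n_taps-1 STFT frames as extra channels.

    X: (channels, freq, time) STFT. Returns (channels * n_taps, freq, time),
    zero-padding at the start where past frames don't exist yet.
    """
    C, F, T = X.shape
    out = np.zeros((C * n_taps, F, T), dtype=X.dtype)
    for tau in range(n_taps):
        # tau = 0 is the current frame, tau > 0 are delayed copies.
        out[tau * C:(tau + 1) * C, :, tau:] = X[:, :, :T - tau]
    return out
```

After this, any single-frame multichannel filter (MWF included) can be applied unchanged to the stacked representation.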
I've actually encountered the same problem. I've opened a pull request which seems to fix it. The problem is due to the fact that in Python 3 the range...
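The comment is cut off, so I can't quote the exact failure, but the classic Python 3 range pitfall is that `range` no longer returns a list (as it did in Python 2) but a lazy sequence object, so code that concatenated, sliced into a list, or mutated it breaks:

```python
# In Python 2, range(5) returned a list; in Python 3 it is a lazy
# sequence object, so list-only operations now need an explicit cast.
r = range(5)
assert isinstance(r, range) and not isinstance(r, list)

# e.g. list concatenation fails on a range and needs list() first:
combined = list(r) + [10]
assert combined == [0, 1, 2, 3, 4, 10]
```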
@AsuMagic's observations are correct IMO. The Transformer code unfortunately has lots of repetition right now, and it is probably better to refactor it to reuse as much as possible. Otherwise such stuff...