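Hi, a question about the SI-SNR loss. I see that the loss file calls it through calloss. During training, should we call calloss or sisnr? If I call calloss with inputs of shape [batch, 1, signal_size], I frequently hit a division by zero at loss_total = loss_total / (output.shape[0] - zerocount). If I call sisnr directly with inputs of shape [batch, signal_size], the line t = torch.sum(x_zm * s_zm) * s_zm / (l2norm(s_zm)**2 + eps) fails because the dimensions in the division do not match.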
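For the direct sisnr call, here is a minimal sketch of the batched behaviour I would expect. This is not the repository's code: the name si_snr_batched is hypothetical, and I am assuming inputs of shape [batch, signal_size] with every reduction taken over the last dimension and kept (keepdim=True) so that the shapes broadcast.

```python
import torch

def si_snr_batched(x, s, eps=1e-8):
    """Hypothetical batched SI-SNR in dB; x (estimate) and s (reference)
    are assumed to have shape [batch, signal_size]."""
    # Remove the mean of each utterance separately.
    x_zm = x - x.mean(dim=-1, keepdim=True)
    s_zm = s - s.mean(dim=-1, keepdim=True)
    # One dot product and one energy per row, kept as [batch, 1]
    # so they broadcast against [batch, signal_size].
    dot = torch.sum(x_zm * s_zm, dim=-1, keepdim=True)
    s_energy = torch.sum(s_zm ** 2, dim=-1, keepdim=True)
    # Projection of the estimate onto the reference.
    t = dot * s_zm / (s_energy + eps)
    noise = x_zm - t
    ratio = torch.sum(t ** 2, dim=-1) / (torch.sum(noise ** 2, dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)  # shape [batch]
```

Because the reductions use dim=-1, the same function also accepts [batch, 1, signal_size] and then returns [batch, 1]. A training loss would be the negative mean of the returned values.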
Another question: are there dev and test sets that I can get access to, or do I have to split a dev set from the training data myself?
I tried the alignment tool and noticed that each audio clip is about 15-25 s long. Is it possible to modify the script to align shorter segments (