Speech-Separation icon indicating copy to clipboard operation
Speech-Separation copied to clipboard

Reshaping issue

Open JuanFMontesinos opened this issue 3 years ago • 9 comments

Hi, When you do this op imagen You are commiting a mistake. You are doing input.view(B, -1, 298, 1) and that is not correct. In pytorch the reshaping op is ordered from right to left. You have to do a permutation permute(0,2,1,3) and then the reshaping view(B,298,-1) Basically this way the values [0,0,0,0:257]-->[0,0,0:257] [0,0,0,1:257]-->[0,0,257:257*2] and so on. Whay you are doing is mixing the data You are putting (B,8,298,257) [0,0,0,0:257] --> [0,0,0:257] [0,0,1,0:257] --> [0,0,257:298], then [0,1,0:298-257] and filling the reshaped tensor in a bad way

JuanFMontesinos avatar Feb 11 '21 17:02 JuanFMontesinos

A possible fix: In line #187 of models.py, replace output_layer = output_layer.view(batch_size,-1,height,1) with output_layer = output_layer.permute(0, 2, 1, 3).contiguous().view(batch_size, height, -1, 1).permute(0, 2, 1, 3)

vskadandale avatar Feb 11 '21 18:02 vskadandale

Hi,

Thanks for pointing out the mistake. It was fixed when the repo was merged to asteroid here, somehow it was not fixed here. But I did not see any significant improvement in terms of overfitting unfortunately.

vitrioil avatar Feb 12 '21 12:02 vitrioil

Thanks for the quick response @vitrioil ! It seems another similar bug is repeated here and also in the asteroid. Here, we are reshaping (N,298,2x257xself.num_person) into (N,2,298,257,self.num_person). Instead of view(N, 2, 298, 257, self.num_person), it will be more appropriate to do view(N,298,2,257,self.num_person).transpose(1,2).

vskadandale avatar Feb 12 '21 15:02 vskadandale

I see... we are mixing the axis at the last layer. Thank you again for pointing this out! I will fix this.

vitrioil avatar Feb 12 '21 15:02 vitrioil

I have raised a bug in asteroid as well. Another thing, this line needs to be commented out, I guess.

vskadandale avatar Feb 12 '21 15:02 vskadandale

please let know if you manage to overcome the overfitting problem with this fix. thanks!

vskadandale avatar Feb 12 '21 18:02 vskadandale

Here input channel expected by self.conv1 is 512 so transposing the channels so that we get the correct expected channel, otherwise channels will still be 1. I don't have access to any GPU now but if I do train it and see an improvement I will let you know.

vitrioil avatar Feb 13 '21 09:02 vitrioil

Hi, Firsly, thanks for the code. Is there any improvement in the overfitting problem? @vitrioil

MordehayM avatar May 06 '21 21:05 MordehayM

Hi @MordehayM, I still haven't tried re-training, so currently the result is the same

vitrioil avatar May 14 '21 06:05 vitrioil