Speech-Separation
Speech-Separation copied to clipboard
Reshaping issue
Hi,
When you do this op
You are commiting a mistake.
You are doing
input.view(B, -1, 298, 1)
and that is not correct.
In pytorch the reshaping op is ordered from right to left.
You have to do a permutation
permute(0,2,1,3)
and then the reshaping
view(B,298,-1)
Basically this way the values
[0,0,0,0:257]-->[0,0,0:257]
[0,0,0,1:257]-->[0,0,257:257*2]
and so on.
Whay you are doing is mixing the data
You are putting
(B,8,298,257)
[0,0,0,0:257] --> [0,0,0:257]
[0,0,1,0:257] --> [0,0,257:298], then [0,1,0:298-257] and filling the reshaped tensor in a bad way
A possible fix:
In line #187 of models.py, replace output_layer = output_layer.view(batch_size,-1,height,1)
with output_layer = output_layer.permute(0, 2, 1, 3).contiguous().view(batch_size, height, -1, 1).permute(0, 2, 1, 3)
Hi,
Thanks for pointing out the mistake. It was fixed when the repo was merged to asteroid here, somehow it was not fixed here. But I did not see any significant improvement in terms of overfitting unfortunately.
Thanks for the quick response @vitrioil ! It seems another similar bug is repeated here and also in the asteroid. Here, we are reshaping (N,298,2x257xself.num_person) into (N,2,298,257,self.num_person). Instead of view(N, 2, 298, 257, self.num_person), it will be more appropriate to do view(N,298,2,257,self.num_person).transpose(1,2).
I see... we are mixing the axis at the last layer. Thank you again for pointing this out! I will fix this.
I have raised a bug in asteroid as well. Another thing, this line needs to be commented out, I guess.
please let know if you manage to overcome the overfitting problem with this fix. thanks!
Here input channel expected by self.conv1
is 512 so transposing the channels so that we get the correct expected channel, otherwise channels will still be 1. I don't have access to any GPU now but if I do train it and see an improvement I will let you know.
Hi, Firsly, thanks for the code. Is there any improvement in the overfitting problem? @vitrioil
Hi @MordehayM, I still haven't tried re-training, so currently the result is the same