Conv-TasNet
Bad performance when using for speech enhancement
Hi, very nice work. I noticed that some people are using Conv-TasNet for speech enhancement and getting good results. However, I ran into a problem when using this code for speech enhancement. I am trying to separate clean speech and noise from noisy speech, using the VCTK dataset. The output waveforms look very weird...
When I changed the mask activation to sigmoid, the result was still not good.
Does anyone have an idea how to solve this problem? Thanks in advance!
This seems to be caused by the choice of loss function, i.e., SI-SDR. SI-SDR is scale-invariant, so it does not constrain the magnitude of the output waveform, which may cause the chopping effect. I think you can replace the SI-SDR loss with another option such as SNR or a waveform-domain L1 loss.
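To illustrate the point about magnitude, here is a minimal NumPy sketch, not the repository's code. The function names `si_sdr`, `snr`, and `l1_loss` and the synthetic signals are assumptions made for the example; a training loss would apply the same formulas (negated where appropriate) in an autograd framework.

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB (higher is better)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def snr(est, ref, eps=1e-8):
    """Plain SNR in dB: the reference is not rescaled, so amplitude matters."""
    noise = est - ref
    return 10 * np.log10((np.dot(ref, ref) + eps) / (np.dot(noise, noise) + eps))

def l1_loss(est, ref):
    """Mean absolute error on the waveform (lower is better)."""
    return np.mean(np.abs(est - ref))

# Synthetic stand-ins for a clean reference and a slightly noisy estimate.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
scaled = 10.0 * est  # same estimate, 10x too loud

print("SI-SDR:", si_sdr(est, ref), si_sdr(scaled, ref))  # essentially unchanged
print("SNR:   ", snr(est, ref), snr(scaled, ref))        # drops sharply
print("L1:    ", l1_loss(est, ref), l1_loss(scaled, ref))  # grows sharply
```

Because SI-SDR gives the same score to the estimate and its 10x-louder copy, a network trained only with it has no incentive to produce the correct output level, while SNR and L1 both penalize the amplitude error.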
@Andong-Li-speech Hi, thanks for your suggestions! The result is still not very good after changing the loss function to the SNR loss, but it works much better! If you are also working on this, what kind of loss function are you using? Thanks a lot in advance!
@jkzhang7 Hi, did you manage to get better performance? I am facing the same problem now. Best wishes to you!
@LittleFlyingSheep Hi~ Have you solved this problem? I seem to have the same issue: the magnitude of the separated waveform is too large and it does not sound good. Thanks a lot if you could give me some advice~
@forestlee95 The way I chose to deal with it is to rescale the waveform artificially. I take the maximum value of the noisy input and use it to normalize the output. This gives relatively good performance. It is only a workaround, though. If you have any other methods, please let me know.
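One possible reading of this workaround, sketched in NumPy: scale the network output so that its peak absolute value matches the peak of the noisy input. The helper name `match_peak` and the exact scaling rule are assumptions for illustration; the comment above does not spell out the precise formula used.

```python
import numpy as np

def match_peak(estimate, noisy, eps=1e-8):
    """Rescale `estimate` so its peak |amplitude| equals that of `noisy`.

    Hypothetical helper: one interpretation of the post-hoc rescaling
    described above, applied at inference time only.
    """
    scale = np.max(np.abs(noisy)) / (np.max(np.abs(estimate)) + eps)
    return estimate * scale

# Toy signals: the "network output" is far too loud compared with the input.
noisy = np.array([0.1, -0.4, 0.25, 0.05])
estimate = 50.0 * np.array([0.08, -0.3, 0.2, 0.02])

rescaled = match_peak(estimate, noisy)
print(np.max(np.abs(rescaled)))  # close to the peak of `noisy`
```

This only fixes the overall level after the fact. It does not change what the model learned, so a loss that constrains amplitude during training remains the more direct fix.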
@LittleFlyingSheep @jkzhang7 Hi, I am looking for speech enhancement results for Conv-TasNet on the VCTK dataset. Do you have any performance numbers for it? Much appreciated.
Received.
How did you solve it? I am running into the same problem while testing.
Received.