
performance degrade using aishell as training data of feature-mapping


I tried LSTM / Res-LSTM / GAN-Res-LSTM with the same configurations, and all experiments showed performance degradation. I don't know what's wrong. The back-end ASR system is TDNN+LSTM; the front-end is feature mapping from aishell_train_clean+rvb to aishell_train_clean. Do you have any insights? Thank you very much!

opencvbaby commented on Oct 22 '18

Do you mean the LSTM front-end is worse than the DNN front-end? If so, maybe there are some "stupid" mistakes. For example, the output of the front-end is normalized, so if your AM's input is the raw feature, you should apply inverse CMVN before feeding the dereverberated features to the AM.
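As a minimal sketch of the inverse-CMVN step described above (the function name and the assumption that CMVN was applied per feature dimension are mine, not from the repo):

```python
import numpy as np

def inverse_cmvn(normalized_feats, mean, std):
    """Undo per-dimension cepstral mean/variance normalization.

    normalized_feats: (frames, dims) array produced by the front-end,
                      assumed normalized as (x - mean) / std.
    mean, std:        the (dims,) statistics used during normalization.
    Returns the de-normalized features suitable for an AM trained on
    raw (un-normalized) features.
    """
    return normalized_feats * std + mean

# Example: normalize some features, then recover the originals.
feats = np.random.RandomState(0).randn(100, 40)
mean, std = feats.mean(axis=0), feats.std(axis=0)
normalized = (feats - mean) / std
restored = inverse_cmvn(normalized, mean, std)
```

The statistics must be the same ones (utterance-level or global) that were applied before training the front-end; using mismatched statistics reintroduces a feature-space mismatch at the AM input.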

If you mean 4-layer LSTM-Res is a little worse than 4-layer LSTM, or 4-layer GAN-LSTM-Res is a little worse than LSTM-Res, maybe you should tune some hyperparameters, such as the dropout rate, initial learning rate, l2_scale, and so on. Moreover, at test stage, setting "moving_average=True" may be very helpful.

wangkenpu commented on Oct 22 '18

My baseline has no front-end.
I applied inverse global CMVN and then LDA.

opencvbaby commented on Oct 24 '18

This work is front-end speech dereverberation with a fixed back-end AM. If your baseline is not front-end-based dereverberation, how did you design your experiments?

wangkenpu commented on Oct 25 '18

My experiment is as follows. AM1 is trained on the augmented dataset data_sp_rvb_vol; the test set contains both clean and rvb subsets; this gives result1. Then both the train set and the test set go through the feature-mapping front-end, and AM2 is trained and tested on them, giving result2. I want result2 to be better than result1. Does that make sense?

opencvbaby commented on Oct 26 '18

Suppose your AM2 is trained on data that has gone through the feature-mapping front-end. The rvb test-set results on AM2 should be better than on AM1. But when the clean set goes through the front-end and is tested on AM2, the results may be worse than on AM1.

Actually, you needn't train a new AM2; just testing on AM1 is fine.

wangkenpu commented on Oct 27 '18

Oh, sorry, I made a mistake. I did not retrain an AM2. All test sets go through the same acoustic model, AM1: result1 is without the front-end, and result2 is with the feature-mapping front-end. result2 is worse than result1, even on the rvb test set.

opencvbaby commented on Oct 29 '18

If your conclusion is that feature mapping is useless for speech dereverberation, I suggest you first read "Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, 'An experimental study on speech enhancement based on deep neural networks,' IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, 2014." I have verified this framework many times on different corpora.

wangkenpu commented on Oct 30 '18

I do think feature mapping makes sense, which is why I'm confused by my results. orz...

opencvbaby commented on Oct 31 '18

Thank you anyway!

opencvbaby commented on Oct 31 '18