rsrgan
Performance degrades when using AISHELL as training data for feature mapping
I tried LSTM / Res-LSTM / GAN-Res-LSTM with the same configurations, and all experiments showed performance degradation. I don't know what's wrong. The back-end ASR system is TDNN+LSTM; the front-end is feature mapping from aishell_train_clean+rvb to aishell_train_clean. Do you have any insights? Thank you very much!
Do you mean the LSTM front-end is worse than the DNN front-end? If so, maybe there are some "stupid" mistakes. For example, the output of the front-end is normalized; if your AM's input is raw features, you should apply reverse CMVN first before feeding the dereverberated features to the AM.
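For reference, a minimal sketch (NumPy, with hypothetical variable names) of what "reverse CMVN" means here: un-normalize the front-end output with the same global mean and standard deviation that were used to normalize the training targets, before handing the features to the AM.

```python
import numpy as np

def inverse_cmvn(normalized_feats, cmvn_mean, cmvn_std):
    """Undo global CMVN so the AM sees raw-scale features.

    normalized_feats: (num_frames, feat_dim) output of the front-end,
                      which was trained on mean/variance-normalized targets.
    cmvn_mean, cmvn_std: (feat_dim,) global statistics computed on the
                         clean training data (hypothetical names).
    """
    return normalized_feats * cmvn_std + cmvn_mean

# Example: un-normalize a dereverberated utterance before decoding.
# feats_norm = frontend(reverberant_feats)          # normalized output
# feats_raw = inverse_cmvn(feats_norm, mean, std)   # feed this to the AM
```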
If you mean that the 4-layer LSTM-Res is a little worse than the 4-layer LSTM, or that the 4-layer GAN-LSTM-Res is a little worse than LSTM-Res, maybe you should adjust some hyper-parameters, such as the dropout rate, initial learning rate, l2_scale, and so on. Moreover, at the test stage, setting "moving_average=True" may be very helpful.
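As an illustration only (not necessarily how rsrgan implements it), test-time moving averaging usually means decoding with an exponential moving average of the weights maintained during training, rather than the final raw weights:

```python
# Minimal sketch of an exponential moving average (EMA) of model weights,
# assuming a dict of NumPy parameter arrays; the actual "moving_average"
# option in rsrgan may differ in detail.
import numpy as np

def update_ema(ema_params, params, decay=0.999):
    """Blend current weights into the shadow (EMA) copy after each step."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params

# At test time, load ema_params instead of params; the averaged weights
# are typically smoother and often generalize a bit better.
```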
My baseline has no front-end.
I applied anti-gCMVN (inverse global CMVN) and then used LDA.
This work is for front-end speech dereverberation, and the back-end AM is fixed. If your baseline is not front-end-based dereverberation, how did you design your experiments?
My experiment is as follows: AM1 is trained with the augmented dataset data_sp_rvb_vol as the training set; the test set contains both clean and rvb subsets; that gives result1. Then both the training set and the test set go through the feature-mapping front-end, AM2 is trained and tested, and that gives result2. I want result2 to be better than result1. Does that make sense?
Suppose your AM2 is trained on data that has gone through the feature-mapping front-end. The rvb test-set results on AM2 should be better than on AM1. But if the clean set also goes through the front-end and is tested on AM2, the results may be worse than on AM1.
Actually, you needn't train a new AM2; just testing on AM1 is fine.
Oh, sorry, I made a mistake. I did not retrain an AM2. All the test sets go through the same acoustic model AM1: result1 is without the front-end and result2 is with the feature-mapping front-end. result2 is worse than result1, even on the rvb test set.
If your conclusion is that feature mapping is useless for speech dereverberation, I suggest you first read Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, 2014. I have verified this framework many times on different corpora.
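As background, the framework in that paper trains a regression network that maps reverberant/noisy features to the parallel clean features with an MSE loss. A minimal PyTorch-style sketch (illustrative only, not the rsrgan code; feat_dim and the layer sizes are placeholders):

```python
# Feature-mapping objective in the spirit of Xu et al. (2014):
# regress from reverberant features to parallel clean features with MSE.
import torch
import torch.nn as nn

class FeatureMapper(nn.Module):
    def __init__(self, feat_dim=40, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),  # predict the clean feature frame
        )

    def forward(self, x):
        return self.net(x)

model = FeatureMapper()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# reverb_feats, clean_feats: parallel (batch, feat_dim) tensors
# loss = loss_fn(model(reverb_feats), clean_feats)
# loss.backward(); optimizer.step()
```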
I do think feature mapping makes sense, so I am confused by my results. orz...
Thank you anyway ~