DNN-for-speech-enhancement: Is it convenient for you to share the pre-trained model with me?

Hi Dr. Xu,

Is it convenient for you to share the pre-trained model with me?

Jul 27 '17 07:07 wangjianfly2003

Hi, the initial model was not pre-trained. It was just randomly initialized.

Jul 27 '17 13:07 yongxuUSTC

OK, it is here: https://github.com/yongxuUSTC/DNN-for-speech-enhancement/tree/master/toolbox/weights

It contains the source code for randomly initializing your model weights and for converting the trained weights back for MATLAB decoding.

Jul 27 '17 13:07 yongxuUSTC

Thank you very much for your kind reply, Dr. Xu. Do you mean that I don't need the pre-training step, and that I can get the same speech enhancement performance as the model you provide in DNN_speech_enhancement_tool using only the fine-tuning process?

Jul 28 '17 05:07 wangjianfly2003

Yes, that's correct. Just use fine-tuning with random initialization. I once tried RBM-based pre-training, but it did not work.

Jul 28 '17 08:07 yongxuUSTC

OK. I will try to train a new model on my collected noisy data using the fine-tuning process you provided. Thank you very much.

From reading your decoding code, I guess that you trained the provided model with noisy speech and noise data as input features, and clean speech and noise as output features. Am I right? Also, you use the normalization file “timit_aurora4_115NT_7SNRS_each190_80uuts_noisy_lsp_be_random_linux_global_mv.mat” to normalize the input noisy speech, but I don't understand why this file is used for DNN decoding. Why not use the normalization of the output features for decoding?

Jul 28 '17 09:07 wangjianfly2003

The direct mapping is from noisy speech log-power spectra to clean speech log-power spectra. Additionally, you can also predict noise log-power spectra, ideal binary mask, or ideal ratio mask to do some post-processing.

The norm file is used for both training and decoding. During decoding, you should normalize the input noisy features, and then transform the enhanced features back to the original scale using the norm file.
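A minimal sketch of global mean-variance normalization and its inverse, assuming the norm file supplies a per-dimension mean and standard deviation. The file's actual layout is not described here, and fea_norm / inv_fea_norm are hypothetical names.

```c
/* Global mean-variance normalization: x[d] = (x[d] - mean[d]) / std[d]. */
void fea_norm(float *x, const float *mean, const float *std, int dim)
{
    for (int d = 0; d < dim; d++)
        x[d] = (x[d] - mean[d]) / std[d];
}

/* Inverse normalization: map network outputs back to the original
 * (log-power) scale: y[d] = y[d] * std[d] + mean[d]. */
void inv_fea_norm(float *y, const float *mean, const float *std, int dim)
{
    for (int d = 0; d < dim; d++)
        y[d] = y[d] * std[d] + mean[d];
}
```

In decoding, the order is: fea_norm on the noisy input, DNN forward pass, then inv_fea_norm on the output with the statistics from the norm file, before reconstructing the waveform. Skipping the inverse step leaves the output on the normalized scale.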

Jul 29 '17 11:07 yongxuUSTC

Where did you find “timit_aurora4_115NT_7SNRS_each190_80uuts_noisy_lsp_be_random_linux_global_mv.mat”?

I think I used a different one: https://drive.google.com/file/d/0B5r5bvRpQ5DRR1lIV1hpZ0RLQ0E/view

Jul 29 '17 11:07 yongxuUSTC

Hi Dr. Xu. I made a mistake about the norm file; I have now tried the same norm file that you used.

In the "BP_GPU.cu" file, I think the code should be modified as below so that the output units are linear, that is, the second argument should be changed from "cur_layer_y" to "cur_layer_x":
cudaMemcpy(dev[0].out, cur_layer_x, n_frames*cur_layer_units*sizeof(float), cudaMemcpyDeviceToDevice);

Am i right?

Jul 31 '17 06:07 wangjianfly2003

You are right:
cudaMemcpy(dev[0].out, cur_layer_x, n_frames*cur_layer_units*sizeof(float), cudaMemcpyDeviceToDevice);

I think I uploaded the code for ideal binary mask prediction. I commented out the sigmoid code but forgot to change "cur_layer_y" to "cur_layer_x".

I have updated the code.

Jul 31 '17 09:07 yongxuUSTC

Please update the "cv_bunch_single" function as well.

Jul 31 '17 09:07 yongxuUSTC

Hi Dr. Xu. Today I trained a model with noisy speech log-power spectra as input features (50 TIMIT clean utterances corrupted by 100 environmental noise types at -5 dB SNR) and clean speech log-power spectra as target features. The learning rate was 0.0005, the layer sizes were 2827 (257*11), 2048, 2048, 2048, 257, the weights were randomly initialized, and I trained for 35 epochs (the squared_err value decreased). Then I used the trained model for decoding, but the result was very poor; I couldn't even hear the speech.

Could you tell me how to determine the cause of the problem?

Is the training set too small? Is the decoding code wrong? ...

Aug 01 '17 08:08 wangjianfly2003

Could you update your "finetune_DNN_speech_enhancement_dropout_NAT.pl", "interface.cc" and "step1_DNNenh_for 16kHz.m" files for the direct-mapping model from noisy speech log-power spectra to clean speech log-power spectra? I think these are the only three files I changed.

Aug 01 '17 09:08 wangjianfly2003

If you want to check your code, try mapping from clean to clean. If that still does not work, your code has a problem. You should apply the inverse feature normalization (inverse-fea-norm) as I did in step1_DNNenh_for 16kHz.m. Please refer to "step1_DNNenh_for 16kHz.m" for decoding; there is no problem in the decoding code.

Aug 01 '17 19:08 yongxuUSTC

Hi Dr. Xu. I mapped from clean to clean, and it still does not seem to work. So I started checking the code and found that the mapping from 11 frames of input features to one frame of target features is correct, but the input data of frame 5 and frame 10 in para->indata are the same. I also checked frames 5 and 10 in dataori, and they are not the same. So I think something may be wrong in the following code:
for(j = 0; j <= cur_frame_of_sent - para->fea_context; j++){
    for(i = 0; i < para->fea_context; i++){
        for(k = 0; k < para->fea_dim; k++){
            para->indata[sample_index[cur_sample]*para->layersizes[0] + k + i*para->fea_dim] = dataori[(frames_processed + j + i)*(2+para->fea_dim) + k + 2];
        }
    }
I think the statement
para->indata[sample_index[cur_sample]*para->layersizes[0] + k + i*para->fea_dim] = dataori[(frames_processed + j + i)*(2+para->fea_dim) + k + 2];
should be changed to
para->indata[sample_index[cur_sample]*para->layersizes[0] + k + i*para->fea_dim] = dataori[(frames_processed + j*para->fea_context + i)*(2+para->fea_dim) + k + 2];
Am I right?
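For reference, here is a minimal sketch of conventional sliding-window context splicing, which turns 11 frames of 257 bins into one 2827-dimensional input. Window j reads frames j through j + context - 1 (source index j + i), so consecutive samples overlap by context - 1 frames; multiplying j by the context length would instead produce non-overlapping windows. splice_frames is a hypothetical name, and the sketch omits the two leading values per frame that the original dataori rows carry (the +2 and 2+para->fea_dim terms).

```c
/* Sliding-window context splicing.
 * frames: n_frames x dim feature matrix (row-major), e.g. dim = 257.
 * out:    (n_frames - context + 1) x (context * dim) matrix,
 *         e.g. 11 * 257 = 2827 values per sample.
 * Returns the number of spliced samples written to out. */
int splice_frames(const float *frames, int n_frames, int dim, int context,
                  float *out)
{
    int n_samples = n_frames - context + 1;
    if (n_samples <= 0)
        return 0;
    for (int j = 0; j < n_samples; j++)       /* one sample per window position */
        for (int i = 0; i < context; i++)     /* i-th frame inside the window */
            for (int k = 0; k < dim; k++)     /* feature dimension */
                out[j * context * dim + i * dim + k] = frames[(j + i) * dim + k];
    return n_samples;
}
```

Note that the destination row here advances with j itself; each sample must land in its own row of the output buffer.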

Aug 02 '17 07:08 wangjianfly2003

I commented out the following code in the interface.cc file:
/*
i=i-1;
for(k=129;k< 2*(para->fea_dim);k++){
    para->indata[sample_index[cur_sample]*para->layersizes[0] +k +i*para->fea_dim] =
        (dataori[(frames_processed + 0)*(2+para->fea_dim) +(k-129)+2]
        +dataori[(frames_processed + 1)*(2+para->fea_dim) +(k-129)+2]
        +dataori[(frames_processed + 2)*(2+para->fea_dim) +(k-129)+2]
        +dataori[(frames_processed + 3)*(2+para->fea_dim) +(k-129)+2]
        +dataori[(frames_processed + 4)*(2+para->fea_dim) +(k-129)+2]
        +dataori[(frames_processed + 5)*(2+para->fea_dim) +(k-129)+2])/6.0f;
}
*/
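The commented-out block appears to append a noise estimate, the average of six frames starting at frames_processed, to each input vector. That matches noise-aware training (NAT), which the script name finetune_DNN_speech_enhancement_dropout_NAT.pl suggests, but this reading is an inference. A minimal sketch of the idea, with append_noise_estimate as a hypothetical name and the first n_noise frames assumed to be speech-free:

```c
/* Copy the spliced context features into out, then append a noise
 * estimate: the per-dimension mean of the first n_noise frames.
 * frames: utterance features, row-major with dim values per frame.
 * out must hold spliced_dim + dim values. */
void append_noise_estimate(const float *frames, int dim, int n_noise,
                           const float *spliced, int spliced_dim, float *out)
{
    for (int d = 0; d < spliced_dim; d++)
        out[d] = spliced[d];
    for (int k = 0; k < dim; k++) {
        float sum = 0.0f;
        for (int t = 0; t < n_noise; t++)
            sum += frames[t * dim + k];
        out[spliced_dim + k] = sum / (float)n_noise;
    }
}
```

With NAT the first layer grows by dim inputs, so the layer sizes in the training script must include the extra noise-estimate dimensions; commenting the block out without adjusting layersizes[0] would leave those inputs unfilled.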

Aug 02 '17 08:08 wangjianfly2003
