DNN-for-speech-enhancement
framesBeforeSent[] does not read correctly
Dear Dr. Xu: I am using your model to train on my own data. However, framesBeforeSent[] does not seem to hold the correct values. As I understand it, each entry of framesBeforeSent[] should be the number of frames before the corresponding sentence, so the entries should match the running sum of the numbers in the lens file. Instead, I get huge values such as 1101260349 in framesBeforeSent[], although I have only 6379 frames in total. Because of that, the program runs into an endless loop here:

    while(cur_chunk_frames >= para->traincache ){
        next_st = cur_frame_id -(cur_chunk_frames - para->traincache);
        if(next_st < total_frames){
            chunk_frame_st[count_chunk] = next_st;
            count_chunk++;
            cur_chunk_frames = (cur_frame_id - next_st > para->fea_context -1)?(cur_frame_id - next_st - para->fea_context +1):0;
        }

The reason I believe framesBeforeSent[] is misread is that training also fails with the error "tails in target pfile and data pfile is not consistent". This error appears when I train with my noisy.pfile and clean.pfile, yet when I inspect the two files, their tails are the same.
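The loop quoted above can only make progress inside the `if` branch: when next_st is not below total_frames, none of the loop variables change, so the `while` condition stays true forever. The following is a minimal sketch, not the repository's code, that reproduces the quoted arithmetic in a standalone function and adds a guard. The function name `plan_chunks`, its parameter list, the `max_chunks` bound, and the -1 return convention are assumptions made for illustration; only the expressions inside the loop come from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone version of the quoted loop.
 * Returns the number of chunk starts written to chunk_frame_st,
 * or -1 if the loop would stall or overflow the output array. */
int plan_chunks(long cur_frame_id, long cur_chunk_frames, long total_frames,
                long traincache, long fea_context,
                long *chunk_frame_st, size_t max_chunks)
{
    size_t count_chunk = 0;

    assert(chunk_frame_st != NULL);

    while (cur_chunk_frames >= traincache) {
        long next_st = cur_frame_id - (cur_chunk_frames - traincache);

        if (next_st >= total_frames || count_chunk >= max_chunks) {
            /* The quoted code has no else branch for this case, so
             * nothing would change and the loop would never exit. */
            return -1;
        }
        chunk_frame_st[count_chunk++] = next_st;

        /* Same update as in the post. */
        cur_chunk_frames = (cur_frame_id - next_st > fea_context - 1)
                               ? (cur_frame_id - next_st - fea_context + 1)
                               : 0;
    }
    return (int)count_chunk;
}
```

With consistent inputs the function returns normally; with a frame index as large as the one reported, it returns -1 instead of hanging, which at least turns the endless loop into a visible error.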
I tried to fix this bug by changing the calculation of offset in read_tail(fp_data, offset, total_sents, framesBeforeSent) but failed since I am not quite familiar with the data structure of pfile. So, can you help fix this bug?
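One way to test the assumption that framesBeforeSent[] should equal the running sum of the sentence lengths is to rebuild that running sum from the lens values and compare it with what the program actually read. The sketch below does this in plain C, independent of the pfile reader. The helper names `expected_frames_before`, `first_mismatch`, and `bits_as_float` are hypothetical, and the convention that entry i counts the frames of sentences 0..i-1 is taken from the description in this post, not confirmed against the repository. `bits_as_float` is an extra diagnostic: it reinterprets a suspicious 32-bit integer as an IEEE-754 float, which may hint that the reader landed on feature data rather than on the sentence index. That interpretation is only a hypothesis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* out[i] = number of frames in sentences 0..i-1 (assumed convention). */
void expected_frames_before(const uint32_t *lens, size_t n_sents, uint32_t *out)
{
    uint32_t running = 0;
    for (size_t i = 0; i < n_sents; i++) {
        out[i] = running;
        running += lens[i];
    }
}

/* Index of the first entry where the values read from the file differ
 * from the expected ones, or n_sents if they all agree. */
size_t first_mismatch(const uint32_t *read_vals, const uint32_t *expected,
                      size_t n_sents)
{
    for (size_t i = 0; i < n_sents; i++) {
        if (read_vals[i] != expected[i]) {
            return i;
        }
    }
    return n_sents;
}

/* Reinterpret the 32 bits of v as a float (host byte order). */
float bits_as_float(uint32_t v)
{
    float f;
    assert(sizeof f == sizeof v);
    memcpy(&f, &v, sizeof f);
    return f;
}
```

For the value reported above, `bits_as_float(1101260349u)` is roughly 20.49. Whether that means the offset passed to read_tail points into the feature data would need to be checked against the actual offset computation.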
Any help from you would be greatly appreciated!!
Dear Dr. Xu: It seems that the default fea_dim is not correct. With the settings in get_pfile.pl, the fea_dim of the pfile should be 257 instead of 129, but when I change fea_dim to 257, other errors occur. Can you tell us how to control the fea_dim of the pfile, and what exactly fea_dim means?
Hi, fea_dim is the dimension of one frame of features, e.g., log-power spectra. If the sample rate is 16 kHz and you use a 512-point STFT, your fea_dim is 257. If the sample rate is 8 kHz and you use a 256-point STFT, your fea_dim is 129.
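Both cases in the reply follow from the fact that a real-valued frame transformed with an N-point FFT has N/2 + 1 distinct frequency bins, from DC up to the Nyquist frequency. A minimal sketch of that relation, using the hypothetical helper name `fea_dim_for_fft` and assuming the feature is a one-sided spectrum with no context expansion or extra coefficients:

```c
#include <assert.h>

/* Number of bins in the one-sided spectrum of an n_fft-point FFT.
 * n_fft is expected to be a positive even number. */
int fea_dim_for_fft(int n_fft)
{
    assert(n_fft > 0 && n_fft % 2 == 0);
    return n_fft / 2 + 1;
}
```

`fea_dim_for_fft(512)` gives 257 and `fea_dim_for_fft(256)` gives 129, matching the two examples in the reply.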
Thank you very much for your reply! Could you share the source code of "Wav2LogSpec.exe" and "WAV2RAW.exe"? It would help us a lot to understand the data structure of the lsp files!