
Pre-trained model usage

Open DinoTheDinosaur opened this issue 6 years ago • 7 comments

Hello! I've discovered that you've shared a pre-trained model for the TIMIT dataset, but is it possible to run it on my own data? I've managed to reuse the trained model generated by running sednn/mixture2clean_dnn/runme.sh. I trained the model on a subsample of the LibriSpeech dataset, but I'm not too happy with the enhancement quality. When I tried to substitute the LibriSpeech model with your TIMIT model, I couldn't run the decoding due to an input shape mismatch. So, my question is: how can I get the pre-trained TIMIT model to work on my own data in Python? My assumption is that the default input data format differs between the clean2clean_verify and mixture2clean_dnn subdirectories. In that case, where can I find this information (scripts in the sednn/clean2clean_verify/ subdirectory; papers)?

Best regards, Yakovenko Olga

DinoTheDinosaur avatar Jan 06 '19 09:01 DinoTheDinosaur

Hello! If you are training on LibriSpeech, please check that the data format is .wav and that both the speech and noise sampling rates are 16 kHz.

For the input shape mismatch error, please check whether the input shape is correct. We are concatenating 11 frames as the input to the model. Please also do not forget to subtract the mean and divide by the standard deviation of the input. The mean and standard deviation are calculated from the training data.
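The concatenation and normalisation described above can be sketched roughly as follows. This is an illustrative sketch, not the repo's exact API: the helper name mat_2d_to_3d mirrors the one seen in the tracebacks below, but its exact padding behaviour in the repo may differ, and the shapes are just examples (257 bins corresponds to n_window=512).

```python
import numpy as np

def mat_2d_to_3d(x, agg_num, hop):
    """Concatenate agg_num consecutive frames into one input segment,
    sliding by hop frames (similar in spirit to the repo's pp_data helper)."""
    n_frames, n_freq = x.shape
    # Pad with zeros so at least one full segment fits.
    if n_frames < agg_num:
        x = np.concatenate([x, np.zeros((agg_num - n_frames, n_freq))])
        n_frames = agg_num
    segments = []
    i = 0
    while i + agg_num <= n_frames:
        segments.append(x[i:i + agg_num])
        i += hop
    return np.array(segments)  # shape: (n_segments, agg_num, n_freq)

# Example log-magnitude spectrogram: 100 frames x 257 frequency bins.
spectrogram = np.random.rand(100, 257)

# Normalise with the mean/std computed on the *training* data
# (here computed from the same array, purely for illustration).
train_mean = spectrogram.mean(axis=0)
train_std = spectrogram.std(axis=0)
normalised = (spectrogram - train_mean) / train_std

# Concatenate 11 frames per model input.
model_input = mat_2d_to_3d(normalised, agg_num=11, hop=1)
print(model_input.shape)  # (90, 11, 257)
```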

Best wishes,

Qiuqiang



qiuqiangkong avatar Jan 06 '19 10:01 qiuqiangkong

check that the data format is .wav and that both the speech and noise sampling rates are 16 kHz.

Yes, the data format is correct! The model works well on the test samples, but poorly on the real-life samples. That's why I also wanted to test your TIMIT model on the real-life samples - to compare the results.

We are concatenating 11 frames as the input to the model. Please also do not forget to subtract the mean and divide by the standard deviation of the input.

Ok, I see! I will try it out with these parameters. Regarding normalisation of the input: if I understand correctly, this functionality is covered by a scaler ('scaler.p' in mixture2clean_dnn or 'tr_norm.pickle' in clean2clean_verify). I tried to load the scaler with Python 3 pickle, but I encountered an error:

>>> AudioEnhancerTIMIT()
---------------------------------------------------------------------------
~/tools/sednn/mixture2clean_dnn/enhance_audio.py in __init__(self)
     56         def __init__(self):
     57                 self.model = load_model('model/sednn_keras_logMag_Relu2048layer1_1outFr_7inFr_dp0.2_weights.75-0.00.hdf5')
---> 58                 self.scaler = pickle.load(open('model/tr_norm.pickle', 'rb'))
     59                 self.n_window = 512
     60                 self.n_overlap = 256

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

I heard there are sometimes issues transferring Python pickle files between Python versions; maybe you have a newer version of the scaler you can upload? Unfortunately, I have no access to the TIMIT database to calculate the scaler myself. Also, if I try to use the scaler extracted from LibriSpeech, I get a dimension mismatch:

~/tools/sednn/mixture2clean_dnn/enhance_audio.py in enhance_audio(self, speech_dir, output_dir, n_concat, n_hop)
     77                 # Scale data.
     78                 if self.scale:
     79                     mixed_x = pp_data.scale_on_2d(mixed_x, self.scaler)
     80
     81                 # Cut input spectrogram to 3D segments with n_concat.
     82                 mixed_x_3d = pp_data.mat_2d_to_3d(mixed_x, agg_num=n_concat, hop=1)
     83
     84                 # Predict.
---> 85                 pred = self.model.predict(mixed_x_3d)
     86
     87                 # Inverse scale.
/home/ds/anaconda3/envs/sednn_env/lib/python3.6/site-packages/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
   1147                              'argument.')
   1148         # Validate user data.
-> 1149         x, _, _ = self._standardize_user_data(x)
   1150         if self.stateful:
   1151             if x[0].shape[0] > batch_size and x[0].shape[0] % batch_size != 0:

/home/ds/anaconda3/envs/sednn_env/lib/python3.6/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    749             feed_input_shapes,
    750             check_batch_axis=False,  # Don't enforce the batch size.
--> 751             exception_prefix='input')
    752 
    753         if y is not None:

/home/ds/anaconda3/envs/sednn_env/lib/python3.6/site-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    126                         ': expected ' + names[i] + ' to have ' +
    127                         str(len(shape)) + ' dimensions, but got array '
--> 128                         'with shape ' + str(data_shape))
    129                 if not check_batch_axis:
    130                     data_shape = data_shape[1:]

ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (249, 11, 257)
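For what it's worth, this error means the loaded model's first layer expects 2-D input (batch, features), while mat_2d_to_3d produces 3-D segments. If the pre-trained model is a fully connected network over the concatenated frames (an assumption; the repo's model may instead use a Flatten input layer), each segment can be flattened before prediction, e.g.:

```python
import numpy as np

# Hypothetical 3-D batch from mat_2d_to_3d:
# 249 segments of 11 frames x 257 frequency bins.
mixed_x_3d = np.random.rand(249, 11, 257)

# Flatten each (11, 257) segment into a single feature vector
# so a Dense-input model can consume it via model.predict(...).
mixed_x_2d = mixed_x_3d.reshape(mixed_x_3d.shape[0], -1)
print(mixed_x_2d.shape)  # (249, 2827)
```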

DinoTheDinosaur avatar Jan 08 '19 11:01 DinoTheDinosaur

We did not keep the trained model on our disk, as we thought it would be easy to run the code to get the model. If you rerun the code on the TIMIT dataset successfully, a PESQ of 2.08 +- 0.24 can be obtained on the test set. Then you could move on to the real-life samples to see how it performs. Since you do not have access to the TIMIT dataset, we will consider releasing the latest trained model and scaler for those who do not have access to TIMIT.

The code was developed using Python 2. You may need to import _pickle as cPickle, or try something like cPickle.load(f, encoding='latin1'). I found these solutions on Google.
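A minimal sketch of the encoding='latin1' workaround, using an in-memory buffer as a stand-in for the repo's tr_norm.pickle file (the dict contents are invented for illustration):

```python
import io
import pickle

# Write a pickle with protocol 2, the highest protocol Python 2 supports,
# to simulate a scaler file produced under Python 2.
buf = io.BytesIO()
pickle.dump({"mean": [0.1, 0.2], "std": [1.0, 1.1]}, buf, protocol=2)
buf.seek(0)

# Under Python 3, encoding='latin1' decodes Python 2 byte strings without
# raising UnicodeDecodeError on non-ASCII bytes such as 0xe3.
scaler = pickle.load(buf, encoding="latin1")
print(scaler["mean"])  # [0.1, 0.2]
```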

Best wishes,

Qiuqiang



qiuqiangkong avatar Jan 08 '19 17:01 qiuqiangkong

We did not keep the trained model on our disk, as we thought it would be easy to run the code to get the model.

Yes, I understand! It is easy, thank you for making it that way :)

Since you do not have access to the TIMIT dataset, we will consider releasing the latest trained model and scaler for those who do not have access to TIMIT.

Ok, that would be great!

The code was developed using Python 2. You may need to import _pickle as cPickle, or try something like cPickle.load(f, encoding='latin1').

Thank you, the second solution does work! Although later in the code I still hit the following error: ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (249, 11, 257). It looks like there are some differences in the training procedure between clean2clean_verify and mixture2clean_dnn.

FYI, my team and I from Novosibirsk State University plan to make an open-source Python package for noise suppression in audio based on your repository (https://github.com/nsu-ai-team/noise_supression). I have already written a pipeline for single-file enhancement there, and I plan to upload a LibriSpeech model (PESQ = 1.82 +- 0.24) trained with your repo. Is that alright with you? Of course, I will add links and references to your repo.

DinoTheDinosaur avatar Jan 09 '19 04:01 DinoTheDinosaur

Hi, many thanks! That is absolutely fine!

Best wishes,

Qiuqiang



qiuqiangkong avatar Jan 16 '19 23:01 qiuqiangkong

Hi, yes, it is also fine with me.

Best regards, yong


yongxuUSTC avatar Jan 16 '19 23:01 yongxuUSTC

So I guess there isn't currently a way to recover the parameters for the pre-trained TIMIT model? If that is the case, it would be better to close the issue and wait for the uploads.

DinoTheDinosaur avatar Jan 18 '19 04:01 DinoTheDinosaur