sednn
some question of pad_with_border
Hello, I am really impressed by your work and have a few questions about how you process the data.
Does pad_with_border mean this?
Many thanks, Nick
Sorry, in addition, I would like to ask: if I want to use this speech enhancement system as a front end for ASR, how do I do this?
Many thanks, Nick
Hi Nick,
The picture you show is correct. pad_with_border simply extends the left and right borders.
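As a rough illustration of what extending the borders can look like, here is a minimal sketch. It assumes the padding repeats the first and last frames (edge replication) and uses a hypothetical helper name, `pad_edges`; it is not the repository's own code.

```python
import numpy as np

def pad_edges(x, n_pad):
    """Hypothetical helper: repeat the first and last frame n_pad times each.

    x has shape (n_frames, n_freq); edge replication is an assumption here.
    """
    return np.pad(x, ((n_pad, n_pad), (0, 0)), mode="edge")

# Five frames (t = 1..5) with a single frequency bin, so the values are easy to read.
x = np.arange(1, 6, dtype=float).reshape(5, 1)
padded = pad_edges(x, n_pad=3)
print(padded[:, 0])  # [1. 1. 1. 1. 2. 3. 4. 5. 5. 5. 5.]
```

With `n_pad = (n_concat - 1) // 2`, every original frame can then sit at the center of a full context window, including the first and last ones.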
You can obtain enhanced speech by running this code. ASR can then be applied to the enhanced output afterwards.
Best wishes,
Qiuqiang
Hello Qiuqiang,
Mat_2d_to_3d converts the features to shape (n_segs, n_concat, n_freq).
The center frame of the first stacked segment is t=1, so shouldn't the center frame of the second stacked segment be t=2?
But as shown in the following figure, the center frame of the second stacked segment is t=4. Why is that?
Many thanks,
Nick
Hi Nick,
Yes, you can use the enhanced features for ASR. However, you may need to retrain, or jointly train, your back-end acoustic model for ASR.
Good luck.
Best regards, yong
Dr. Yong XU https://sites.google.com/view/xuyong/home
Hi Yong,
Thank you for your reply! I have a few more questions:
- By the "enhanced features for ASR" you mentioned, do you mean the magnitudes of the log power spectrogram?
- Do you think using the recovered enhanced wav as ASR input is feasible?
- What would you recommend for applying the enhancement system to environmental noise?
Many thanks, Nick
Hi Nick,
The picture you drew is correct: the first center frame is t=1 and the second is t=4 in your drawing. The spacing between center frames depends on the hop.
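To make the effect of the hop concrete, here is a small sketch. The helper name `stack_frames`, the edge-replicated padding, and the values `n_concat = 7` and `hop = 3` are assumptions chosen to reproduce center frames 1 and 4; they are not taken from the repository.

```python
import numpy as np

def stack_frames(x, n_concat, hop):
    """Hypothetical helper: slice (n_frames, n_freq) into (n_segs, n_concat, n_freq).

    Assumes len(x) >= n_concat, so that at least one segment exists.
    """
    starts = range(0, len(x) - n_concat + 1, hop)
    return np.stack([x[s:s + n_concat] for s in starts])

n_concat = 7
n_pad = (n_concat - 1) // 2
frames = np.arange(1, 11, dtype=float).reshape(10, 1)  # frames t = 1..10, one bin
padded = np.pad(frames, ((n_pad, n_pad), (0, 0)), mode="edge")

centers_by_hop = {}
for hop in (1, 3):
    segs = stack_frames(padded, n_concat, hop)
    # The center frame of each segment sits at index n_pad inside the window.
    centers_by_hop[hop] = segs[:, n_pad, 0].astype(int).tolist()
    print(hop, segs.shape, centers_by_hop[hop])
# hop=1 -> (10, 7, 1), centers 1, 2, ..., 10
# hop=3 -> (4, 7, 1),  centers 1, 4, 7, 10
```

With a hop of 1 the second center frame would be t=2; a hop of 3 gives t=4, as in the drawing.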
"The "enhanced features for ASR" you mentioned, do you mean the magnitudes of log power spectrogram?"
- It means either the enhanced spectrogram or the log power spectrogram.
"Do you think using recover enhanced wav as ASR input is feasible?"
- It is feasible if the dataset is small. However, bear in mind that any speech denoising method will lose some information. Some work has done joint enhancement and recognition.
"What would you recommend about applying the enhancement system to dealing with the environmental noise?"
- I think applying it to environmental noise should be fine, as long as the noise used for training covers most environmental noise.
Best wishes,
Qiuqiang
Hello Qiuqiang,
This is great work. It would be a great help if you could elaborate on the point below, which you mentioned in the discussion above.
"method will lose some information. Some work did a joint enhancement and recognition."
I get the point about information loss. Can you please tell me more about joint enhancement and recognition?
Is it two DNN models linked together, or preprocessing followed by ASR?
Thank you.
Hi Nick,
If speech enhancement and ASR are done separately, ASR performance might be reduced, because speech enhancement sometimes also removes useful information from the speech. However, combining them into a single neural network might help. For example, use speech enhancement as the lower layers of the network and ASR as the higher layers. The loss function can combine the ASR and speech enhancement losses. This is just my conjecture, and I am not aware whether such work exists.
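A minimal sketch of such a combined loss, using NumPy only. Everything in it is an assumption for illustration: the single linear layers, the random toy data, the weighting factor `alpha`, and the choice of mean squared error plus cross-entropy. It is not taken from this repository or from any published system.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_freq, n_classes = 8, 16, 4

# Toy data, randomly generated only so that the sketch runs.
noisy = rng.normal(size=(n_frames, n_freq))          # noisy input features
clean = rng.normal(size=(n_frames, n_freq))          # clean target features
labels = rng.integers(0, n_classes, size=n_frames)   # per-frame class labels

# Lower layer acts as the enhancer, upper layer as the recognizer (both hypothetical).
w_enh = rng.normal(scale=0.1, size=(n_freq, n_freq))
w_asr = rng.normal(scale=0.1, size=(n_freq, n_classes))

def joint_loss(noisy, clean, labels, alpha=0.5):
    """Return (total, enhancement MSE, recognition cross-entropy)."""
    enhanced = noisy @ w_enh                 # enhancement output
    logits = enhanced @ w_asr                # recognition output on enhanced features
    mse = np.mean((enhanced - clean) ** 2)
    # Numerically stable log-softmax for the cross-entropy term.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(labels)), labels])
    return alpha * mse + (1 - alpha) * ce, mse, ce

total, mse, ce = joint_loss(noisy, clean, labels)
print(total, mse, ce)
```

Because both terms depend on `w_enh`, gradients from the recognition loss would also shape the enhancement layer, which is the point of training the two jointly rather than in sequence.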
Best wishes,
Qiuqiang
Hi Nick,
Yes, there are joint SE & ASR training papers: https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html https://ieeexplore.ieee.org/abstract/document/7178797/
Best regards, yong
Yong XU https://sites.google.com/view/xuyong/home
Dear Yong,
"Yes, there are joint SE & ASR training papers: https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html https://ieeexplore.ieee.org/abstract/document/7178797/"
It was an informative read. It would be great if you could post a link to an implementation (source code) of these papers.
Thank you, Akshaya