sednn
Some questions in terms of Features Extraction
I'm really impressed by your work and have a few questions about how you process the data.
In the pack_feature function, you convert the 2D array to a 3D array (with the mat_2d_to_3d function) and use the 3D array as the input to the DNN model. What is the reason for this step? Why not use the 2D array directly as the input to the DNN model?
Many thanks, Robert
Hi Robert,
The 3D data has the shape of (batch_size, time_steps, freq_bins), for example, (500, 240, 64). So 3D is needed.
Best wishes,
Qiuqiang
From: RobertWan91 [email protected] Sent: 22 April 2018 05:48:33 To: yongxuUSTC/sednn Cc: Subscribed Subject: [yongxuUSTC/sednn] Some questions in terms of Features Extraction (#7)
Hi Qiuqiang,
Thanks for your reply.
Normally, my understanding is that each input sample is a 2D array (time_steps, freq_bins), and stacking several samples forms a 3D array (batch_size, time_steps, freq_bins). In this code, however, you first convert each sample's 2D array into a 3D array and then pack the 3D arrays of all samples together. May I know the reason for this processing?
Also, the output 'y' of the DNN model is generated with basically the same procedure, producing a 3D array, but only the central spectrogram is chosen as 'y'. Why choose only the central spectrogram?
Many thanks, Robert
Hi Robert,
Please ignore the previous email; I confused your question with one from another repository ...
Usually we use a stacked window of frames as input to a DNN. Stacking frames usually performs better than using a single frame, so we need mat_2d_to_3d to reshape the data to (training_samples, stack_num, freq_bins). For example, if we stack 11 frames as input, then (training_samples, stack_num, freq_bins) = (training_samples, 11, 257).
Best wishes,
Qiuqiang
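For readers following the thread, the frame stacking described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual mat_2d_to_3d: the helper name `stack_frames`, the `hop` parameter, the zero-padding of short inputs, and the toy random spectrograms are all assumptions made for the example. It also shows the pairing raised in the question, where each stacked input window is matched with the single central frame of the corresponding target window.

```python
import numpy as np

def stack_frames(spec, stack_num, hop=1):
    """Slice a (time_steps, freq_bins) array into overlapping windows.

    Returns an array of shape (num_windows, stack_num, freq_bins).
    Hypothetical helper for illustration; not the repository's mat_2d_to_3d.
    """
    time_steps, freq_bins = spec.shape
    if time_steps < stack_num:
        # Assumption: zero-pad short inputs so at least one window exists.
        pad = np.zeros((stack_num - time_steps, freq_bins), dtype=spec.dtype)
        spec = np.concatenate([spec, pad], axis=0)
    starts = range(0, spec.shape[0] - stack_num + 1, hop)
    return np.stack([spec[s:s + stack_num] for s in starts], axis=0)

# Toy data standing in for one utterance: 100 frames, 257 frequency bins.
rng = np.random.default_rng(0)
input_spec = rng.random((100, 257))    # features fed to the DNN
target_spec = rng.random((100, 257))   # features the DNN should predict

stack_num = 11
x = stack_frames(input_spec, stack_num)        # (num_windows, 11, 257)
y_3d = stack_frames(target_spec, stack_num)    # same procedure as for x
y = y_3d[:, stack_num // 2, :]                 # keep only the central frame

print(x.shape, y.shape)
```

With this layout, one utterance already yields many training examples (one per window), so the 3D arrays from several utterances could then be joined along the first axis, for example with `np.concatenate([x_utt1, x_utt2], axis=0)`, to form the full training set.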
From: RobertWan91 [email protected] Sent: 23 April 2018 04:34:06 To: yongxuUSTC/sednn Cc: Kong Q Mr (PG/R - Elec Electronic Eng); Comment Subject: Re: [yongxuUSTC/sednn] Some questions in terms of Features Extraction (#7)