VAD
Question about understanding the bdnn_transform function.
Hi.
I'm trying to rewrite this project in C++ in search of better interoperability, better user-friendliness, and better performance.
I have now successfully implemented MRCG extraction and got a large performance boost as well as lower memory usage. However, I'm having trouble understanding the script that does the prediction. This script involves a lot of array allocation, and I want to know the purpose of every single line in order to write a better implementation.
So, could you please give an explanation of the bdnn_transform function?
import numpy as np

def bdnn_transform(inputs, w, u):
    """
    :param inputs: shape = (batch_size, feature_size)
    :param w: half-width of the context window (decides which neighbors are used)
    :param u: subsampling step inside the window (decides which neighbors are used)
    :return: trans_inputs, shape = (batch_size, feature_size * len(neighbors))
    """
    # Neighbor frame offsets: -w, -w+u, ..., then -1, 0, 1, then 1+u, 1+2u, ... up to w
    neighbors_1 = np.arange(-w, -u, u)
    neighbors_2 = np.array([-1, 0, 1])
    neighbors_3 = np.arange(1 + u, w + 1, u)
    neighbors = np.concatenate((neighbors_1, neighbors_2, neighbors_3), axis=0)

    # Zero-pad the frame sequence so frames near the utterance boundaries still have neighbors
    pad_size = 2 * w + inputs.shape[0]
    pad_inputs = np.zeros((pad_size, inputs.shape[1]))
    pad_inputs[0:inputs.shape[0], :] = inputs

    # For each offset n, shift the padded sequence by -n and keep the first
    # batch_size rows, so that row t holds the features of frame t + n
    trans_inputs = [
        np.roll(pad_inputs, -1 * neighbors[i], axis=0)[0:inputs.shape[0], :]
        for i in range(neighbors.shape[0])]

    # (len(neighbors), batch_size, feature_size) -> (batch_size, len(neighbors), feature_size)
    trans_inputs = np.asarray(trans_inputs)
    trans_inputs = np.transpose(trans_inputs, [1, 0, 2])

    # Flatten the neighbor axis into the feature axis
    trans_inputs = np.reshape(trans_inputs, (trans_inputs.shape[0], -1))
    return trans_inputs
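
To illustrate what the function returns, here is a small usage sketch; the values of w, u, the batch size, and the feature size below are made up for the example and are not the project's real settings:

import numpy as np

w, u = 19, 9                          # illustration values only
inputs = np.random.rand(100, 64)      # dummy features, (batch_size=100, feature_size=64)

trans = bdnn_transform(inputs, w, u)

# For w=19, u=9 the selected neighbor offsets are [-19, -10, -1, 0, 1, 10, 19],
# i.e. 7 frames are concatenated per input frame:
print(trans.shape)                    # (100, 64 * 7) = (100, 448)

So each row of the output is the original frame concatenated with a subsampled set of its past and future context frames.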
Thanks in advance.
Excellent! Thank you for your interest and contributions!
Because it has been a long time since I implemented it, I can't remember the details exactly. However, the purpose is to implement equation (7) in [1]. It will also be helpful to refer to Fig. 2 in [1].
If I find some spare time, I can analyze the written code in detail; however, these days I'm too busy. Thank you!
[1] X. Zhang and D. Wang, "Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, pp. 252-264, Feb. 2016.
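
In case it helps other readers: as far as I can tell from the code itself (I have not re-checked this against the paper), bdnn_transform builds, for every frame x_t, the concatenation

[ x_{t-w}, x_{t-w+u}, ..., x_{t-1}, x_t, x_{t+1}, x_{t+1+u}, ..., up to at most x_{t+w} ]

where the outer offsets step by u, the central offsets -1, 0, 1 are always included, and zero vectors are substituted whenever t+n falls outside the utterance. This expanded feature vector is what gets fed to the network for frame t.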