vggvox-speaker-identification
about MFCC
@linhdvu14 Hi, thanks for your code. I know you are using the model with weights from VGGVOX, but where is the MFCC process? Or do you use different features?
Hi, VGGVox doesn't use MFCC, only the FFT spectrum. The signal processing code is in sigproc.py.
@linhdvu14 Hi, thank you for your reply. I'm doubtful about "VGGVox doesn't use MFCC", because the source code of VGGVOX contains an MFCC function (in the MFCC folder) and uses it: function [ SPEC ] = mfccspec( speech, fs, Tw, Ts, alpha, window, R, M, N, L ) % MFCC Mel frequency cepstral coefficient feature extraction. ...
Yes, but if you look at the code of mfccspec, the return value SPEC is only the FFT.
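For reference, a minimal sketch of what an FFT-magnitude-only feature pipeline looks like (frame the signal, window it, take the FFT magnitude, and stop before any mel filterbank or DCT). The parameter values below are illustrative assumptions, not necessarily the exact settings used in sigproc.py:

```python
import numpy as np

def fft_spectrogram(signal, sample_rate=16000, frame_len=0.025, frame_step=0.01, nfft=512):
    """Magnitude FFT spectrogram -- no mel filterbank, no DCT, i.e. no MFCC step."""
    frame_size = int(round(frame_len * sample_rate))
    step_size = int(round(frame_step * sample_rate))
    window = np.hamming(frame_size)

    frames = []
    for start in range(0, len(signal) - frame_size + 1, step_size):
        frame = signal[start:start + frame_size] * window
        frames.append(np.abs(np.fft.rfft(frame, nfft)))
    # Shape (freq_bins, time_frames), which can be fed to the network like a grey-scale image.
    return np.array(frames).T
```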
Oh, I see. So you mean the features of the wav are input into the model as an image (a grey-scale image)? And the system essentially computes the similarity (distance) between the images?
@linhdvu14 Hi, does "weights.h5" store both the architecture and the weights, or just the weights? I want to convert it to a TensorFlow model (.pb). Can I just use the "keras_to_tensorflow" tool to do it? Looking forward to your reply.
It's just weights. You'd probably want to export both weights and architecture before trying keras_to_tensorflow. Or replicate the model architecture in TF and restore the weights from a dict.
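A rough sketch of the first option, assuming the repo exposes some function that rebuilds the Keras model (here called `build_model()`, a placeholder name, not necessarily this repo's API):

```python
from keras.models import model_from_json

model = build_model()                  # recreate the VGGVox architecture in Keras (name assumed)
model.load_weights('weights.h5')       # restore the weights-only checkpoint

# Option 1: one file containing both architecture and weights.
model.save('vggvox_full.h5')

# Option 2: keep architecture (JSON) and weights (HDF5) as separate artifacts.
with open('vggvox_arch.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('vggvox_weights_only.h5')

# Later, the model can be rebuilt without the original model-building code:
with open('vggvox_arch.json') as f:
    restored = model_from_json(f.read())
restored.load_weights('vggvox_weights_only.h5')
```

Either exported form should give keras_to_tensorflow (or a manual TF re-implementation) both the graph and the weights to work from.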