vggvox-speaker-identification

about MFCC

Status: Open. TTTJJJWWW opened this issue 6 years ago • 6 comments

@linhdvu14 Hi, thanks for your code. I know you are using the model with weights from VGGVox, but where is the MFCC processing done? Or do you use different features?

TTTJJJWWW, Jan 01 '19 09:01

Hi, VGGVox doesn't use MFCC, only the FFT spectrum. The signal processing code is in sigproc.py.

linhdvu14, Jan 01 '19 15:01
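To make "only the FFT spectrum" concrete, here is a minimal NumPy sketch of a VGGVox-style spectrogram. It is not the code in sigproc.py, and the function name `vggvox_style_spectrogram` is hypothetical. The parameters (16 kHz audio, 25 ms Hamming window, 10 ms hop, 512-point FFT, mean/variance normalisation per frequency bin) are assumptions based on the published VGGVox description; the repo may add steps such as pre-emphasis or DC removal that are not reproduced here.

```python
import numpy as np


def vggvox_style_spectrogram(signal, fs=16000, win_ms=25.0, hop_ms=10.0, nfft=512):
    """Return a normalised FFT magnitude spectrogram of shape (nfft, n_frames).

    Hypothetical helper; parameter values are assumptions, not read from sigproc.py.
    """
    signal = np.asarray(signal, dtype=np.float64)
    win = int(round(fs * win_ms / 1000.0))  # 400 samples at 16 kHz
    hop = int(round(fs * hop_ms / 1000.0))  # 160 samples at 16 kHz
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(signal) - win) // hop
    # Row t of idx holds the sample indices belonging to frame t
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    # Two-sided FFT magnitude; no mel filterbank, log, or DCT, so these are not MFCCs
    spec = np.abs(np.fft.fft(frames, n=nfft, axis=1)).T  # frequency x time
    # Normalise each frequency bin to zero mean and unit variance over time
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, keepdims=True)
    return (spec - mean) / np.maximum(std, 1e-12)
```

For one second of 16 kHz audio this gives a 512 x 98 array. The result is a 2-D frequency-by-time matrix rather than a short vector of cepstral coefficients per frame.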

@linhdvu14 Hi, thank you for your reply. I am doubtful about "VGGVox doesn't use MFCC", because the VGGVox source code contains an MFCC function (in the MFCC folder) and uses it:

function [ SPEC ] = mfccspec( speech, fs, Tw, Ts, alpha, window, R, M, N, L )
% MFCC Mel frequency cepstral coefficient feature extraction.
...

TTTJJJWWW, Jan 03 '19 02:01

Yes, but if you look at the code of mfccspec, its return value SPEC is only the FFT spectrum.

linhdvu14, Jan 03 '19 03:01
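A function can carry the MFCC name and still return only the spectrum if it stops before the later MFCC stages. The sketch below illustrates that general idea in NumPy. It is not a translation of the MATLAB mfccspec, and `mfcc_style_features` and its `return_spectrum` flag are hypothetical. A full MFCC pipeline goes magnitude spectrum, then mel filterbank energies, then log, then DCT; returning early yields plain FFT features.

```python
import numpy as np


def _hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def _mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def _mel_filterbank(n_filters, nfft, fs):
    """Triangular mel filters, shape (n_filters, nfft // 2 + 1)."""
    mel_pts = np.linspace(_hz_to_mel(0.0), _hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * _mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):  # rising edge (empty if mid == lo)
            fb[i, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):  # falling edge (empty if hi == mid)
            fb[i, k] = (hi - k) / (hi - mid)
    return fb


def mfcc_style_features(frames, fs, nfft=512, n_filters=40, n_ceps=13,
                        return_spectrum=False):
    """frames: (n_frames, frame_len) windowed frames. Hypothetical helper."""
    # Stage 1: one-sided FFT magnitude, frequency x time
    spec = np.abs(np.fft.rfft(frames, n=nfft, axis=1)).T
    if return_spectrum:
        return spec  # stop here: the result is just the FFT spectrum
    # Stage 2: mel filterbank energies from the power spectrum
    mel = _mel_filterbank(n_filters, nfft, fs) @ (spec ** 2)
    # Stage 3: log compression (floored to avoid log(0))
    log_mel = np.log(np.maximum(mel, 1e-10))
    # Stage 4: DCT-II over the filter axis, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return basis @ log_mel  # (n_ceps, n_frames): these are the MFCCs
```

With `return_spectrum=True` the output for 512-point frames has 257 frequency rows; without it, 13 cepstral rows.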

Oh, I see. So you mean the features of the WAV file are fed into the model as an image (a grey-scale image)? And the system essentially computes the similarity (distance) between the images?

TTTJJJWWW, Jan 03 '19 09:01
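The question above is not answered directly in the thread. In CNN speaker-recognition systems of this kind, the spectrogram is typically treated as a single-channel 2-D input, and speakers are compared through the distance between the network's output embeddings rather than between the raw images. The sketch below assumes that setup and uses cosine distance. The metric, the helper names (`as_image_batch`, `cosine_distance`, `identify`), and the `enrolled` dictionary are all hypothetical and not taken from this repo.

```python
import numpy as np


def as_image_batch(spec):
    """(freq, time) -> (1, freq, time, 1): a batch of one single-channel 'image'."""
    return spec[np.newaxis, :, :, np.newaxis]


def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity between two embedding vectors."""
    a = np.ravel(a)
    b = np.ravel(b)
    return 1.0 - float(np.dot(a, b) / max(np.linalg.norm(a) * np.linalg.norm(b), eps))


def identify(query_emb, enrolled):
    """Return (speaker, distance) for the enrolled embedding closest to query_emb.

    enrolled: hypothetical dict mapping speaker name -> embedding vector.
    """
    dists = {name: cosine_distance(query_emb, emb) for name, emb in enrolled.items()}
    best = min(dists, key=dists.get)
    return best, dists[best]


# Assumed usage with some trained Keras `model` (not defined here):
# query_emb = model.predict(as_image_batch(spec))[0]
# speaker, dist = identify(query_emb, enrolled)
```

Comparing embeddings rather than pixels means the network is responsible for discarding content that varies between utterances, such as the spoken words, and for keeping speaker-specific traits.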

@linhdvu14 Hi, does "weights.h5" store both the architecture and the weights, or just the weights? I want to convert it to a TensorFlow model (.pb). Can I just use the "keras_to_tensorflow" tool to do it? Looking forward to your reply.

TTTJJJWWW, Jan 04 '19 09:01

It's just weights. You'd probably want to export both the weights and the architecture before trying keras_to_tensorflow. Alternatively, replicate the model architecture in TensorFlow and restore the weights from a dict.

linhdvu14, Jan 06 '19 01:01
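The first suggestion above (export architecture plus weights) can be sketched with tf.keras. The approach is to rebuild the architecture in code, load the weights-only file into it, and save a single file that holds both. `build_model` here is a hypothetical tiny stand-in, not the VGGVox network; in practice you would use the repo's own model-building code so that the layers match the saved weights. The conversion to .pb with keras_to_tensorflow is not shown, because its options are not described in this thread.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    """Hypothetical stand-in architecture (NOT VGGVox); small so the sketch runs fast."""
    inputs = tf.keras.Input(shape=(64, 50, 1))
    x = tf.keras.layers.Conv2D(4, 3, name="conv1")(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(8, name="embedding")(x)
    return tf.keras.Model(inputs, outputs)


def export_full_model(weights_path, out_path):
    """Rebuild the architecture, load a weights-only file, save architecture + weights."""
    model = build_model()
    model.load_weights(weights_path)  # the input file contains weights only
    model.save(out_path)  # an .h5 path gives one HDF5 file with architecture + weights
    return model


# Example (paths are placeholders):
# export_full_model("weights.h5", "vggvox_full.h5")
```

The resulting file can be reopened with `tf.keras.models.load_model` without the model-building code. That self-contained file is what a Keras-to-TensorFlow converter generally expects as input.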