
Loading the model for inference

lesterphillip opened this issue on Jul 31, 2022 · 6 comments

Hi, thanks for your great work on contentvec!

I'm trying to set up the model just to get the representations, but I'm a little lost on how exactly it should be done. As you mentioned, we should be able to use the legacy model with plain fairseq, so I have already set up fairseq and apex. But how should I properly load a wav file, for example? You can find my code snippet below:

import fairseq
import soundfile
import torch

ckpt_path = "checkpoint_best_legacy_100.pt"
filedir = "sample.wav"
wav, sr = soundfile.read("sample.wav")
wav = torch.from_numpy(wav)
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
feats = model(wav)
print(feats.size())

But I'm getting the error below. I'm a little confused about why the input needs to be 3-dimensional. If you can, could you provide a code snippet going from loading the wav file to getting the feature representations? Thank you very much.

Traceback (most recent call last):
  File "contentvec.py", line 11, in <module>
    feats = model(wav)
  File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/toolkits/fairseq/fairseq/models/hubert/hubert.py", line 437, in forward
    features = self.forward_features(source)
  File "/home/toolkits/fairseq/fairseq/models/hubert/hubert.py", line 392, in forward_features
    features = self.feature_extractor(source)
  File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/toolkits/fairseq/fairseq/models/wav2vec/wav2vec2.py", line 895, in forward
    x = conv(x)
  File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/conv.py", line 257, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 10], but got 2-dimensional input of size [32825, 1] instead

lesterphillip avatar Jul 31 '22 03:07 lesterphillip
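(The shape complaint in the traceback above is plain PyTorch behavior, independent of fairseq: `Conv1d` expects input shaped `(batch, channels, time)`, so a `(T, 1)` waveform tensor trips the same check. A minimal sketch reproducing it, with the layer sizes taken from the traceback's weight `[512, 1, 10]`; the stride of 5 is an assumption matching fairseq's first feature-extractor layer:)

```python
import torch

# Conv1d weight [512, 1, 10] from the traceback: 512 output channels,
# 1 input channel, kernel size 10 (stride 5 assumed from fairseq's
# default feature-extractor config).
conv = torch.nn.Conv1d(in_channels=1, out_channels=512, kernel_size=10, stride=5)

wav = torch.randn(32825, 1)          # (T, C), as soundfile can return it
# conv(wav) would raise the same RuntimeError: Conv1d needs (B, C, T).
x = wav.squeeze(-1).view(1, 1, -1)   # reshape to (1, 1, 32825)
out = conv(x)
print(out.shape)                     # torch.Size([1, 512, 6564])
```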

It is best to post this question under fairseq. For feature extraction, please refer to the README under HuBERT. You can find step-by-step instructions there.

auspicious3000 avatar Jul 31 '22 05:07 auspicious3000

Hi @lesterphillip, I'm wondering if you have solved this problem. If yes, could you please share your solution? Thank you very much!

actuy avatar Mar 27 '23 02:03 actuy

@actuy Please refer to the HuBERT feature extraction instructions under fairseq. This problem is independent of ContentVec: as long as one can extract features from the regular HuBERT, one can extract features from ContentVec.
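(For anyone landing here later: a rough sketch of that HuBERT-style extraction path, fixing the shape problem from the original snippet. The checkpoint and wav filenames are taken from the snippet above; the 16 kHz requirement and the `extract_features` call follow fairseq's HuBERT model, but treat this as an assumption-laden sketch, not the official recipe:)

```python
import torch

def prepare_wav(wav: torch.Tensor) -> torch.Tensor:
    """Collapse a (T,) or (T, C) waveform to a batched (1, T) float tensor."""
    wav = wav.float()
    if wav.dim() == 2:            # average stereo channels down to mono
        wav = wav.mean(dim=-1)
    return wav.unsqueeze(0)       # (T,) -> (1, T): the batch dim the model expects

def extract_contentvec_features(ckpt_path: str, wav_path: str) -> torch.Tensor:
    import fairseq
    import soundfile

    wav, sr = soundfile.read(wav_path)
    assert sr == 16000, "HuBERT-style checkpoints expect 16 kHz audio"
    source = prepare_wav(torch.from_numpy(wav))

    models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
    model = models[0].eval()
    with torch.no_grad():
        # extract_features returns (features, padding_mask);
        # pass output_layer=N to read an intermediate transformer layer instead.
        feats, _ = model.extract_features(source, padding_mask=None, mask=False)
    return feats                  # shape (1, frames, feature_dim)

# Usage (not run here):
# feats = extract_contentvec_features("checkpoint_best_legacy_100.pt", "sample.wav")
# print(feats.size())
```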

auspicious3000 avatar Mar 27 '23 03:03 auspicious3000

@actuy An alternative if you can't get the fairseq repo working is to use ESPNet and load the ContentVec model instead of the standard HuBERT model.

@auspicious3000 I think at least a short guide in the README on how to navigate fairseq would be appreciated (since the fairseq repo isn't so straightforward to use). It would also help more people use your model and earn your paper more citations if the model is easy to use. So I don't think the "how to use" issue is really independent of ContentVec. Just a thought.

lesterphillip avatar Mar 27 '23 13:03 lesterphillip

Cool! Thank you very much! I'll try it! @lesterphillip @auspicious3000 Thank you for your answer.

actuy avatar Mar 31 '23 03:03 actuy

@lesterphillip The pointer to the step-by-step instructions has been updated.

@actuy Please try to use fairseq to ensure correct output. This repo has been tested and confirmed working by many other users, so it should work without problems. Please open a new issue if you can't get the fairseq repo working.

auspicious3000 avatar Mar 31 '23 05:03 auspicious3000