contentvec
Loading the model for inference
Hi, thanks for your great work on contentvec!
I'm trying to set up the model just to get the representations, but I'm a little lost on how exactly it should be done. As you mentioned, the legacy model should work with plain fairseq, so I have already set up fairseq and apex. But how should I properly load a wav file, for example? You can find my code snippet below:
import fairseq
import soundfile
import torch
ckpt_path = "checkpoint_best_legacy_100.pt"
filedir = "sample.wav"
wav, sr = soundfile.read(filedir)
wav = torch.from_numpy(wav)
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
feats = model(wav)
print(feats.size())
But I'm getting the error below. I'm a little confused about why the input should be 3-D. If you can, could you provide a code snippet going from loading the wav file all the way to getting the feature representations? Thank you very much.
Traceback (most recent call last):
File "contentvec.py", line 11, in <module>
feats = model(wav)
File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/toolkits/fairseq/fairseq/models/hubert/hubert.py", line 437, in forward
features = self.forward_features(source)
File "/home/toolkits/fairseq/fairseq/models/hubert/hubert.py", line 392, in forward_features
features = self.feature_extractor(source)
File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/toolkits/fairseq/fairseq/models/wav2vec/wav2vec2.py", line 895, in forward
x = conv(x)
File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/toolkits/fairseq/ssl_env/lib64/python3.6/site-packages/torch/nn/modules/conv.py", line 257, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 10], but got 2-dimensional input of size [32825, 1] instead
It is best to post this question under fairseq. For feature extraction, please refer to the README under HuBERT; you can find step-by-step instructions there.
Hi @lesterphillip, I'm wondering if you have solved this problem. If so, could you please share your solution? Thank you very much!
@actuy Please refer to the HuBERT feature extraction instructions under fairseq. This problem is independent of ContentVec: anyone who can extract features from the regular HuBERT model can extract features from ContentVec.
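For what it's worth, the root cause of the traceback above is a missing batch dimension: the model's convolutional front-end expects input of shape `(batch, channels, time)`, but a mono waveform from soundfile is 1-D, and the extractor's internal channel `unsqueeze` only brings it to 2-D. A minimal sketch of the shape fix in plain PyTorch (no checkpoint needed; the layer below just mirrors the `[512, 1, 10]` weight shape from the error message, and the stride is an illustrative assumption):

```python
import torch

# A Conv1d like the first layer of the HuBERT/ContentVec CNN front-end:
# weight shape [512, 1, 10], i.e. it expects input of shape (batch, 1, time).
conv = torch.nn.Conv1d(in_channels=1, out_channels=512, kernel_size=10, stride=5)

wav = torch.randn(32825)           # mono waveform, shape (T,)

# What went wrong: without a batch dimension, adding a channel axis to a
# (T,) waveform only yields (T, 1) -- 2-D, which the conv stack rejects.
bad = wav.unsqueeze(-1)            # (32825, 1)

# The fix: add the batch dimension first, so that the channel unsqueeze
# produces the 3-D (1, 1, T) tensor the convolution expects.
source = wav.unsqueeze(0)          # (1, 32825)
feats = conv(source.unsqueeze(1))  # (1, 512, T')
print(feats.shape)                 # torch.Size([1, 512, 6564])
```

Applied to the original snippet, the same idea means passing `wav.unsqueeze(0).float()` to the model, and calling something like `model.extract_features(source, padding_mask=None, mask=False)` rather than `model(wav)`, since `forward` also expects training-time arguments (check the `extract_features` signature in your fairseq version).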
@actuy An alternative, if you can't get the fairseq repo working, is to use ESPnet and load the ContentVec model in place of the standard HuBERT model.
@auspicious3000 I think at least a short guide in the README on how to navigate fairseq would be appreciated (the fairseq repo isn't so straightforward to use). It would also help more people use your model, and if more people can use it easily, your paper may get more citations. So I don't think the "how to use" issue is really independent of ContentVec. Just a thought.
Cool! Thank you very much, I'll try it! @lesterphillip @auspicious3000 Thank you for your answers.
@lesterphillip The pointer to the step-by-step instructions has been updated.
@actuy Please try to use fairseq to ensure correct output. This repo has been tested and works for many other users, so it should run without problems. Please open a new issue if you can't get the fairseq repo working.