metavoice-src
A few errors while loading TTS in the Colab notebook
Hello team,
This is an interesting repo for experimenting with voice. I came across a few issues while trying out TTS in Colab.
- tyro and julius are listed in the poetry file, but when running TTS it asks to install them again
- below is the screenshot where we are unable to load the TTS
Both of the above are from the Colab notebook.
Looking forward to your reply.
Keep rocking!
@lucapericlp is looking into it
Hey @muralidhar972, thanks for the issue! We rolled out poetry support recently but we didn't update the colab link. I had a quick look at this but I'm struggling to get things running on a T4 since the runtime keeps getting disconnected. If you want to have a go until I'm able to further debug, I'm using this updated notebook to test things out. I'll post an update here once we've merged the finalised version into main.
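If you want to confirm whether the Colab runtime actually sees the packages declared in the poetry file, a quick check like the one below may help. It's only a sketch: `missing_modules` is a hypothetical helper, and it assumes the import names match the package names `tyro` and `julius` mentioned above.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be imported in the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the report above; the import names are assumed to match.
print(missing_modules(["tyro", "julius"]))
```

An empty list means both modules are importable from the notebook kernel. If either name is printed, the kernel is probably not running inside the environment that poetry installed into.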
Hey @lucapericlp, I tried your updated notebook, but I'm hitting dependency conflicts between audiocraft and torch, torchvision, etc.
Still not working, and yours isn't either. When's the fix coming?
@lucapericlp Hey!
metavoice is amazing! :) Unfortunately the colab does not work for me. I always get this error:
TorchRuntimeError: Failed running call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(2, 16, s0, 128)), FakeTensor(..., device='cuda:0', size=(2, 16, 2048, 128), dtype=torch.float16), FakeTensor(..., device='cuda:0', size=(2, 16, 2048, 128), dtype=torch.float16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(1, 1, s0, 2048), dtype=torch.bool), 'dropout_p': 0.0}):
Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: c10::Half and value.dtype: c10::Half instead.
from user code:
File "/content/metavoice-src/fam/llm/fast_inference_utils.py", line 131, in prefill
logits = model(x, spk_emb, input_pos)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/fam/llm/fast_model.py", line 160, in forward
x = layer(x, input_pos, mask)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/fam/llm/fast_model.py", line 179, in forward
h = x + self.attention(self.attention_norm(x), mask, input_pos)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/content/metavoice-src/fam/llm/fast_model.py", line 222, in forward
y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)
Any advice would be highly appreciated!
Hey @sebastianrueckerai, thanks for the enthusiasm! Sorry that you're bumping into this issue, we're tracking it here: https://github.com/metavoiceio/metavoice-src/issues/108. There's a temporary fix in the issue linked there which you can try out in the meantime.
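For reference, the error above comes down to `scaled_dot_product_attention` receiving a float32 query together with float16 key and value tensors. Below is a minimal sketch of one way to work around that kind of mismatch: cast all three tensors to a single dtype before the call. This is not necessarily the fix from the linked issue, and `sdpa_same_dtype` is a hypothetical helper, not part of metavoice-src. The shapes are shrunk from the traceback so it runs quickly on CPU.

```python
import torch
import torch.nn.functional as F


def sdpa_same_dtype(q, k, v, mask, dtype=None):
    """Hypothetical helper: cast q, k, v to one dtype, then run attention.

    If dtype is None, the key's dtype is used (float16 in the traceback above).
    """
    dtype = dtype or k.dtype
    q, k, v = (t.to(dtype) for t in (q, k, v))
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)


# Layout is (batch, heads, seq_len, head_dim); sizes are illustrative only.
q = torch.randn(2, 16, 4, 128)                       # float32, like the query in the error
k = torch.randn(2, 16, 8, 128, dtype=torch.float16)  # float16 key
v = torch.randn(2, 16, 8, 128, dtype=torch.float16)  # float16 value
mask = torch.ones(1, 1, 4, 8, dtype=torch.bool)      # True = position may be attended to

# Calling attention directly on mixed dtypes is expected to fail.
try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)
except RuntimeError as err:
    print("mixed dtypes rejected:", err)

# float32 is passed explicitly here so the sketch also runs on CPU-only machines.
out = sdpa_same_dtype(q, k, v, mask, dtype=torch.float32)
print(out.shape, out.dtype)
```

On a GPU you would probably leave `dtype` unset so everything stays in float16, which is what the key and value tensors in the traceback already use.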