
About batch inference for multi-speakers

Open isjwdu opened this issue 10 months ago • 0 comments

Hello, thank you for your great work.

I would like to ask two questions:

  1. Batch inference of different sentences from the same speaker. I am using --file to read a txt file containing multiple lines (4 lines in this example), and inference fails with the following error:
File "/mnt/E/isjwdu/Matcha-TTS/matcha/models/components/text_encoder.py", line 403, in forward
     x = torch.cat([x, spks.unsqueeze(-1).repeat(1, 1, x.shape[-1])], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4 but got size 1 for tensor number 1 in the list.

Is the original code designed to process only a single line at a time? Is there a recommended way to run inference on multiple sentences in a batch?
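The shape mismatch in the traceback can be reproduced in isolation. The sketch below is only an illustration: the tensor sizes (192 feature channels, 50 frames, a 64-dimensional speaker embedding) are made-up values, and `concat_speaker` is a hypothetical helper that wraps the expression from the traceback. It is not the repository's code and not a confirmed fix. It shows that a speaker tensor with batch size 1 cannot be concatenated with features of batch size 4, and that expanding the speaker tensor along the batch dimension lets the same expression run.

```python
import torch

# Illustrative sizes only; none of these values are taken from the repository.
batch, channels, frames = 4, 192, 50
spk_dim = 64

x = torch.randn(batch, channels, frames)  # features for 4 sentences
spks = torch.randn(1, spk_dim)            # a single speaker embedding (batch size 1)


def concat_speaker(x, spks):
    """Hypothetical wrapper around the expression shown in the traceback."""
    return torch.cat([x, spks.unsqueeze(-1).repeat(1, 1, x.shape[-1])], dim=1)


# With batch sizes 4 and 1, torch.cat raises the RuntimeError from the traceback.
try:
    concat_speaker(x, spks)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Expanding the speaker embedding to the batch size makes the shapes agree.
spks_batched = spks.expand(x.shape[0], -1)  # shape: (4, spk_dim)
out = concat_speaker(x, spks_batched)       # shape: (4, channels + spk_dim, frames)
```

Whether the speaker tensor should be expanded inside the model or built with the right batch size by the calling script depends on how the inference code constructs its inputs, which this sketch does not cover.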

  2. Batch inference from a single txt file that contains different speakers and different sentences. Are there any suggested code modifications or tips?

For example my txt:

p329-016|p329|the norsemen considered the rainbow as a bridge over which the gods passed from earth to their home in the sky.
p316-091|p316|there was no bad behavior.

I want to generate a separate audio file for each line, using the speaker specified on that line.
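As a starting point, a file in this `id|speaker|text` format can be parsed and grouped by speaker with the standard library alone. This is a minimal sketch, not part of Matcha-TTS: `parse_filelist` and `group_by_speaker` are hypothetical helper names, and mapping a label such as `p329` to the integer speaker ID the model expects is left out because it depends on how the model was trained.

```python
from collections import defaultdict


def parse_filelist(lines):
    """Split each 'id|speaker|text' line into a (utt_id, speaker, text) tuple."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first two '|' only, so the text itself may contain '|'.
        utt_id, speaker, text = line.split("|", 2)
        entries.append((utt_id, speaker, text))
    return entries


def group_by_speaker(entries):
    """Collect (utt_id, text) pairs under each speaker label."""
    groups = defaultdict(list)
    for utt_id, speaker, text in entries:
        groups[speaker].append((utt_id, text))
    return dict(groups)


sample = [
    "p329-016|p329|the norsemen considered the rainbow as a bridge over which "
    "the gods passed from earth to their home in the sky.",
    "p316-091|p316|there was no bad behavior.",
]

entries = parse_filelist(sample)
groups = group_by_speaker(entries)
```

Grouping by speaker means each batch shares one speaker, which sidesteps mixing speaker embeddings within a batch. The utterance ID can then serve as the output file name, for example `p329-016.wav`.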

Looking forward to your reply.

isjwdu, Apr 25 '24 14:04