Matcha-TTS
About batch inference for multi-speakers
Hello, thank you for your great work.
I would like to ask two questions:
- Batch inference of different sentences from the same speaker. I am using --file to read a txt file containing multiple lines (4 lines in this example), and inference fails with the following error:
File "/mnt/E/isjwdu/Matcha-TTS/matcha/models/components/text_encoder.py", line 403, in forward
x = torch.cat([x, spks.unsqueeze(-1).repeat(1, 1, x.shape[-1])], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4 but got size 1 for tensor number 1 in the list.
Is the original code designed to read only a single line at a time? Is there a recommended way to run inference on multiple sentences in one batch?
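To illustrate what I think is happening, here is a minimal sketch that reproduces the same kind of shape mismatch outside of Matcha-TTS. The tensor sizes and the helper name `concat_speaker` are assumptions made up for illustration; only the `torch.cat` line is taken from the traceback. It suggests the speaker tensor has batch size 1 while the text batch has size 4, and that expanding the speaker tensor along the batch dimension makes the concatenation work.

```python
import torch

# Hypothetical sizes: 4 sentences, 8 encoder channels, 5 time steps, 3-dim speaker embedding.
batch, channels, frames, spk_dim = 4, 8, 5, 3
x = torch.randn(batch, channels, frames)
spks = torch.randn(1, spk_dim)  # a single speaker embedding, batch size 1


def concat_speaker(x, spks):
    # Same operation as the line shown in the traceback.
    return torch.cat([x, spks.unsqueeze(-1).repeat(1, 1, x.shape[-1])], dim=1)


try:
    concat_speaker(x, spks)
    mismatch = False
except RuntimeError:
    # Batch dimension 4 (x) vs 1 (spks) cannot be concatenated along dim=1.
    mismatch = True

# Possible workaround (an assumption, not the repository's fix):
# repeat the speaker embedding once per sentence in the batch.
spks_batched = spks.expand(x.shape[0], -1)
out = concat_speaker(x, spks_batched)
```

If this reading is right, passing one speaker entry per sentence in the batch, rather than a single entry, should avoid the error.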
- Batch inference from a single txt file that contains different speakers and different sentences. Do you have any suggestions or tips on how to modify the code for this?
For example, my txt file:
p329-016|p329|the norsemen considered the rainbow as a bridge over which the gods passed from earth to their home in the sky.
p316-091|p316|there was no bad behavior.
I want to synthesize a separate audio file for each line, using the speaker given on that line.
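As a starting point, this is a minimal sketch of how I would parse such a file, assuming each line has the form `utterance_id|speaker|text` as in the example above. The function names `parse_filelist` and `group_by_speaker` are hypothetical and not part of Matcha-TTS; mapping speaker names such as `p329` to the integer speaker IDs the model expects is not covered here.

```python
from collections import defaultdict


def parse_filelist(lines):
    """Split each 'utt_id|speaker|text' line into a tuple, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # maxsplit=2 keeps any '|' characters inside the text itself intact.
        utt_id, speaker, text = line.split("|", maxsplit=2)
        entries.append((utt_id, speaker, text))
    return entries


def group_by_speaker(entries):
    """Collect (utt_id, text) pairs per speaker so each group can be batched together."""
    groups = defaultdict(list)
    for utt_id, speaker, text in entries:
        groups[speaker].append((utt_id, text))
    return dict(groups)


lines = [
    "p329-016|p329|the norsemen considered the rainbow as a bridge over which "
    "the gods passed from earth to their home in the sky.",
    "p316-091|p316|there was no bad behavior.",
]
entries = parse_filelist(lines)
groups = group_by_speaker(entries)
```

Each utterance ID could then serve as the output file name, for example `p329-016.wav`, with the speaker field selecting the voice.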
Looking forward to your reply