VibeVoice
How can we get the position of text in the generated audio?
It's really cool that we can now generate audio in real time with microsoft/VibeVoice-Realtime-0.5B. I was thinking about integrating it into my application, and I ran into a critical UX requirement: it would be great if we could highlight the text that corresponds to the audio currently playing.
Does VibeVoice support this?
Thank you for your interest. Currently, the model cannot provide alignment information between generated speech and text.
Okay!
If you guys are willing to work on this, I would be happy to help. Please let me know.
Good day!