shekharmeena2896
I can configure the TTS from OpenAI or ElevenLabs to return the audio in WAV or PCM format. It's real-time, and I have the choice of streaming the audio...
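A WAV file is essentially PCM samples plus a header that records the format and data length. For real-time streaming, raw PCM avoids header handling because each chunk can be played or appended directly. If you need a file afterwards, you can wrap the collected chunks in a WAV container. The sketch below uses Python's standard `wave` module. The format values are assumptions: OpenAI documents its `pcm` output as 24 kHz, 16-bit signed little-endian, and ElevenLabs lets you choose a PCM sample rate, so confirm the values for the format you actually request. `pcm_chunks_to_wav` is a hypothetical helper name.

```python
import io
import wave
from typing import Iterable

# Assumed stream format (verify against your provider's docs):
# 24 kHz, 16-bit signed little-endian, mono.
SAMPLE_RATE = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1


def pcm_chunks_to_wav(chunks: Iterable[bytes],
                      sample_rate: int = SAMPLE_RATE,
                      sample_width: int = SAMPLE_WIDTH_BYTES,
                      channels: int = CHANNELS) -> bytes:
    """Wrap raw PCM chunks (e.g. from a streaming TTS response) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Raw PCM has no header, so chunks are appended back to back;
            # the data length in the header is patched when the writer closes.
            wav.writeframes(chunk)
    return buf.getvalue()


# Usage: 100 chunks of 240 silent 16-bit samples = 24,000 frames (1 s at 24 kHz).
fake_stream = [b"\x00\x00" * 240 for _ in range(100)]
wav_bytes = pcm_chunks_to_wav(fake_stream)
```

Passing a `BytesIO` keeps everything in memory. Because `wave` does not close a file object it did not open, the buffer is still readable after the `with` block.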
ading images... 100% | 1/1 [00:00
Invalid number of channels in input image:
> 'VScn::contains(scn)'
> where
> 'scn' is 1
100% | 229/229 [00:31
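In OpenCV, this message usually means that a color conversion expecting a 3- or 4-channel source, such as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`, received a single-channel (grayscale) image. Here `scn` is the source channel count. One way to guard an image-loading loop is to normalize every image to three channels before converting. The helper below is a hypothetical NumPy-only sketch that assumes images arrive as NumPy arrays, as `cv2.imread` returns them. For the 2-D case, `cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)` does the same thing, and loading with the default `cv2.IMREAD_COLOR` flag also yields three channels.

```python
import numpy as np


def ensure_three_channels(img: np.ndarray) -> np.ndarray:
    """Return an HxWx3 array so BGR-based conversions accept the image."""
    if img.ndim == 2:
        # Grayscale HxW: replicate the single channel three times.
        return np.stack([img, img, img], axis=-1)
    if img.ndim == 3 and img.shape[2] == 1:
        # HxWx1: repeat along the channel axis.
        return np.repeat(img, 3, axis=2)
    if img.ndim == 3 and img.shape[2] == 4:
        # BGRA: drop the alpha channel.
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:
        # Already 3-channel: return unchanged.
        return img
    raise ValueError(f"unsupported image shape: {img.shape}")
```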
I see many TTS models today, like Orpheus, Dia TTS, Sesame AI TTS, and maybe ElevenLabs. They all have a language model within their architecture, which helps drive...
I would recommend not using VITS because it's a fairly old architecture. You should try Orpheus TTS: use the base Hindi model and fine-tune it on Punjabi data.