blurry mouth and awkward lip glitch at the end of sentences.
I hope someone can provide some help with this.
I'm currently using LiveTalking with an avatar generated with MuseTalk. I tried using my own recorded video to generate a realistic-looking avatar, but I noticed a couple of issues when running it.
The first one is a drop in image quality around the mouth area. I recorded high-quality video to make sure the avatar looks good, but while the lips are moving, that specific area looks noticeably lower quality and blurry. I remember reading that blur or smoothing is applied at some stage of the pipeline, so I'm wondering if there is a way to prevent this. Could it be because the image quality/size is too big? I tried reducing the size and even changed the fps to 25.
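As a possible workaround I've been considering (just a sketch, not something from the LiveTalking codebase): if I can hook the output frames, I could re-sharpen the pasted-back mouth crop with an unsharp mask. Here `frame` and the mouth box `(x, y, w, h)` are placeholders for wherever the blending actually happens:

```python
import cv2
import numpy as np

def sharpen_region(frame: np.ndarray, x: int, y: int, w: int, h: int,
                   amount: float = 0.5, sigma: float = 3.0) -> np.ndarray:
    """Unsharp-mask the (x, y, w, h) crop of a BGR frame in place."""
    crop = frame[y:y + h, x:x + w]
    blurred = cv2.GaussianBlur(crop, (0, 0), sigma)
    # unsharp mask: crop + amount * (crop - blurred)
    frame[y:y + h, x:x + w] = cv2.addWeighted(crop, 1.0 + amount,
                                              blurred, -amount, 0)
    return frame
```

That said, I'd still prefer to avoid the blur at the source if the pipeline allows it.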
The second issue is a glitch that seems to happen at the end of each sentence (ending with a full stop or other punctuation) and at the end of the whole speech. Although there is no speech during the short pause between sentences or at the very end, the mouth stays awkwardly open, probably stuck on the last phoneme.
This is the part of the code I am using to send text to edgeTTS.
Is it something I am doing wrong in this code, or is there some other config I need to change to force it to show the normal closed-mouth video during the silences at the end of each text chunk?
```python
for i, char in enumerate(msg):
    if first_comma == True and char in ",.!;:?":
        result = result + msg[lastpos : i + 1]
        lastpos = i + 1
        if len(result) > 10:
            print(result)
            nerfreal.put_msg_txt(result)
            result = ""
            first_comma = False
    elif char in ".!;:?":
        result = result + msg[lastpos : i + 1]
        lastpos = i + 1
        if len(result) > 10:
            print(result)
            nerfreal.put_msg_txt(result)
            result = ""
result = result + msg[lastpos:]
```
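While re-reading it, I also noticed that the final `result = result + msg[lastpos:]` collects any trailing text that doesn't end in punctuation, but nothing sends it afterwards, so the tail of a message can be silently dropped. A minimal flush after the loop would be (assuming a short remainder is fine to send as-is):

```python
# After the loop: flush any leftover text so the tail of the
# message is not silently dropped.
result = result + msg[lastpos:]
if result:
    print(result)
    nerfreal.put_msg_txt(result)
```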
Any advice or tip would be appreciated. Thank you!
```python
class EdgeTTS(BaseTTS):
    def txt_to_audio(self, msg):
        #voicename = "zh-CN-YunyangNeural" #man
        voicename = "zh-CN-XiaoxiaoNeural" #woman
        #voicename = "zh-CN-YunyangNeural"
        text = msg
        t = time.time()
        asyncio.new_event_loop().run_until_complete(self.__main(voicename, text))
        print(f'-------edge tts time:{time.time()-t:.4f}s')
        if self.input_stream.getbuffer().nbytes <= 0:  #edgetts err
            print('edgetts err!!!!!')
            return

        self.input_stream.seek(0)
        t = time.time()
        stream = self.__create_bytes_stream(self.input_stream)
        print(f'-------edge create_bytes_stream time:{time.time()-t:.4f}s')
        streamlen = stream.shape[0]
        idx = 0
        while streamlen >= self.chunk and self.state == State.RUNNING:
            self.parent.put_audio_frame(stream[idx:idx + self.chunk])
            streamlen -= self.chunk
            idx += self.chunk
        #if streamlen>0:  #skip last frame(not 20ms)
        #    self.queue.put(stream[idx:])
        self.input_stream.seek(0)
        self.input_stream.truncate()
```
Pay attention to the commented-out part: `#skip last frame(not 20ms)`. Any trailing audio shorter than one 20 ms chunk is dropped, which is likely why the mouth freezes on the last phoneme.
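If that skipped tail is what leaves the mouth open, one option (a sketch only; it reuses `put_audio_frame`, `self.chunk`, and `stream` from the code above and is untested against LiveTalking) is to zero-pad the final partial chunk instead of dropping it, and optionally queue a little extra silence so the avatar settles back to a closed mouth:

```python
import numpy as np

# Replacement for the commented-out "skip last frame" branch:
if streamlen > 0 and self.state == State.RUNNING:
    # Pad the final partial chunk with zeros (silence) instead of
    # dropping it, so the audio ends cleanly on a full 20 ms frame.
    tail = np.zeros(self.chunk, dtype=stream.dtype)
    tail[:streamlen] = stream[idx:idx + streamlen]
    self.parent.put_audio_frame(tail)

# Optionally push a few frames of pure silence (here ~100 ms) so the
# model has silent audio to drive the mouth back to a neutral pose.
for _ in range(5):
    self.parent.put_audio_frame(np.zeros(self.chunk, dtype=stream.dtype))
```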
@Wazaki-Ou Have you found a solution for the blurry lips?