Does May's pretrained model support Chinese audio?

Open lokvke opened this issue 2 years ago • 6 comments

Hi there, I have a question, as the title says: does May's pretrained model support Chinese audio? I tried May's pretrained model with a Chinese audio file, but in the output video May's lips don't seem to match the audio.

Looking forward to your reply.

lokvke avatar Nov 10 '23 01:11 lokvke

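One suggestion: landmark3d-sync was pretrained on LRS3, so its lip shapes don't fit Chinese very well. I'd recommend running the entire GeneFace pipeline on Chinese-only data.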

aizhiqi-work avatar Nov 10 '23 07:11 aizhiqi-work

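> One suggestion: landmark3d-sync was pretrained on LRS3, so its lip shapes don't fit Chinese very well. I'd recommend running the entire GeneFace pipeline on Chinese-only data.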

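A question: which Chinese datasets comparable to LRS3 are available? I found LRW-1000, but it has no full faces, only cropped mouth regions, so it differs from LRS3 and landmark information can't be extracted from it.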

jack139 avatar Nov 16 '23 09:11 jack139

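You're right; as far as I remember, LRW-1000 is one of the larger Chinese datasets. One suggestion is to take a different approach: consider some of the multimodal Chinese datasets that Tsinghua recently open-sourced, such as CN-CVS and AV-CNCELEB. They are all fairly large and could serve as a useful reference.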

aizhiqi-work avatar Nov 20 '23 05:11 aizhiqi-work

Actually, the pretrained model's support for Chinese is decent. Here is the result I got by training from the postnet stage onward:

https://github.com/yerfor/GeneFace/assets/12045814/0c4ddc37-63a1-4609-b7bb-00d876eb2ec8

jinqiupeter avatar Nov 20 '23 06:11 jinqiupeter

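The lip sync looks pretty poor. In fact, judging by the metrics, the English lip sync isn't especially impressive either, but this work is still well worth following.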

aizhiqi-work avatar Nov 20 '23 09:11 aizhiqi-work

@jinqiupeter May I ask which model you used for voice cloning? The cloned voice sounds quite natural.

CatherineZhou avatar Mar 06 '24 06:03 CatherineZhou