mediapipe
Text To Speech to Facial BlendShapes
MediaPipe Solution (you are using)
Part 2 => Face Blendshape: May 2023 -> ?
Part 1 => Done: ARKit 52 blendshapes support request. June 2022 to April 2023 (completed)
Programming language
c#
Are you willing to contribute it
Yes:
- using @srcnalt Ready Player Me Avatar RPM-Face-Tracing in Godot
- using @kaiidams TextToSpeech: Voice100Sharp
- using @SpookyCorgi mediapipe motion capture
- using @virtual-puppet-project speech to avatar mouth movements Virtual Puppet Project
Describe the feature and the current behaviour/state
From the Modelling part using Godot
https://github.com/srcnalt/ReadyPlayerMe-Godot-Test/issues/1#issue-1713856035
Will this change the current API? How?
Yes. It adds a new, non-conflicting API alongside the existing API.
Who will benefit with this feature?
Anyone who uses MediaPipe blendshapes. It is the next step toward Deep AI (integrating Deep Audio into MediaPipe).
Please specify the use cases for this feature
A user uses ChatGPT or something similar to generate replies, and this new feature translates the replies into speech with the corresponding avatar blendshape manipulation.
Any Other info
No response
What does the API look like?
Given a text reply from ChatGPT (or something similar from Google), the API receives this string and outputs:
- the corresponding facial blendshapes as a time-coordinated list of Dictionary[blendshapeName, blendshapeValueFloat]
- voice audio (MP3 or WAV) that aligns with the blendshape values
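To make the requested output shape concrete, here is a minimal sketch of what a "time-coordinated list of dictionaries" could look like and how a renderer might sample it during audio playback. It is written in Python only for brevity; nothing here is an existing MediaPipe API. The `BlendshapeFrame` type, the `sample_track` helper, the seconds-based timestamps, and the assumption that weights lie in [0, 1] are all hypothetical. `jawOpen` is used as an example ARKit blendshape name.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlendshapeFrame:
    """One keyframe of the track (hypothetical type, not a MediaPipe class)."""
    time_s: float              # seconds from the start of the audio clip
    weights: Dict[str, float]  # blendshape name -> value, assumed in [0, 1]


def sample_track(frames: List[BlendshapeFrame], t: float) -> Dict[str, float]:
    """Linearly interpolate blendshape weights at playback time t.

    Assumes frames are sorted by time_s. Times before the first frame or
    after the last frame clamp to that frame's weights.
    """
    if not frames:
        return {}
    times = [f.time_s for f in frames]
    i = bisect_right(times, t)
    if i == 0:
        return dict(frames[0].weights)
    if i == len(frames):
        return dict(frames[-1].weights)
    a, b = frames[i - 1], frames[i]
    alpha = (t - a.time_s) / (b.time_s - a.time_s)
    names = set(a.weights) | set(b.weights)
    # A blendshape missing from one keyframe is treated as 0.0 there.
    return {
        n: (1 - alpha) * a.weights.get(n, 0.0) + alpha * b.weights.get(n, 0.0)
        for n in names
    }
```

A player would call `sample_track(track, audio_position_seconds)` once per rendered frame, which keeps the face in sync with the audio clock regardless of the render frame rate.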
I have implemented this feature in Unreal Engine, and it is easy to do. It uses PaddleLite + OvrLipSync. 😄
@endink This is just Part 2 of many parts ahead :-)
Agreed! It would be really exciting if blendshapes could be estimated and aligned with an input audio clip.
I am currently working on a pipeline: user voice -> speech recognition -> ChatGPT -> text to speech -> blendshapes. Mature solutions exist for every stage except the last one (speech2blendshapes). Lipsync and face good can possibly do this, but they have their limitations or problems. This feature would benefit the MediaPipe community.
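The pipeline described in this comment can be sketched as a chain of four stages. This is only an illustration in Python: every stage is passed in as a callable, and the names `run_turn`, `recognize`, `chat`, `synthesize`, and `speech_to_blendshapes` are hypothetical placeholders, not functions from MediaPipe or any other library. The last stage is the one this feature request asks for.

```python
from typing import Callable, Dict, List, Tuple

# (time in seconds, {blendshape name: weight}); a hypothetical frame format.
Frame = Tuple[float, Dict[str, float]]


def run_turn(
    user_audio: bytes,
    recognize: Callable[[bytes], str],                      # speech recognition
    chat: Callable[[str], str],                             # e.g. a ChatGPT call
    synthesize: Callable[[str], bytes],                     # text to speech
    speech_to_blendshapes: Callable[[bytes], List[Frame]],  # the missing stage
) -> Tuple[bytes, List[Frame]]:
    """Run one conversational turn and return (reply audio, blendshape frames)."""
    user_text = recognize(user_audio)
    reply_text = chat(user_text)
    reply_audio = synthesize(reply_text)
    # Frames are derived from the synthesized audio so they share its timeline.
    frames = speech_to_blendshapes(reply_audio)
    return reply_audio, frames
```

Passing the stages in as arguments keeps the sketch independent of any particular speech or chat provider, so each one can be swapped without touching the rest.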
Hello @GeorgeS2019, thanks for raising this amazing feature request. We will discuss it internally and prioritise it in our roadmap. However, just a heads up: we are working on numerous fronts as of now, hence this might get delayed.
The blendshape part is now working in Godot, the 8th top-ranked open-source 3D game engine on GitHub.
@srcnalt
@kaiidams
@SpookyCorgi
@you-win
@j20001970
Hello @lu-wang-g, Could you please look into this amazing feature request? Thank you!!
At I/O 2023, Google released the demo app Talking Character (https://developers.googleblog.com/2023/05/generative-ai-talking-character.html), which IIUC fits exactly the use case described here. The Web demo is partially open sourced here. You can find useful components in the directory. There has also been a discussion of releasing the talking character pipeline through MediaPipe, but we don't have a concrete plan yet.
@ayushgdev and @kuaashish, do we have ways to track user requests like this?
+1
We now have a C# wrapper for Godot MediaPipe
The Godot community will attempt Text to Face => follow here