mediapipe Text To Speech to Facial BlendShapes

MediaPipe Solution (you are using)

Part 1 (done): ARKit 52 blendshapes support request, June 2022 to April 2023, completed.
Part 2: Face Blendshape, May 2023 -> ?

Programming language

C#

Are you willing to contribute it

Yes:

using @srcnalt Ready Player Me Avatar RPM-Face-Tracing in Godot
using @kaiidams TextToSpeech: Voice100Sharp
using @SpookyCorgi mediapipe motion capture
using @virtual-puppet-project speech to avatar mouth movements Virtual Puppet Project

Describe the feature and the current behaviour/state

From the Modelling part using Godot
https://github.com/srcnalt/ReadyPlayerMe-Godot-Test/issues/1#issue-1713856035

Will this change the current API? How?

Yes. It adds a new API that does not conflict with the existing one.

Who will benefit with this feature?

Anyone who uses MediaPipe BlendShape. It is the next step towards Deep AI (integrating Deep Audio into MediaPipe).

Please specify the use cases for this feature

A user generates replies with ChatGPT or something similar, and this new feature translates those replies into speech with the corresponding avatar blendshape manipulation.

Any Other info

No response

May 18 '23 11:05 GeorgeS2019

What would the API look like?

Given a text reply from ChatGPT (or something similar from Google), the API receives this string and outputs:

the corresponding facial blendshapes as a time-coordinated list of Dictionary[blendshapeName, blendshapeValueFloat]
voice audio (MP3 or WAV) that aligns with the blendshape values
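
To make the proposed output shape concrete, here is a minimal, language-neutral sketch in Python (a C# binding would presumably mirror it). Nothing here is an existing MediaPipe API: `BlendshapeFrame`, `SpeechWithBlendshapes` and `weights_at` are hypothetical names, and the linear interpolation between frames is only one possible way for a renderer to sample the track at its own frame rate.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlendshapeFrame:
    """One entry of the time-coordinated list (hypothetical type)."""
    time_s: float               # offset from the start of the audio, in seconds
    weights: Dict[str, float]   # blendshapeName -> blendshapeValueFloat


@dataclass
class SpeechWithBlendshapes:
    """Hypothetical return value: audio plus the aligned blendshape track."""
    audio: bytes                # encoded MP3 or WAV data
    audio_format: str           # "mp3" or "wav"
    frames: List[BlendshapeFrame]


def weights_at(frames: List[BlendshapeFrame], t: float) -> Dict[str, float]:
    """Sample the track at time t, interpolating linearly between frames.

    Assumes frames are sorted by time_s. Times outside the track clamp
    to the first or last frame.
    """
    if not frames:
        return {}
    if t <= frames[0].time_s:
        return dict(frames[0].weights)
    if t >= frames[-1].time_s:
        return dict(frames[-1].weights)
    times = [f.time_s for f in frames]
    i = bisect.bisect_right(times, t)   # first frame strictly after t
    prev, nxt = frames[i - 1], frames[i]
    alpha = (t - prev.time_s) / (nxt.time_s - prev.time_s)
    names = set(prev.weights) | set(nxt.weights)
    return {
        n: (1 - alpha) * prev.weights.get(n, 0.0) + alpha * nxt.weights.get(n, 0.0)
        for n in names
    }
```

An avatar renderer would play `audio` and, on each rendered frame, call `weights_at(result.frames, playback_time)` to drive the mesh, which keeps the mouth in step with the voice even when the render rate differs from the track rate.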

May 18 '23 12:05 GeorgeS2019

I have implemented this feature in Unreal Engine. It is easy to implement: it uses PaddleLite + OvrLipSync. 😄

May 18 '23 15:05 endink

@endink This is just Part 2 of many parts ahead :-)

May 18 '23 15:05 GeorgeS2019

Agreed! It would be really exciting if blendshapes could be estimated and aligned with input audio clip.

I am currently working on a pipeline: user voice -> speech recognition -> ChatGPT -> text to speech -> blendshapes. Mature solutions exist for every stage except the last one (speech2blendshapes). Lipsync and FACEGOOD can possibly do this, but they have their limitations or problems. This feature would benefit the MediaPipe community.
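
The pipeline described above can be sketched as a chain of pluggable stages. This is only an illustration under assumptions: `TalkingAvatarPipeline` and its field names are hypothetical, each stage is an injected callable rather than a real library call, and the last stage is the one this feature request asks MediaPipe to provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (time in seconds, blendshapeName -> value); hypothetical track format
Frames = List[Tuple[float, Dict[str, float]]]


@dataclass
class TalkingAvatarPipeline:
    """Hypothetical wiring of the five stages; every stage is swappable."""
    recognize: Callable[[bytes], str]                 # user voice -> text
    reply: Callable[[str], str]                       # text -> chatbot answer
    synthesize: Callable[[str], bytes]                # answer -> speech audio
    audio_to_blendshapes: Callable[[bytes], Frames]   # speech -> blendshapes

    def run(self, user_audio: bytes) -> Tuple[bytes, Frames]:
        """Run all stages in order and return the speech plus its track."""
        question = self.recognize(user_audio)
        answer = self.reply(question)
        speech = self.synthesize(answer)
        return speech, self.audio_to_blendshapes(speech)
```

Keeping the stages as plain callables means an existing speech recognizer, chatbot and text-to-speech engine can be plugged in unchanged while different speech-to-blendshape candidates are compared in the last slot.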

May 19 '23 06:05 FishWoWater

Hello @GeorgeS2019, thanks for raising this amazing feature request. We will discuss it internally and prioritise it in our roadmap. However, just a heads up: we are working on numerous fronts right now, so this might be delayed.

May 22 '23 07:05 ayushgdev

The BlendShape part is now working in Godot, the 8th top-ranked open-source 3D game engine on GitHub. @srcnalt @kaiidams @SpookyCorgi @you-win @j20001970

May 27 '23 15:05 GeorgeS2019

Hello @lu-wang-g, could you please look into this amazing feature request? Thank you!

Jun 06 '23 10:06 kuaashish

At I/O 2023, Google released the demo app Talking Character (https://developers.googleblog.com/2023/05/generative-ai-talking-character.html), which IIUC fits exactly the use case described here. The web demo is partially open sourced here. You can find useful components in the directory. There has also been a discussion of releasing the talking character pipeline through MediaPipe, but we don't have a concrete plan yet.

@ayushgdev and @kuaashish, do we have ways to track user requests like this?

Jun 13 '23 06:06 lu-wang-g

+1

Sep 20 '23 06:09 tiamy

We now have a C# wrapper for Godot MediaPipe.

Apr 29 '24 00:04 GeorgeS2019

The Godot community will attempt Text to Face. Follow here.

May 01 '24 05:05 GeorgeS2019
