mediapipe

Text To Speech to Facial BlendShapes

Open GeorgeS2019 opened this issue 1 year ago • 11 comments

MediaPipe Solution (you are using)

Face Blendshape. Part 2 (this request): started May 2023, still open. Part 1: ARKit 52 blendshapes support request, June 2022 to April 2023, completed.

Programming language

C#

Are you willing to contribute it

Yes

Describe the feature and the current behaviour/state

The modelling side, using Godot, is tracked here:
https://github.com/srcnalt/ReadyPlayerMe-Godot-Test/issues/1#issue-1713856035

Will this change the current API? How?

Yes: it adds a new API alongside the existing one, without conflicting with it.

Who will benefit with this feature?

Anyone who uses MediaPipe face blendshapes. It is the next step toward deep AI: integrating deep audio models into MediaPipe.

Please specify the use cases for this feature

A user uses ChatGPT or a similar service to generate replies, and this new feature converts those replies into speech together with the corresponding avatar blendshape animation.

Any Other info

No response

GeorgeS2019, May 18 '23

What the API looks like:

Given a text reply from ChatGPT (or a similar service from Google), the API receives this string and outputs:

  1. the corresponding facial blendshapes, as a time-coordinated list of Dictionary[blendshapeName, blendshapeValueFloat]
  2. the voice audio (MP3 or WAV), time-aligned with the blendshape values
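To make the proposed output concrete, here is a minimal sketch of a time-coordinated blendshape track and how a renderer might sample it at the current audio playback time. It assumes keyframes are sorted by time and that weights are linearly interpolated between them; `BlendshapeFrame` and `sample_at` are hypothetical names, not an existing MediaPipe API. The blendshape names `jawOpen` and `mouthFunnel` follow the ARKit naming that MediaPipe's face blendshapes use.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class BlendshapeFrame:
    """Hypothetical keyframe: seconds into the audio plus blendshape weights."""
    time_s: float
    weights: dict[str, float] = field(default_factory=dict)


def sample_at(frames: list[BlendshapeFrame], t: float) -> dict[str, float]:
    """Linearly interpolate blendshape weights at audio time t.

    Assumes frames are sorted by time_s. Times before the first or after
    the last frame are clamped. A blendshape missing from a frame counts as 0.0.
    """
    if not frames:
        return {}
    times = [f.time_s for f in frames]
    i = bisect_right(times, t)  # times[i-1] <= t < times[i]
    if i == 0:
        return dict(frames[0].weights)
    if i == len(frames):
        return dict(frames[-1].weights)
    a, b = frames[i - 1], frames[i]
    alpha = (t - a.time_s) / (b.time_s - a.time_s)
    names = a.weights.keys() | b.weights.keys()
    return {
        n: (1 - alpha) * a.weights.get(n, 0.0) + alpha * b.weights.get(n, 0.0)
        for n in names
    }
```

For example, with keyframes at 0.0 s (`jawOpen` 0.0) and 0.1 s (`jawOpen` 0.8), sampling at 0.05 s gives `jawOpen` of about 0.4. Keeping the track as sparse keyframes rather than one entry per video frame lets the avatar's frame rate differ from the rate at which the blendshapes were generated.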

GeorgeS2019, May 18 '23

I have implemented this feature in Unreal Engine. It is easy to implement; it uses PaddleLite + OvrLipSync. 😄

endink, May 18 '23

@endink This is just Part 2 of many parts ahead :-)

GeorgeS2019, May 18 '23

Agreed! It would be really exciting if blendshapes could be estimated and aligned with an input audio clip.

I am currently working on a pipeline: user voice -> speech recognition -> ChatGPT -> text to speech -> blendshapes. Mature solutions exist for every stage except the last one (speech2blendshapes). LipSync and FACEGOOD can possibly do this, but they have their own limitations or problems. This feature would benefit the MediaPipe community.
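As a point of reference for the speech2blendshapes stage, the crudest possible baseline, far cruder than the existing tools mentioned above, is amplitude-driven "jaw flapping": compute the RMS energy of the audio in each animation frame and map it to `jawOpen`. This is a minimal sketch assuming mono samples already decoded to floats in [-1.0, 1.0]; the function name, the 30 fps frame rate and the `gain` value are hypothetical choices. It produces no visemes, only mouth opening.

```python
import math


def amplitude_to_jaw_open(samples: list[float], sample_rate: int,
                          fps: int = 30, gain: float = 4.0) -> list[tuple[float, float]]:
    """Return (time_s, jawOpen) keyframes from the RMS energy of each frame.

    samples: mono PCM as floats in [-1.0, 1.0] (assumed already decoded).
    gain: hypothetical scale factor so typical speech RMS lands roughly in 0..1.
    """
    hop = max(1, sample_rate // fps)  # audio samples per animation frame
    keys = []
    for start in range(0, len(samples), hop):
        chunk = samples[start:start + hop]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # Clamp to the valid blendshape weight range.
        keys.append((start / sample_rate, min(1.0, rms * gain)))
    return keys
```

Silent frames map to `jawOpen` 0.0 and loud frames saturate at 1.0. A real speech-to-blendshape model would instead predict many mouth-related blendshapes (visemes) from the audio content, which is exactly the gap this feature request targets.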

FishWoWater, May 19 '23

Hello @GeorgeS2019, thanks for raising this amazing feature request. We will discuss it internally and prioritise it in our roadmap. Just a heads-up, though: we are working on numerous fronts right now, so this might be delayed.

ayushgdev, May 22 '23

The blendshape part is now working in Godot, the 8th top-ranked open-source 3D game engine on GitHub. @srcnalt @kaiidams @SpookyCorgi @you-win @j20001970

[Screenshot: Godot_v4.0.3-rc2_mono_win64]

GeorgeS2019, May 27 '23

Hello @lu-wang-g, Could you please look into this amazing feature request? Thank you!!

kuaashish, Jun 06 '23

At I/O 2023, Google released the demo app Talking Character (https://developers.googleblog.com/2023/05/generative-ai-talking-character.html), which, IIUC, fits exactly the use case described here. The Web demo is partially open sourced here, and you can find useful components in that directory. There has also been a discussion about releasing the talking character pipeline through MediaPipe, but we don't have a concrete plan yet.

@ayushgdev and @kuaashish, do we have ways to track user requests like this?

lu-wang-g, Jun 13 '23

+1

tiamy, Sep 20 '23

We now have a C# wrapper for Godot MediaPipe.

GeorgeS2019, Apr 29 '24

The Godot community will attempt text-to-face; follow the progress here.

GeorgeS2019, May 01 '24