UEAzSpeech

Can I get viseme animation data?

Open metakkh opened this issue 1 year ago • 6 comments

Hi, can I get viseme data for 3D character facial animation?

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=3dblendshapes&pivots=programming-language-cpp#3d-blend-shapes-animation

image

I checked the viseme received value here and confirmed that the other values were received correctly.

image

But the Viseme Data Animation value is empty. Are there any other settings needed to get that value?

Also, if you know how to connect the value to the metahuman's blendshape, I would appreciate your help.

metakkh avatar Sep 25 '23 05:09 metakkh

You can use SSML to SoundWave instead of Text to SoundWave, and set the viseme type to "FacialExpression" in your SSML string. Below is a valid SSML example that returns blendshape data: <mstts:viseme type="FacialExpression"/> Rainbow has seven colors: Red, orange, yellow, green, blue, indigo, and violet.
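For reference, here is a minimal sketch of how that SSML string could be assembled in plain C++ before passing it to the SSML to SoundWave node. The `speak` and `voice` wrappers follow the structure shown in the Azure SSML documentation; the helper names `EscapeXml` and `BuildVisemeSsml`, the default voice `en-US-JennyNeural`, and the default locale are assumptions for illustration, not part of the plugin.

```cpp
#include <cassert>
#include <string>

// Escapes the characters that must not appear raw inside XML text.
std::string EscapeXml(const std::string& Text) {
    std::string Out;
    Out.reserve(Text.size());
    for (char C : Text) {
        switch (C) {
            case '&': Out += "&amp;"; break;
            case '<': Out += "&lt;"; break;
            case '>': Out += "&gt;"; break;
            default:  Out += C; break;
        }
    }
    return Out;
}

// Wraps plain text in an SSML document that requests blendshape visemes.
// VoiceName and Locale are placeholders (assumptions); use whatever voice
// your Azure Speech resource actually offers.
std::string BuildVisemeSsml(const std::string& Text,
                            const std::string& VoiceName = "en-US-JennyNeural",
                            const std::string& Locale = "en-US") {
    assert(!Text.empty());
    return "<speak version=\"1.0\" "
           "xmlns=\"http://www.w3.org/2001/10/synthesis\" "
           "xmlns:mstts=\"http://www.w3.org/2001/mstts\" "
           "xml:lang=\"" + Locale + "\">"
           "<voice name=\"" + VoiceName + "\">"
           "<mstts:viseme type=\"FacialExpression\"/>" +
           EscapeXml(Text) +
           "</voice></speak>";
}
```

Calling `BuildVisemeSsml("Rainbow has seven colors.")` yields a single-line SSML document with the `mstts:viseme` element placed inside the `voice` element, which is where the service expects it.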

Anyway, it might not be a good idea to drive a lip-sync animation with 55 blendshapes unless your GPU is strong enough. I couldn't get acceptable performance on my laptop (RTX 3070, 8 GB), so I gave up and switched to using the viseme ID.

skysworder avatar Oct 14 '23 00:10 skysworder

As skysworder said, to get the blendshapes you'll need SSML data with the mstts:viseme element 😁

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-structure#viseme-element

lucoiso avatar Oct 14 '23 10:10 lucoiso

Following your guidance, I obtained the blendshape data, but how can I use this data to enable metahuman to implement lipsync? Hope to get some guidance.

JiangHaiWei avatar Nov 12 '23 11:11 JiangHaiWei

I tried several times to drive lip sync with a multi blend pose node, but it didn't work at all, so I switched to using the viseme ID and offset time, which works well. Here's the main idea:

  1. Build a pose asset that corresponds to the Azure viseme IDs (22 poses); let's call it "az_viseme_poseAsset". This is easy because MetaHuman has a PoseLibrary for visemes under the common/common folder, so you can carefully pick out what you need.
  2. Define an enumeration that lists those 22 visemes; let's call it azVisemeID.
  3. Add a Blend Poses (azVisemeID) node to the anim graph that drives the face animation, and don't forget to add all pins as blend channels.
  4. In the same graph as step 3, add an Evaluate Pose az_viseme_poseAsset node and convert it to Pose by Name. Duplicate this node 21 times to match the number of viseme IDs, then change the pose name on each copy, making sure every name exists in az_viseme_poseAsset.
  5. Connect each viseme pose to Blend Poses (azVisemeID) in the same order.
  6. Update the Active Enum Value of Blend Poses (azVisemeID) in Event Tick of the level blueprint.
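The enumeration in step 2 and the order constraint in step 5 can be sketched in plain C++ as follows. This is only an illustration of the idea, written so it compiles outside Unreal; in a project the enum would more likely be a Blueprint enumeration as described above. The pose names in `kPoseNames` and the helpers `ToVisemeId` and `PoseNameFor` are hypothetical; replace the names with whatever is actually stored in your pose asset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// One entry per Azure viseme ID (0..21), in the same order as the blend pins.
enum class AzVisemeID : unsigned char {
    Id0, Id1, Id2, Id3, Id4, Id5, Id6, Id7, Id8, Id9, Id10,
    Id11, Id12, Id13, Id14, Id15, Id16, Id17, Id18, Id19, Id20, Id21
};

constexpr std::size_t kVisemeCount = 22;

// Hypothetical pose names, indexed by viseme ID. The real names must match
// the poses that exist in az_viseme_poseAsset.
constexpr std::array<std::string_view, kVisemeCount> kPoseNames = {
    "viseme_00", "viseme_01", "viseme_02", "viseme_03", "viseme_04", "viseme_05",
    "viseme_06", "viseme_07", "viseme_08", "viseme_09", "viseme_10", "viseme_11",
    "viseme_12", "viseme_13", "viseme_14", "viseme_15", "viseme_16", "viseme_17",
    "viseme_18", "viseme_19", "viseme_20", "viseme_21"
};

// Converts a raw ID from a viseme event into the enum, rejecting
// anything outside 0..21.
std::optional<AzVisemeID> ToVisemeId(int RawId) {
    if (RawId < 0 || RawId >= static_cast<int>(kVisemeCount)) {
        return std::nullopt;
    }
    return static_cast<AzVisemeID>(RawId);
}

// Looks up the pose that should be wired to the pin for this viseme.
std::string_view PoseNameFor(AzVisemeID Id) {
    const auto Index = static_cast<std::size_t>(Id);
    assert(Index < kPoseNames.size());
    return kPoseNames[Index];
}
```

Keeping the enum order, the pin order, and the pose-name table aligned is the point: if any of the three drifts, the wrong mouth shape plays for a given ID.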

skysworder avatar Nov 15 '23 06:11 skysworder

Can you give a detailed explanation? I am also trying to do that.

Ale3274 avatar Apr 30 '24 06:04 Ale3274

Here's an example Blueprint AnimGraph (in face_animBP of your MetaHuman). Notice that I use a viseme pose asset (face_visemes_lib_PoseAsset) with the Oculus OVRLipSync naming convention instead of the Azure viseme ID numbers. For Blend Poses, you first need to create an enumeration that follows the Azure viseme ID convention (0-21) and name it azVisemeID or anything else; otherwise you won't find the 'Blend Poses (azVisemeID)' node.

image 17151347371228

Here's a fragment of the level blueprint for Event Begin Play, which gets the viseme ID data and offset time.

image 17151347869035

Here's the graph for Event Tick in the level blueprint, which sets the active viseme ID at the offset time.

image 17151348301498
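Since the screenshots may not be visible, here is a rough C++ sketch of the logic the two level blueprint graphs describe: collect the viseme events once at begin play, then on every tick advance a clock and switch the active viseme when its offset has been reached. The type and member names (`VisemeEvent`, `VisemeTimeline`, `Tick`) are hypothetical, and the sketch assumes offsets are in milliseconds from the start of the audio; it is not the plugin's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One viseme event as collected at begin play.
// OffsetMs is assumed to be milliseconds from the start of the utterance.
struct VisemeEvent {
    double OffsetMs;
    int VisemeId;
};

class VisemeTimeline {
public:
    explicit VisemeTimeline(std::vector<VisemeEvent> Events)
        : Events_(std::move(Events)) {
        // Events may arrive out of order; sort them by offset once.
        std::stable_sort(Events_.begin(), Events_.end(),
                         [](const VisemeEvent& A, const VisemeEvent& B) {
                             return A.OffsetMs < B.OffsetMs;
                         });
    }

    // Call once per frame with the frame delta in seconds.
    // Returns the viseme ID that should be active now; 0 (silence in the
    // Azure viseme table) until the first event is reached.
    int Tick(double DeltaSeconds) {
        assert(DeltaSeconds >= 0.0);
        ElapsedMs_ += DeltaSeconds * 1000.0;
        while (Next_ < Events_.size() && Events_[Next_].OffsetMs <= ElapsedMs_) {
            Active_ = Events_[Next_].VisemeId;
            ++Next_;
        }
        return Active_;
    }

private:
    std::vector<VisemeEvent> Events_;
    std::size_t Next_ = 0;     // index of the next event not yet reached
    double ElapsedMs_ = 0.0;   // playback time accumulated from ticks
    int Active_ = 0;           // currently active viseme ID
};
```

The value returned by `Tick` is what would be written to the Active Enum Value of the Blend Poses node each frame. Accumulating frame deltas is the simplest clock; if the audio component exposes its own playback position, reading that instead would avoid drift between audio and animation.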

skysworder avatar May 08 '24 02:05 skysworder