
How to process multiple outputs in parallel for the same input timestamp?

fwwucn opened this issue 1 year ago · 6 comments

I'm implementing two calculators for TTS (Text To Speech) and Audio2Face (both are real-time streaming WebSocket APIs). First, TTSCalculator receives a long text input and may successively output multiple audio clips; each audio clip should then be converted to face blendshapes by Audio2FaceCalculator immediately.

Currently the MediaPipe scheduler seems to begin processing the first audio clip (in the second graph node) only after all the audio clips have been output (by the first graph node), instead of processing each audio clip immediately after it is produced.

How can I process an audio clip in the second node immediately after it is produced in the first node?
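
For reference, here is a minimal sketch of roughly what such a TTSCalculator could look like; the stream tags, the std::string PCM payload, and the SynthesizeSegments helper are placeholders rather than the actual implementation. The point it illustrates is one Process() call emitting several output packets at increasing timestamps (assuming no TimestampOffset is declared on the output stream):

```cpp
// Hypothetical sketch, not the real implementation: one TEXT packet in,
// several AUDIO packets out, each emitted as soon as it is synthesized.
#include <string>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"

namespace mediapipe {

class TTSCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("TEXT").Set<std::string>();
    cc->Outputs().Tag("AUDIO").Set<std::string>();  // raw PCM bytes (placeholder type)
    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    const std::string& text = cc->Inputs().Tag("TEXT").Get<std::string>();
    // Placeholder for the streaming TTS call; assumed to yield one PCM clip
    // per text segment.
    std::vector<std::string> clips = SynthesizeSegments(text);
    Timestamp ts = cc->InputTimestamp();
    for (const std::string& pcm : clips) {
      // Emit each clip at its own (monotonically increasing) timestamp so the
      // downstream calculator can be scheduled per packet.
      cc->Outputs().Tag("AUDIO").AddPacket(
          MakePacket<std::string>(pcm).At(ts));
      ts = Timestamp(ts.Value() + 1);
    }
    return absl::OkStatus();
  }

 private:
  std::vector<std::string> SynthesizeSegments(const std::string& text);
};
REGISTER_CALCULATOR(TTSCalculator);

}  // namespace mediapipe
```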

fwwucn avatar Jun 15 '23 04:06 fwwucn

@Redogame,

Could you please elaborate on your query with complete details? Could you share any code changes you have made, or tell us exactly what support you are looking for from us?

kuaashish avatar Jun 15 '23 08:06 kuaashish

@kuaashish Thanks for your help. Please refer to the image below for the current solution, the expected solution, and my questions (in red): IMG_export_20230616_092625133

fwwucn avatar Jun 15 '23 10:06 fwwucn

I think your issue is more related to the audio-to-face conversion, so you might want to look into that before relying on MediaPipe for your pipeline. For face generation you obviously need some information about the landmarks and the face's blend weights, so I'm not sure MediaPipe is the right tool for this.

sparshgarg23 avatar Jun 16 '23 04:06 sparshgarg23

Thanks @sparshgarg23. The TTS + Audio2Face pipeline is very intuitive and straightforward, but it is not the key point; I only used it as an example to explain my questions. The questions remain even if you replace it with other parallel graph nodes, such as FaceDetection (outputs multiple faces) + FaceLandmarkDetection (generates landmarks for each face in parallel).

So I think this is a common problem for pipelines in which a successor node needs to process, in parallel, the outputs produced by the previous node (see the sketch below).
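
To make the scheduling question concrete: as I understand it, a successor calculator's Process() is invoked once per incoming packet, so the per-clip work could look like the hypothetical sketch below. The tags, payload types, and the ConvertPcmToBlendshapes helper are placeholders, not the actual implementation:

```cpp
// Hypothetical sketch of the successor calculator. If the upstream node emits
// each audio clip as its own packet, this node can start on clip 1 while the
// upstream is still producing clip 2.
#include <string>
#include <utility>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"

namespace mediapipe {

class Audio2FaceCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("AUDIO").Set<std::string>();               // raw PCM bytes
    cc->Outputs().Tag("BLENDSHAPES").Set<std::vector<float>>();  // blendshape weights
    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    const std::string& pcm = cc->Inputs().Tag("AUDIO").Get<std::string>();
    // Placeholder for the real-time Audio2Face WebSocket call.
    std::vector<float> blendshapes = ConvertPcmToBlendshapes(pcm);
    cc->Outputs().Tag("BLENDSHAPES").AddPacket(
        MakePacket<std::vector<float>>(std::move(blendshapes))
            .At(cc->InputTimestamp()));
    return absl::OkStatus();
  }

 private:
  std::vector<float> ConvertPcmToBlendshapes(const std::string& pcm);
};
REGISTER_CALCULATOR(Audio2FaceCalculator);

}  // namespace mediapipe
```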

Coming back to my TTS + Audio2Face example: to accelerate the whole pipeline, I want to process each audio output in parallel as soon as it is produced. It looks like this:

Input text:

Hi there, my name is tom. Nice to meet you.

The output may consist of multiple segments, and each should be processed as soon as possible:

pcm audio for segment 1 (Hi there) --Audio2Face--> blendshape 1
pcm audio for segment 2 (my name is tom) --Audio2Face--> blendshape 2
pcm audio for segment 3 (Nice to meet you) --Audio2Face--> blendshape 3

Does the MediaPipe scheduler support this kind of scenario (see the wiring sketch below)? Or should I integrate all the algorithms (e.g. TTS + Audio2Face) into one calculator, which would lose the benefits of modularity?
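
For completeness, here is a hedged sketch of how I imagine the two nodes would be wired and driven, following the pattern of the hello_world example. The graph config, stream names, num_threads value, and the ObserveOutputStream callback body are assumptions for illustration, not a verified working setup:

```cpp
// Hypothetical wiring of TTSCalculator -> Audio2FaceCalculator. With a
// multi-threaded default executor, the framework can schedule the second node
// as soon as each AUDIO packet arrives, rather than after the whole batch.
#include <string>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"

absl::Status RunPipeline(const std::string& text) {
  mediapipe::CalculatorGraphConfig config =
      mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(R"pb(
        input_stream: "text"
        output_stream: "blendshapes"
        num_threads: 4  # allow the two nodes to run concurrently
        node {
          calculator: "TTSCalculator"
          input_stream: "TEXT:text"
          output_stream: "AUDIO:audio"
        }
        node {
          calculator: "Audio2FaceCalculator"
          input_stream: "AUDIO:audio"
          output_stream: "BLENDSHAPES:blendshapes"
        }
      )pb");

  mediapipe::CalculatorGraph graph;
  MP_RETURN_IF_ERROR(graph.Initialize(config));
  MP_RETURN_IF_ERROR(graph.ObserveOutputStream(
      "blendshapes", [](const mediapipe::Packet& packet) {
        // Each blendshape packet is observed as soon as it is produced.
        return absl::OkStatus();
      }));
  MP_RETURN_IF_ERROR(graph.StartRun({}));
  MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
      "text", mediapipe::MakePacket<std::string>(text)
                  .At(mediapipe::Timestamp(0))));
  MP_RETURN_IF_ERROR(graph.CloseInputStream("text"));
  return graph.WaitUntilDone();
}
```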

fwwucn avatar Jun 16 '23 09:06 fwwucn

Hello @fban-google,

Could you please help out here? Thank you!!

kuaashish avatar Jun 19 '23 10:06 kuaashish

@lu-wang-g can you assign this to an appropriate PoC?

fban-google avatar Jun 21 '23 00:06 fban-google