mediapipe
How to process multiple outputs in parallel for a single input timestamp?
I'm implementing two calculators, one for TTS (Text To Speech) and one for Audio2Face (both are real-time streaming WebSocket APIs). First, TTSCalculator takes a long text input and may output multiple audio clips in succession; each audio clip should then be converted to face blendshapes in Audio2FaceCalculator immediately.
Currently the MediaPipe scheduler seems to start processing the first audio clip (in the second graph node) only after all the audio clips have been output (by the first graph node), instead of processing each audio clip as soon as it is produced.
How can I process each audio clip in the second node as soon as the first node produces it?
@Redogame,
Could you please elaborate on your query with complete details? Could you share any code changes you have made, or describe the exact support you are looking for from us?
@kuaashish Thanks for your help.
Please refer to the image below for the current solution, the expected solution, and my questions (in red).
I think your issue is more related to the audio-to-face conversion, so you might want to look into that before relying on MediaPipe. From your pipeline, it's clear that face generation needs some information about the landmarks and the face's blend weights, so I am not sure MediaPipe is the right tool for this.
Thanks @sparshgarg23. The TTS + Audio2Face pipeline is intuitive and straightforward, but it is not the key point; I only used it as an example to explain my questions. The same questions remain if it is replaced with other parallel graph nodes, such as FaceDetection (which outputs multiple faces) + FaceLandmarkDetection (which generates landmarks for each face in parallel).
So I think this is a common problem for pipelines in which a successor node needs to process, in parallel, the outputs produced by the previous node.
Back to my TTS + Audio2Face example: to speed up the whole pipeline, I want to process each audio output in parallel as soon as it is produced. It looks like this:
Input text:
Hi there, my name is tom. Nice to meet you.
The output may consist of multiple segments, and each should be processed ASAP:
pcm audio for segment 1 (Hi there) --Audio2Face--> blendshape 1
pcm audio for segment 2 (my name is tom) --Audio2Face--> blendshape 2
pcm audio for segment 3 (Nice to meet you) --Audio2Face--> blendshape 3
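To make the expected behavior concrete, here is a minimal sketch in plain Python, outside MediaPipe. It does not use any MediaPipe API: `fake_tts`, `fake_audio2face`, and `run_pipeline` are hypothetical stand-ins, the sleeps only simulate latency, and the segmentation rule is an assumption made for this example. It only shows the scheduling I am after, where each clip is handed to the second stage as soon as the first stage yields it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_tts(text):
    """Hypothetical streaming TTS: yields one 'audio clip' per text segment."""
    segments = [s.strip() for s in text.replace(",", ".").split(".") if s.strip()]
    for segment in segments:
        time.sleep(0.05)  # simulated synthesis latency per segment
        yield segment


def fake_audio2face(clip):
    """Hypothetical Audio2Face call: turns one clip into a blendshape label."""
    time.sleep(0.05)  # simulated conversion latency
    return f"blendshape for '{clip}'"


def run_pipeline(text, workers=3):
    """Submit each clip to the second stage as soon as the first stage yields it."""
    events = []  # ordered log of what happened, used to check the overlap
    lock = threading.Lock()

    def convert(index, clip):
        with lock:
            events.append(("a2f_start", index))
        return fake_audio2face(clip)

    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for index, clip in enumerate(fake_tts(text), start=1):
            with lock:
                events.append(("tts_output", index))
            # Hand the clip over immediately; do not wait for later clips.
            futures.append(pool.submit(convert, index, clip))
        # Collect results in segment order once everything is submitted.
        results = [future.result() for future in futures]
    return results, events


if __name__ == "__main__":
    results, events = run_pipeline("Hi there, my name is tom. Nice to meet you.")
    for line in results:
        print(line)
    print(events)
```

In this sketch, conversion of segment 1 starts while TTS is still producing the later segments, which is the overlap I would like to get between the two graph nodes.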
Does the MediaPipe scheduler support this kind of scenario? Or should I integrate all the algorithms (e.g. TTS + Audio2Face) into one calculator (which would lose the benefits of modularity)?
Hello @fban-google,
Could you please help out here? Thank you!!
@lu-wang-g can you assign this to an appropriate PoC?