Making MuseTalk 40% faster

Open mvoodarla opened this issue 1 year ago • 8 comments

I've been pretty impressed with MuseTalk despite some of its shortcomings and have been playing around with the model. I ended up doing a ton of optimizations that made it run 40% faster. Most of these revolved around how video frames are loaded, stored, and saved in memory during pre/post-processing, which turns out to be pretty inefficient. To that end, my company Sieve is now hosting it at a rate that's cheaper than self-hosting on GCP!

We also fixed a couple quality issues around audio silences.

We wrote about the work here and would appreciate any feedback / areas of improvement the community has noticed around the model that might be worthwhile for us to check out!

You can also just run the model directly in this playground!

mvoodarla avatar Aug 20 '24 15:08 mvoodarla

I saw your blog, very nice job! The preprocessing takes too long, and the low resolution of the teeth is a big problem. Can you share more detail on how you solved these issues?

dubeno avatar Aug 21 '24 06:08 dubeno

Hi @mvoodarla, your blog reads like a guide to making the model perfect. Do you mind walking me through how you tackled the hallucination problems from silent audio? Did you just change the temperature, or swap in a new Whisper model? Appreciate it!

evan-zhao-thermofisher avatar Aug 23 '24 00:08 evan-zhao-thermofisher

Thanks for your work. I'm just wondering whether you trained a new model, or used the existing checkpoint and optimized the inference part? Looking forward to your reply.

liuzysy avatar Aug 23 '24 02:08 liuzysy

Hey folks! Thanks for the notes here. We're still actively working on this model and turning it into a high-quality pipeline. More specifically, we're doing things like using CodeFormer to upscale, fixing how facial alignment is done, etc.

As for how we tackled hallucination in silent audio, one of the fixes involves first detecting the silent audio and then changing the input parameters to MuseTalk in those moments to keep the mouth shut. We hope to do a more technical post around all of these things soon!
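No code for this has been shared in the thread, so the following is only a minimal sketch of the detection half, not Sieve's implementation. It assumes mono audio as floats in [-1.0, 1.0] and flags each video frame whose audio window falls below an RMS loudness threshold. The function name `silent_frame_mask` and the defaults `fps=25.0` and `threshold_db=-40.0` are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def silent_frame_mask(
    samples: Sequence[float],
    sample_rate: int,
    fps: float = 25.0,
    threshold_db: float = -40.0,
) -> List[bool]:
    """Return one flag per video frame: True where the audio is silent.

    `fps` and `threshold_db` are illustrative defaults (assumptions),
    not values taken from MuseTalk or from Sieve's pipeline.
    """
    samples_per_frame = sample_rate / fps
    n_frames = math.ceil(len(samples) / samples_per_frame)
    mask: List[bool] = []
    for i in range(n_frames):
        # Slice out the audio samples that line up with video frame i.
        start = int(round(i * samples_per_frame))
        end = min(len(samples), int(round((i + 1) * samples_per_frame)))
        window = samples[start:end]
        if not window:
            # No audio left for this frame: treat it as silent.
            mask.append(True)
            continue
        # RMS level of the window, converted to decibels relative to full scale.
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        level_db = 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")
        mask.append(level_db < threshold_db)
    return mask
```

What to do with the flagged frames is left open here, since the thread does not say which MuseTalk input parameters are changed to close the mouth. In practice the threshold would likely need tuning per recording, and short isolated flags might be worth smoothing out.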

mvoodarla avatar Aug 27 '24 02:08 mvoodarla

Looking forward to it. @mvoodarla, you guys are doing really meaningful work.

evan-zhao-thermofisher avatar Aug 27 '24 03:08 evan-zhao-thermofisher

Join our Discord! Happy to share more active updates there.

https://discord.com/invite/Pnh97rvRtD

mvoodarla avatar Aug 29 '24 04:08 mvoodarla

Where are the code updates? This looks more like an advertisement, because the blog doesn't have code.

aesanchezgh avatar May 13 '25 17:05 aesanchezgh

This looks like just advertising; it should be banned.

jesulo avatar Jul 21 '25 15:07 jesulo