Relying on the VAE still seems to be a fundamental limitation: it cannot reconstruct facial details with 100% fidelity.
In the clips below, the middle is the original footage and the rightmost is MuseTalk's output; the difference is clearly visible.
https://github.com/user-attachments/assets/3ab276c7-b118-49ee-9abd-dc364328e67a
https://github.com/user-attachments/assets/8cce0989-9ab7-4023-a8e3-c1f3db4109b4
04034_0403_expfinetune.mp4
Thanks for the interest! Which model was this result produced with?
We ran experiments on exactly this problem. In the first row, the leftmost column is the original image, and the remaining four columns are reconstructions of it by the SD1.5 VAE (4 channels), the SDXL VAE (4 channels), the SD3 VAE (16 channels), and the Flux VAE (16 channels); the second row shows the residual between the original image and each reconstruction (magnified four times).
Using a stronger VAE may alleviate the detail-loss problem.
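
For reference, here is a minimal sketch of how such a round-trip reconstruction test can be run with the `AutoencoderKL` class from `diffusers`. The checkpoint IDs below are illustrative stand-ins and may not match the exact weights used in the experiment (the SD3 and Flux repos are also gated on Hugging Face), and `face.png` is a placeholder input.

```python
# Sketch of a VAE round-trip reconstruction test with 4x-magnified residuals.
# Checkpoint IDs are illustrative stand-ins; "face.png" is a placeholder input.
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor, to_pil_image

VAES = {
    "sd15": ("stabilityai/sd-vae-ft-mse", None),                        # 4 latent channels
    "sdxl": ("stabilityai/stable-diffusion-xl-base-1.0", "vae"),        # 4 latent channels
    "sd3":  ("stabilityai/stable-diffusion-3-medium-diffusers", "vae"), # 16 latent channels
    "flux": ("black-forest-labs/FLUX.1-dev", "vae"),                    # 16 latent channels
}

@torch.no_grad()
def reconstruct(vae: AutoencoderKL, img: torch.Tensor) -> torch.Tensor:
    """Encode then decode img of shape (1, 3, H, W) with values in [0, 1]."""
    x = img * 2.0 - 1.0                          # these VAEs expect inputs in [-1, 1]
    latents = vae.encode(x).latent_dist.sample()
    recon = vae.decode(latents).sample
    return (recon.clamp(-1.0, 1.0) + 1.0) / 2.0  # map back to [0, 1]

# Resize to a multiple of 8 so each VAE's 8x spatial downsampling divides evenly.
img = to_tensor(load_image("face.png").convert("RGB").resize((512, 512))).unsqueeze(0)

for name, (repo, subfolder) in VAES.items():
    kwargs = {"subfolder": subfolder} if subfolder else {}
    vae = AutoencoderKL.from_pretrained(repo, **kwargs).eval()
    recon = reconstruct(vae, img)
    residual = ((img - recon).abs() * 4.0).clamp(0.0, 1.0)  # residual, magnified 4x
    to_pil_image(recon[0]).save(f"{name}_recon.png")
    to_pil_image(residual[0]).save(f"{name}_residual.png")
```

Note that no latent scaling or shift factors are applied here: for a pure encode-decode round trip they cancel out, since the latents never pass through a diffusion model.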
This result comes from our own fine-tuned model; it is not a general-purpose model, so it isn't really comparable with MuseTalk.
Can you share the code for the SDXL VAE, SD3 VAE, and Flux VAE?