Zhizhou Zhong comments

Results: 30 comments of Zhizhou Zhong

The saved training model is a `.pth` file, but with `resume_from_checkpoint: True` the code loads a `.bin` file. How can this be solved?

Hi @sunbo11112, [this line](https://github.com/TMElyralab/MuseTalk/blob/main/train.py#L562) of code saves the checkpoint in `.bin` format. If you set `resume_from_checkpoint` to `True`, the code will automatically locate the latest checkpoint and resume training from...
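To illustrate what "locate the latest checkpoint" can look like, here is a minimal sketch. It is not the code in `train.py`: the directory naming scheme `checkpoint-<step>` and the helper name `find_latest_checkpoint` are assumptions made for this example only.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed (hypothetical) naming scheme: one folder per save, "checkpoint-<global_step>".
_CKPT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint folder with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best_step = -1
    best_path: Optional[Path] = None
    for entry in root.iterdir():
        match = _CKPT_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            # Compare steps numerically so that 2000 ranks above 900.
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, entry
    return best_path
```

Comparing the step as an integer matters here: a plain string sort would place `checkpoint-900` after `checkpoint-2000`.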

There is a bug

@lw3259111 Thanks for your interest. You can first preprocess the video to 25 fps with software such as Jianying (剪映) or with ffmpeg.
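As a sketch of the ffmpeg route, the snippet below builds and runs a re-encode to 25 fps. The function names and file paths are placeholders chosen for this example; `-r` placed before the output path is ffmpeg's option for setting the output frame rate, and `ffmpeg` is assumed to be on the `PATH`.

```python
import subprocess
from typing import List


def build_fps_command(src: str, dst: str, fps: int = 25) -> List[str]:
    """Build an ffmpeg command that re-encodes `src` to `dst` at `fps` frames per second."""
    # -y  : overwrite the output file if it already exists
    # -i  : input file
    # -r  : output frame rate (applies to the output because it precedes `dst`)
    return ["ffmpeg", "-y", "-i", src, "-r", str(fps), dst]


def convert_to_fps(src: str, dst: str, fps: int = 25) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_fps_command(src, dst, fps), check=True)
```

For example, `convert_to_fps("input.mp4", "input_25fps.mp4")` would write a 25 fps copy next to the original.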

initial expression alignment

> It seems to me that the expression of the first frame of the driving video must match the picture to animate, otherwise the lips of the resultant video...

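Why do the images generated on val during training look good, while the videos generated at inference show obvious artifacts? (The mouth is blurred into a smear)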

@Wangwenjing520 How many steps did you train for? Do you have any output video cases from testing?

Out-of-memory problem with real-time inference on a 4090D

@codestart-zhu You can reduce the [batch_size](https://github.com/TMElyralab/MuseTalk/blob/main/scripts/realtime_inference.py#L321) a bit
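To show why a smaller `batch_size` lowers peak memory, here is a generic sketch of batching, not the code in `realtime_inference.py`. The helper name `iter_batches` and the integer "frames" are illustrative assumptions. Each batch is what would be sent through the model in one forward pass, so a smaller batch means fewer frames held on the GPU at once, at the cost of more passes.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # The last chunk may be shorter than batch_size.
        yield list(items[start:start + batch_size])


# Stand-in for 50 video frames; real frames would be image tensors.
frames = list(range(50))
batch_lengths = [len(batch) for batch in iter_batches(frames, batch_size=15)]
```

With 50 frames and `batch_size=15`, the frames are processed in four passes, and the last pass holds only the 5 remaining frames.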

Out-of-memory problem with real-time inference on a 4090D

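> [@zzzweakman](https://github.com/zzzweakman) Hello, I have adjusted the batch_size. After changing it to 15, memory usage is now around 20 GB. One question: in real-time inference it seems that only the images are generated in real time, while the audio is produced at the end.
>
> ![Image](https://github.com/user-attachments/assets/082f4c37-5d9b-4876-bdcc-5c7059300ee6)

Yes, because the code includes a video composition step that merges the audio and the image sequence into a video.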

Using a VAE seems to always be a problem; it cannot restore facial details 100%

> 04034_0403_expfinetune.mp4

Thanks for your interest! Which model was used to produce this result?

Using a VAE seems to always be a problem; it cannot restore facial details 100%

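> The middle is the original footage and the rightmost is the MuseTalk result; the gap is clearly visible.

We ran an experiment on this issue. In the first row, the leftmost column is the original image, and the other four columns are reconstructions of the original image by the SD1.5 VAE (4 channels), the SDXL VAE (4 channels), the SD3 VAE (16 channels), and the Flux VAE (16 channels). The second row shows the residual between the original image and each reconstruction (amplified four times). Using a stronger VAE may alleviate the loss of detail.

![Image](https://github.com/user-attachments/assets/136956dd-d3e3-4535-b869-401a6d210bdb)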

OutOfMemoryError

Hi @AugustLigh, can you check if your GPU has full available memory while running the program? I mean, make sure no other programs are using the GPU memory so that...

Video fps issue

@liangxs123456 It is best to convert the video to 25 fps