Problems encountered during model training

Open lmc710731910 opened this issue 7 months ago • 2 comments

Steps: 8%|▊ | 20787/250000 [15:19:25<173:49:13, 2.73s/it, lr=4.99e-6, step_loss=0.968, td=0.06s, tm=3.03s]
video ./dataset/HDTF/video_audio_clip_root/clip005_50_xin.mp4 has less than 160 frames
video ./dataset/HDTF/video_audio_clip_root/clip003_RD_Radio21_000.mp4 has face size 195 less than minimum required 200
video ./dataset/HDTF/video_audio_clip_root/clip001_209.mp4 has less than 160 frames
video ./dataset/HDTF/video_audio_clip_root/clip001_130.mp4 has less than 160 frames
Steps: 8%|▊ | 20788/250000 [15:19:27<166:23:55, 2.61s/it, lr=4.99e-6, step_loss=1.12, td=0.04s, tm=2.27s]
video file error:./dataset/HDTF/video_audio_clip_root/clip014_WRA_RogerWicker1_001.mp4

audio file error:./dataset/HDTF/video_audio_clip_root/clip001_WRA_JuddGregg_002.wav cannot unpack non-iterable NoneType object
audio file error:./dataset/HDTF/video_audio_clip_root/clip004_WDA_FrankPallone1_000.wav cannot unpack non-iterable NoneType object
Steps: 8%|▊ | 20789/250000 [15:19:29<160:37:09, 2.52s/it, lr=4.99e-6, step_loss=1.19, td=0.05s, tm=2.24s]
video ./dataset/HDTF/video_audio_clip_root/clip004_RD_Radio21_000.mp4 has face size 195 less than minimum required 200
audio file error:./dataset/HDTF/video_audio_clip_root/clip013_WDA_LloydDoggett0_000.wav cannot unpack non-iterable NoneType object
Steps: 8%|▊ | 20790/250000 [15:19:33<177:27:51, 2.79s/it, lr=4.99e-6, step_loss=0.806, td=0.05s, tm=3.33s]
video file error:./dataset/HDTF/video_audio_clip_root/clip008_17_xin.mp4

audio file error:./dataset/HDTF/video_audio_clip_root/clip010_WDA_JonTester1_000.wav cannot unpack non-iterable NoneType object
audio file error:./dataset/HDTF/video_audio_clip_root/clip000_WRA_DebFischer1_000.wav cannot unpack non-iterable NoneType object

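Does this mean the data was not preprocessed correctly? Or is the problematic data filtered out automatically, so that nothing needs to be done? These messages show up very widely during training, almost continuously. What is going on here?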

lmc710731910, May 08 '25 02:05

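Hi @lmc710731910, judging from this log, (1) some mp4 files are too short, (2) some audio files failed to load, and (3) the face resolution in some videos is too low. You could pick a few of the failing cases and inspect them, testing in a separate script whether those files can be read correctly.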
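To pick out failing cases, it may help to first group the file paths in the log by the kind of message they appear in. The following is a minimal sketch that assumes the log has been saved as text and that the messages look exactly like the ones quoted in this issue; the function name `summarize_log` and the category labels are made up for illustration and are not part of MuseTalk.

```python
import re
from collections import defaultdict

# Patterns mirror the messages quoted in this issue. The category names are
# labels chosen for this sketch, not identifiers from the MuseTalk code base.
PATTERNS = {
    "too_short": re.compile(r"video (\S+) has less than \d+ frames"),
    "small_face": re.compile(
        r"video (\S+) has face size \d+ less than minimum required \d+"
    ),
    "video_error": re.compile(r"video file error:(\S+)"),
    "audio_error": re.compile(r"audio file error:(\S+)"),
}


def summarize_log(text):
    """Group the file paths mentioned in a training log by problem category.

    Works even when progress-bar output and warnings are fused on one line,
    because each pattern is searched anywhere in the text.
    """
    found = defaultdict(set)
    for category, pattern in PATTERNS.items():
        for path in pattern.findall(text):
            found[category].add(path)
    # Sort for stable output; categories without matches are omitted.
    return {category: sorted(paths) for category, paths in found.items()}
```

Calling `summarize_log(open("train.log").read())` (the file name is only an example) gives one list of paths per category. A few entries from the `video_error` and `audio_error` lists can then be opened outside the training loop with whatever reader you normally use, to see whether the files themselves are broken.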

zzzweakman, May 10 '25 05:05

The warning "video ./dataset/HDTF/video_audio_clip_root/clip003_RD_Radio21_000.mp4 has face size 195 less than minimum required 200" does not affect training; it only means that some low-resolution training samples are skipped. https://github.com/TMElyralab/MuseTalk/blob/main/musetalk/data/dataset.py#L351

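In some of our earlier experiments we found that low-resolution data has a negative effect on the final result. A simple filtering rule is to filter on the pixel width of the face region in the original video. The corresponding parameter is here: https://github.com/TMElyralab/MuseTalk/blob/main/musetalk/data/dataset.py#L559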
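The rule can be sketched roughly as follows. This is only an illustration of filtering on face width, not the code in `dataset.py`: the names `face_width` and `keep_clip` are hypothetical, the box format `(x1, y1, x2, y2)` is an assumption, and using the narrowest face in the clip as the deciding value is also an assumption. The default of 200 is taken from the warning message.

```python
MIN_FACE_SIZE = 200  # threshold shown in the warning; assumed to be configurable


def face_width(bbox):
    """Width in pixels of a face box (x1, y1, x2, y2) in original-video coordinates."""
    x1, _, x2, _ = bbox
    return x2 - x1


def keep_clip(bboxes, min_face_size=MIN_FACE_SIZE):
    """Return True if the clip's faces are wide enough to be used for training.

    Assumption for this sketch: the narrowest detected face decides, and a
    clip with no detected face is dropped.
    """
    if not bboxes:
        return False
    return min(face_width(b) for b in bboxes) >= min_face_size
```

Under these assumptions, a clip whose face box is 195 pixels wide, as in the warning quoted above, would be skipped, while a clip whose faces are all at least 200 pixels wide would be kept.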

aidenyzhang, May 26 '25 09:05