Data preprocessing is very slow. Could a faster script be provided, or is there a recommended way to speed it up?
The current open-source code is single-process; you could modify it to use multiple processes. This part of the pipeline generates a corresponding JSON meta file for each video. Feel free to submit a pull request with the changes.
You can find the relevant code here: https://github.com/TMElyralab/MuseTalk/blob/main/scripts/preprocess.py#L309
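Since each video's meta file is generated independently, a process pool parallelizes this cleanly. Below is a minimal sketch; `build_meta` is a hypothetical stand-in for the real per-video work in `preprocess.py` (landmark extraction etc.), not the actual implementation:

```python
import json
import os
from multiprocessing import Pool

def build_meta(video_path):
    # Hypothetical placeholder for the per-video work in preprocess.py:
    # extract the needed data and write a JSON meta file next to the video.
    meta = {"video": video_path, "num_frames": 0}  # placeholder fields
    meta_path = os.path.splitext(video_path)[0] + ".json"
    with open(meta_path, "w") as f:
        json.dump(meta, f)
    return meta_path

def run_parallel(video_paths, workers=4):
    # Videos are independent, so a simple Pool.map is enough;
    # tune `workers` to the machine's core count and I/O capacity.
    with Pool(processes=workers) as pool:
        return pool.map(build_meta, video_paths)
```

Note that if the real per-video step uses a GPU model, each worker would need its own model instance (or a shared GPU queue) rather than a naively forked one.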
Can we use MediaPipe instead of FaceAlignment, or can you suggest any faster models to swap in?
image_pred.shape: torch.Size([32, 3, 256, 256])
concat.shape: torch.Size([8, 3, 1280, 256])
Image saved successfully: ./exp_out/stage1//test/samples/sample_10_cuda_SyncNetScore_1.jpg
Steps: 0%| | 14/250000 [02:39<658:58:44, 9.49s/it, lr=2.8e-7, step_loss=0.421, td=0.04s, tm=2.51s]video file error:./dataset/HDTF/video_audio_clip_root/clip000_107.mp4
Steps: 0%| | 15/250000 [02:41<517:28:25, 7.45s/it, lr=3e-7, step_loss=0.394, td=0.04s, tm=2.54s]video file error:./dataset/HDTF/video_audio_clip_root/clip000_1151.mp4
Steps: 0%| | 30/250000 [03:25<213:05:17, 3.07s/it, lr=6e-7, step_loss=0.338, td=0.11s, tm=2.58s]video file error:./dataset/HDTF/video_audio_clip_root/clip000_1205.mp4
I checked the videos named in these errors and they play back fine. Why does the error occur?
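A clip can play fine in a media player yet still fail in the data loader if some frames cannot be decoded. One way to triage is to check each file the same way the loader would: a cheap container check plus a full frame-by-frame decode. This is only a diagnostic sketch; the function names are mine:

```python
def looks_like_mp4(path):
    # Cheap sanity check: an MP4 container has an 'ftyp' box near the start.
    with open(path, "rb") as f:
        head = f.read(12)
    return len(head) >= 8 and head[4:8] == b"ftyp"

def count_decodable_frames(path):
    # Thorough check: actually decode every frame with OpenCV
    # (imported lazily), the way a typical data loader would.
    import cv2
    cap = cv2.VideoCapture(path)
    n = 0
    while True:
        ok, _ = cap.read()
        if not ok:
            break
        n += 1
    cap.release()
    return n
```

If `count_decodable_frames` returns fewer frames than the clip's meta file expects, that mismatch alone can trigger a "video file error" in the loader even though playback looks normal.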
Resuming training from a saved checkpoint fails: loading the model reports inconsistent weights.
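"Inconsistent weights" on resume usually means the checkpoint's state-dict keys do not match the model's, most often because the checkpoint was saved from a wrapped model (DataParallel/DistributedDataParallel prepend `module.`) or from a different code revision. A small diagnostic sketch, assuming a standard PyTorch-style state dict (these helpers are mine, not part of the repo):

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """List keys present in one side but not the other, to pinpoint
    exactly which weights the resume step considers inconsistent."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return {
        "missing_in_ckpt": sorted(model_keys - ckpt_keys),
        "unexpected_in_ckpt": sorted(ckpt_keys - model_keys),
    }

def strip_module_prefix(state_dict):
    # Undo the 'module.' prefix that DataParallel/DDP adds when saving.
    return {k.removeprefix("module."): v for k, v in state_dict.items()}
```

Comparing `model.state_dict().keys()` against the checkpoint's keys this way (and stripping the prefix if needed) typically resolves the mismatch; `load_state_dict(..., strict=False)` can also show the same lists.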