Out-of-bounds error during inference with v1.5

Open QUTLiJingxiao opened this issue 8 months ago • 3 comments

(musetalk) root@8a95d418acc6:/home/MuseTalk1.5# sh inference.sh v1.5 normal
please download ffmpeg-static and export to FFMPEG_PATH.
For example: export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
Loads checkpoint by local backend from path: ./models/dwpose/dw-ll_ucoco_384.pth
cuda start
Adding ffmpeg to PATH
load unet model from ./models/musetalkV15/unet.pth
/root/anaconda3/envs/musetalk/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
Loaded inference config: {'task_0': {'video_path': 'data/video/test2.mp4', 'audio_path': 'data/audio/longxiaochun.wav'}}
2025-04-07 21:57:45.915815: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-04-07 21:57:45.963280: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-04-07 21:57:46.760516: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Error occurred:
whisper_feature.shape: torch.Size([1, 5254, 5, 384])
audio_clip.shape: torch.Size([1, 9, 5, 384])
num frames: 3148, fps: 30, whisper_idx_multiplier: 1.6666666666666667
frame_index: 3147, audio_index: 5245-5255

I ran into this error when running inference with v1.5. Is there a way to fix it? Thanks.

QUTLiJingxiao, Apr 08 '25 02:04

This line should round up rather than down. We will fix it. https://github.com/TMElyralab/MuseTalk/blob/main/musetalk/utils/audio_processor.py#L67C1-L68C1

aidenyzhang, Apr 08 '25 07:04
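
The numbers printed in the log are enough to reproduce the overrun by hand. The following is a minimal sketch, not the code in `audio_processor.py`: it assumes 50 Whisper feature rows per second of audio (which is what `whisper_idx_multiplier: 1.6666666666666667` at 30 FPS implies) and a window of 10 rows per video frame (taken from the `audio_index: 5245-5255` range). The helper names `window_bounds` and `rows_available` are hypothetical.

```python
import math

AUDIO_FPS = 50  # assumption: feature rows per second, implied by 1.6667 * 30 in the log
WINDOW = 10     # assumption: rows read per video frame, from "5245-5255" in the log


def window_bounds(frame_index, fps):
    """Return the (start, end) feature rows requested for one video frame."""
    # Integer arithmetic avoids floating-point surprises when rounding down.
    start = (frame_index * AUDIO_FPS) // fps
    return start, start + WINDOW


def rows_available(feature_len, frame_index, fps):
    """Return how many of the requested rows actually exist in the feature tensor."""
    start, end = window_bounds(frame_index, fps)
    return max(0, min(end, feature_len) - start)


# Values copied from the log: last frame 3147, 30 FPS, 5254 feature rows.
print(window_bounds(3147, 30))
print(rows_available(5254, 3147, 30))

# Rounding the multiplier down or up only disagrees when it is not an integer.
for fps in (25, 30):
    multiplier = AUDIO_FPS / fps
    print(fps, math.floor(multiplier), math.ceil(multiplier))
```

With the log's values, the last frame asks for rows 5245 to 5255 while the feature tensor ends at row 5254, so only 9 rows come back, which matches the reported `audio_clip.shape`. At 30 FPS the multiplier rounds down to 1 and up to 2; at 25 FPS both directions give 2.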

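We recommend running inference on 25 FPS video. At 25 FPS, whisper_idx_multiplier = 2, so the rounding problem does not arise here, and the alignment logic also matches training more closely. The training data was preprocessed to 25 FPS video.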

aidenyzhang, Apr 08 '25 07:04
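
One way to follow the 25 FPS advice is to resample the input video before inference. The sketch below only checks the frame rate and builds an `ffmpeg` command line without running it. The `fps` video filter and `-c:a copy` are standard ffmpeg options; the function names and the output path are hypothetical, and the figure of 50 feature rows per second is inferred from the log rather than taken from the MuseTalk source.

```python
from fractions import Fraction

AUDIO_FPS = 50   # assumption: feature rows per second, inferred from the log
TARGET_FPS = 25  # frame rate recommended in this thread


def index_multiplier(fps):
    """Exact ratio of feature rows to video frames, kept as a fraction."""
    return Fraction(AUDIO_FPS, fps)


def is_aligned(fps):
    """True when every frame maps to a whole number of feature rows."""
    return index_multiplier(fps).denominator == 1


def ffmpeg_resample_cmd(src, dst, target_fps=TARGET_FPS):
    """Build (but do not run) an ffmpeg command that resamples the video stream."""
    return ["ffmpeg", "-i", src, "-vf", f"fps={target_fps}", "-c:a", "copy", dst]


for fps in (25, 30):
    print(fps, index_multiplier(fps), is_aligned(fps))

# Hypothetical output path; the input path is the one shown in the log.
print(" ".join(ffmpeg_resample_cmd("data/video/test2.mp4", "data/video/test2_25fps.mp4")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on the `PATH`. The inference config would then need to point `video_path` at the resampled file.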

OK, thank you.

QUTLiJingxiao, Apr 08 '25 07:04