Realtime inference runs out of VRAM on an RTX 4090D
(venv) nayota@dell-Precision-3660:~/source/MuseTalk$ sh inference.sh v1.5 realtime
please download ffmpeg-static and export to FFMPEG_PATH.
For example: export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
Loads checkpoint by local backend from path: ./models/dwpose/dw-ll_ucoco_384.pth
cuda start
/home/nayota/source/MuseTalk/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:125: UserWarning: Decorating classes is deprecated and will be disabled in future versions. You should only decorate functions or methods. To preserve the current behavior of class decoration, you can directly decorate the __init__ method and nothing else.
warnings.warn("Decorating classes is deprecated and will be disabled in "
load unet model from ./models/musetalkV15/unet.pth
{'avator_1': {'preparation': True, 'bbox_shift': 5, 'video_path': 'data/video/yongen.mp4', 'audio_clips': {'audio_0': 'data/audio/yongen.wav'}}}
avator_1 exists, Do you want to re-create it ? (y/n)y
creating avator: avator_1
preparing data materials ... ...
extracting landmarks... reading images...
100%|████████████████████████████████████████| 259/259 [00:02<00:00, 113.76it/s]
get key_landmark and face bounding boxes with the default value
100%|████████████████████████████████████████| 259/259 [00:08<00:00, 29.02it/s]
bbox_shift parameter adjustment **************
Total frame: 「259」 Manually adjust range: [ -21~23 ], the current value: 0
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 518/518 [00:17<00:00, 29.40it/s]
Inferring using: data/audio/yongen.wav
start inference
2025-04-11 19:44:35.014889: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-04-11 19:44:35.032564: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-04-11 19:44:35.349526: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
processing audio:data/audio/yongen.wav costs 1122.328281402588ms
200
0%| | 0/8 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/nayota/source/MuseTalk/scripts/realtime_inference.py", line 387, in
@codestart-zhu You can reduce the batch_size.
@zzzweakman Hello, which file should I modify? Could you point me in the right direction? Thanks.
@zzzweakman Hello, I've reduced the batch_size to 15, and VRAM usage is now around 20 GB. One question: with realtime inference, it seems only the images are generated in real time, while the audio only comes out at the end?
Yes, because the code includes a video synthesis step that merges the audio track with the image sequence into the final video.
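That final synthesis step is typically done by invoking ffmpeg on the rendered frame sequence plus the original audio. Below is a minimal sketch of building such a command in Python; the paths, frame naming pattern, and fps are illustrative assumptions, not MuseTalk's actual values:

```python
def build_mux_cmd(frames_dir, audio_path, out_path, fps=25):
    """Build an ffmpeg argv that muxes an image sequence with an audio
    track into one video. Hypothetical helper for illustration only."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", f"{frames_dir}/%08d.png",  # numbered frame sequence input
        "-i", audio_path,                # audio input
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",           # broad player compatibility
        "-shortest",                     # stop at the shorter stream
        out_path,
    ]

cmd = build_mux_cmd("results/avator_1/tmp", "data/audio/yongen.wav", "out.mp4")
print(" ".join(cmd))
```

Because this mux runs only after all frames exist, the audio is naturally only heard once the whole clip is written out, which matches the behavior described above.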
With a 4090D's compute, even a batch_size of 4 or 2 will meet the realtime inference requirement. In my tests, VRAM usage can be brought down to about 11 GB, so a 3080, 4080, or 5080 can also run it. But the bottleneck isn't VRAM, it's GPU compute: GPU utilization is already maxed out, so running multiple instances can't sustain realtime performance.
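The batch_size trade-off above can be sketched generically: a smaller batch means fewer frames go through the UNet per forward pass, which bounds peak VRAM at the cost of more iterations (and thus more GPU-compute pressure per second of output). A minimal illustration of batched iteration, not MuseTalk's actual code:

```python
def iter_batches(items, batch_size):
    """Yield successive batches of at most batch_size items.

    Smaller batch_size -> fewer frames per forward pass -> lower peak
    VRAM, but more iterations to cover the same frame count.
    """
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# With 259 frames (as in the log above), batch_size=15 gives 18 batches,
# the last holding the 4 leftover frames.
frames = list(range(259))
batches = list(iter_batches(frames, 15))
print(len(batches), len(batches[-1]))  # 18 4
```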