H800 platform deployment test complete; data available for reference

Open ops120 opened this issue 8 months ago • 10 comments

OS version

  • NAME: Ubuntu
  • VERSION_ID: 22.04
  • VERSION: 22.04.1 LTS (Jammy Jellyfish)

CUDA version

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
|   0  NVIDIA H800   78504MiB / 81559MiB

Container status

CONTAINER ID   NAME               CPU %   MEM USAGE / LIMIT   MEM %   NET I/O   BLOCK I/O   PIDS
6c7aaccbeabd   heygem-tts         --      -- / --             --      --        --          --
d69841a3b8c5   heygem-asr         --      -- / --             --      --        --          --
fe0be53d3472   heygem-gen-video   --      -- / --             --      --        --          --

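H800 80G GPU utilization

  • GPU utilization: 50%
  • VRAM usage: 10G

Test speed

Audio synthesis

  • Duration: 2.5 minutes
  • Approximate time ratio: 1:1
  • Detailed elapsed time: 145.52 seconds

Text synthesis

  • Duration: 5 minutes
  • Approximate time ratio: 1:1
  • Detailed elapsed time: 329 seconds

3080TI 12G GPU utilization

  • GPU utilization: 100%
  • VRAM usage: 10G

Test speed

Audio synthesis

  • Duration: 3 minutes
  • Approximate time ratio: 1:5
  • Detailed elapsed time: 15 minutes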
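
The ratios quoted above can be checked with a little arithmetic. The sketch below assumes that the first figure in each speed list is the length of the media and that the last ("detailed") figure is the wall-clock processing time; the post does not state this explicitly, so treat that reading as an assumption. The function name `time_ratio` is hypothetical.

```python
def time_ratio(media_seconds: float, elapsed_seconds: float) -> float:
    """Return processing seconds per second of media (1.0 means 1:1)."""
    if media_seconds <= 0:
        raise ValueError("media_seconds must be positive")
    return elapsed_seconds / media_seconds


# Figures from the post, converted to seconds (assumed interpretation:
# first value = media length, second value = wall-clock processing time).
runs = {
    "H800 audio synthesis": (2.5 * 60, 145.52),
    "H800 text synthesis": (5 * 60, 329.0),
    "3080TI audio synthesis": (3 * 60, 15 * 60),
}

for name, (media, elapsed) in runs.items():
    print(f"{name}: 1:{time_ratio(media, elapsed):.2f}")
```

Under that assumption the H800 runs come out at roughly 1:0.97 and 1:1.10, and the 3080TI run at 1:5.00, which agrees with the rounded ratios in the post.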

ops120 • Apr 15 '25 09:04

For reference, see the video 【Heygem数字人 第1集 H800 平台 文本合成测试】 (Heygem Digital Human, Episode 1: text synthesis test on the H800 platform): https://www.bilibili.com/video/BV1P3oFYREjx/?share_source=copy_web&vd_source=546a4d2bcc348d0c45d7e69d67755982

ops120 • Apr 15 '25 11:04

May I ask how to resolve the following problem on the Linux version?

heygem-gen-video | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

heygem-gen-video | 2025-04-16 10:51:13,256 - cv2box - INFO - Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process

heygem-gen-video | [2025-04-16 10:51:13] [cv_logging.py[line:27]] [INFO] [Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process]

heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:230]] [WARNING] [ -> 服务不进行注册]

heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:231]] [INFO] [TransDhTask init]

ssp-seven • Apr 16 '25 02:04

If you enter the container and run nvidia-smi, does it print any information?

ops120 • Apr 16 '25 03:04

Yes, it does:

(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Wed Apr 16 04:12:22 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  |   00000000:00:08.0 Off |                    0 |
| N/A   30C    P0             50W /  400W |    8609MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

ssp-seven • Apr 16 '25 04:04

docker exec -it bash

Run this instead: docker exec -i heygem-gen-video nvidia-smi

ops120 • Apr 16 '25 05:04

Follow this video; most likely your CUDA and driver versions are too new. https://www.bilibili.com/video/BV1T4dBYNEA3/

Do not go beyond these versions: Driver Version: 550.144.03, CUDA Version: 12.4
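
The version ceiling above can be checked mechanically against the header line that nvidia-smi prints. This is a minimal sketch, assuming the header keeps the "Driver Version: ... CUDA Version: ..." layout shown in this thread; the names `parse_versions` and `within_ceiling` are hypothetical, and the ceiling is only the advice given in this thread, not an official compatibility limit.

```python
import re

# Ceiling quoted in the thread (advice from the thread, not an official limit).
MAX_DRIVER = (550, 144, 3)
MAX_CUDA = (12, 4)

# Matches the header line of nvidia-smi text output as shown in this thread.
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_versions(smi_output: str):
    """Extract (driver, cuda) version tuples from nvidia-smi text output."""
    match = HEADER_RE.search(smi_output)
    if match is None:
        raise ValueError("no Driver/CUDA version header found")
    driver = tuple(int(part) for part in match.group(1).split("."))
    cuda = tuple(int(part) for part in match.group(2).split("."))
    return driver, cuda


def within_ceiling(smi_output: str) -> bool:
    """True if both versions are at or below the ceiling from the thread."""
    driver, cuda = parse_versions(smi_output)
    return driver <= MAX_DRIVER and cuda <= MAX_CUDA


sample = "| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |"
print(within_ceiling(sample))
```

In practice the text could come from `subprocess.run(["docker", "exec", "-i", "heygem-gen-video", "nvidia-smi"], capture_output=True, text=True).stdout`, provided the container is running.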

ops120 • Apr 16 '25 05:04

Running docker exec -i heygem-gen-video nvidia-smi produces the following errors:

(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker exec -i heygem-gen-video nvidia-smi
Error response from daemon: Container 68774b746df917e8fe1bc4f17ad5793cbab737cb2b23cccbb78f13cb29e76b85 is restarting, wait until the container is running
(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker exec -i heygem-gen-video nvidia-smi
exec /usr/bin/nvidia-smi: exec format error
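
The two messages above point at different problems, and a small triage helper can make that explicit. This is only a heuristic sketch: the function name `triage` and the suggested next steps are assumptions based on the general meaning of these messages ("is restarting" means the container keeps exiting and being restarted; "exec format error" means the kernel could not execute the binary, typically because of a CPU architecture mismatch or a damaged file), not a diagnosis confirmed for this deployment.

```python
def triage(message: str) -> str:
    """Map a `docker exec` error message to a suggested next step (heuristic)."""
    text = message.lower()
    if "is restarting" in text:
        # The container is in a restart loop, so exec cannot attach to it.
        return "container keeps restarting; read its logs with `docker logs` first"
    if "exec format error" in text:
        # The kernel refused to run the binary inside the container.
        return "binary is not executable here; check image integrity and CPU architecture"
    return "unrecognized; inspect the full output"


errors = [
    "Error response from daemon: Container "
    "68774b746df917e8fe1bc4f17ad5793cbab737cb2b23cccbb78f13cb29e76b85 "
    "is restarting, wait until the container is running",
    "exec /usr/bin/nvidia-smi: exec format error",
]

for err in errors:
    print(triage(err))
```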

ssp-seven • Apr 16 '25 05:04

Thanks, I'll give it a try.

ssp-seven • Apr 16 '25 05:04

That is not right; your container is incomplete. Compare your images against these:

guiji2025/fish-speech-ziming   latest   552bcd0807da   5 weeks ago    31.3GB
guiji2025/fun-asr              latest   d0b0ac2466d1   5 weeks ago    26.9GB
guiji2025/heygem.ai            latest   987755f90312   6 months ago   13.5GB

Normal output should look like this:

docker exec -i heygem-gen-video nvidia-smi |head
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
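
One way to run the comparison suggested above is to parse a `docker images` listing and check each repository's image ID and size against the reference values quoted in this comment. This is a minimal sketch, assuming the default column layout of `docker images` (REPOSITORY, TAG, IMAGE ID, CREATED, SIZE); the names `parse_images` and `mismatches` are hypothetical.

```python
# Reference image IDs and sizes quoted in the thread.
REFERENCE = {
    "guiji2025/fish-speech-ziming": ("552bcd0807da", "31.3GB"),
    "guiji2025/fun-asr": ("d0b0ac2466d1", "26.9GB"),
    "guiji2025/heygem.ai": ("987755f90312", "13.5GB"),
}


def parse_images(listing: str) -> dict:
    """Parse `docker images` text into {repository: (image_id, size)}."""
    images = {}
    for line in listing.strip().splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "REPOSITORY":
            continue  # skip the header row and blank or short lines
        # Columns: REPOSITORY, TAG, IMAGE ID, CREATED (several words), SIZE
        images[fields[0]] = (fields[2], fields[-1])
    return images


def mismatches(listing: str) -> list:
    """Return repositories that are missing or differ from REFERENCE."""
    local = parse_images(listing)
    return [repo for repo, ref in REFERENCE.items() if local.get(repo) != ref]


listing = """\
REPOSITORY                     TAG       IMAGE ID       CREATED        SIZE
guiji2025/fish-speech-ziming   latest    552bcd0807da   5 weeks ago    31.3GB
guiji2025/fun-asr              latest    d0b0ac2466d1   5 weeks ago    26.9GB
guiji2025/heygem.ai            latest    987755f90312   6 months ago   13.5GB
"""
print(mismatches(listing))
```

An empty list only shows that the same images are present locally; it does not by itself explain why a container fails at runtime.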

ops120 • Apr 16 '25 05:04

The sizes seem to match:

(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker images
REPOSITORY                     TAG      IMAGE ID       CREATED        SIZE
guiji2025/fish-speech-ziming   latest   552bcd0807da   5 weeks ago    31.3GB
guiji2025/fun-asr              latest   d0b0ac2466d1   5 weeks ago    26.9GB
guiji2025/heygem.ai            latest   987755f90312   6 months ago   13.5GB

ssp-seven • Apr 16 '25 06:04