Deployment testing on the H800 platform is complete; the figures below can be used as a reference.
OS version
- NAME: Ubuntu
- VERSION_ID: 22.04
- VERSION: 22.04.1 LTS (Jammy Jellyfish)
CUDA version
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA H800 78504MiB / 81559MiB
Container status
CONTAINER ID   NAME               CPU %   MEM USAGE / LIMIT   MEM %   NET I/O   BLOCK I/O   PIDS
6c7aaccbeabd   heygem-tts         --      -- / --             --      --        --          --
d69841a3b8c5   heygem-asr         --      -- / --             --      --        --          --
fe0be53d3472   heygem-gen-video   --      -- / --             --      --        --          --
H800 80G GPU utilization
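- GPU utilization: 50%
- VRAM usage: 10G
Test speed
Audio synthesis
- Time: 2.5 minutes
- Base time ratio: 1:1
- Detailed time: 145.52 seconds
Text synthesis
- Time: 5 minutes
- Base time ratio: 1:1
- Detailed time: 329 seconds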
3080TI 12G GPU utilization
- GPU utilization: 100%
- VRAM usage: 10G
Test speed
Audio synthesis
- Time: 3 minutes
- Base time ratio: 1:5
- Detailed time: 15 minutes
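The ratios quoted above can be reproduced with simple arithmetic. This is only a sketch under one assumption: that the first figure in each group is the nominal length of the job and the "detailed" figure is the measured processing time. The report does not define these fields, so that reading may be wrong. The name `time_ratio` is illustrative.

```python
def time_ratio(nominal_seconds: float, measured_seconds: float) -> float:
    """Return measured processing time divided by the nominal time."""
    if nominal_seconds <= 0:
        raise ValueError("nominal_seconds must be positive")
    return measured_seconds / nominal_seconds


# Figures copied from the report above (minutes converted to seconds).
# Assumption: first value is the nominal time, second is the measured time.
RUNS = {
    "H800 audio synthesis": (2.5 * 60, 145.52),
    "H800 text synthesis": (5 * 60, 329.0),
    "3080TI audio synthesis": (3 * 60, 15 * 60),
}

if __name__ == "__main__":
    for name, (nominal, measured) in RUNS.items():
        print(f"{name}: 1:{time_ratio(nominal, measured):.2f}")
```

Under that assumption the three runs come out at about 1:0.97, 1:1.10 and 1:5.00, which is consistent with the rounded 1:1, 1:1 and 1:5 in the report.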
Reference: [Heygem Digital Human, Episode 1: H800 Platform Text Synthesis Test] https://www.bilibili.com/video/BV1P3oFYREjx/?share_source=copy_web&vd_source=546a4d2bcc348d0c45d7e69d67755982
Could someone help: how do I resolve the following problem on the Linux version?
heygem-gen-video | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
heygem-gen-video | 2025-04-16 10:51:13,256 - cv2box - INFO - Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process
heygem-gen-video | [2025-04-16 10:51:13] [cv_logging.py[line:27]] [INFO] [Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process]
heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:230]] [WARNING] [ -> 服务不进行注册]
heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:231]] [INFO] [TransDhTask init]
Does nvidia-smi print any information when you run it inside the container?
Yes, it does:
(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Wed Apr 16 04:12:22 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:00:08.0 Off | 0 |
| N/A 30C P0 50W / 400W | 8609MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
docker exec -it bash
Run this instead: docker exec -i heygem-gen-video nvidia-smi
Follow this video; most likely your CUDA and driver versions are too new. https://www.bilibili.com/video/BV1T4dBYNEA3/
Do not go above this: Driver Version: 550.144.03 CUDA Version: 12.4
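To check a host against that ceiling, the installed driver version can be compared numerically. The following is a minimal sketch, not an official compatibility check: the ceiling is simply the version quoted in this thread, and the helper names (`parse_version`, `driver_within_ceiling`, `installed_driver_version`) are made up for illustration. It relies on `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
import subprocess

# Ceiling quoted in this thread; advice from the thread, not an official limit.
MAX_DRIVER = "550.144.03"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '550.144.03' into (550, 144, 3) so comparison is numeric."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_within_ceiling(installed: str, ceiling: str = MAX_DRIVER) -> bool:
    """True if the installed driver is not newer than the ceiling."""
    return parse_version(installed) <= parse_version(ceiling)


def installed_driver_version() -> str:
    """Ask nvidia-smi on the host for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a machine with the driver installed, `driver_within_ceiling(installed_driver_version())` gives a quick yes or no. Comparing tuples of integers avoids the string-comparison trap where "550.90" would sort after "550.144".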
Running docker exec -i heygem-gen-video nvidia-smi produces the following:
(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker exec -i heygem-gen-video nvidia-smi
Error response from daemon: Container 68774b746df917e8fe1bc4f17ad5793cbab737cb2b23cccbb78f13cb29e76b85 is restarting, wait until the container is running
(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker exec -i heygem-gen-video nvidia-smi
exec /usr/bin/nvidia-smi: exec format error
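The first error shows the container was in a restart loop when `docker exec` was attempted, so it can help to look at the container state before running anything inside it. Below is a rough sketch that summarizes the JSON printed by `docker inspect`; the function names are hypothetical, and only the `State.Status`, `State.ExitCode` and `RestartCount` fields are read. As general background, "exec format error" usually means the kernel could not execute the file, for example because of a CPU architecture mismatch or a damaged binary; whether that applies here is not established in the thread.

```python
import json
import subprocess


def summarize_state(inspect_json: str) -> str:
    """Summarize the output of `docker inspect <container>` in one line."""
    info = json.loads(inspect_json)[0]
    status = info["State"]["Status"]  # e.g. "running", "restarting", "exited"
    restarts = info.get("RestartCount", 0)
    exit_code = info["State"].get("ExitCode")
    if status == "running":
        return f"running (restarts: {restarts}); docker exec should work"
    return (
        f"{status} (restarts: {restarts}, last exit code: {exit_code}); "
        "check the container logs first"
    )


def inspect_container(name: str) -> str:
    """Return the raw JSON printed by `docker inspect <name>`."""
    return subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    ).stdout
```

For example, `print(summarize_state(inspect_container("heygem-gen-video")))` followed by `docker logs --tail 50 heygem-gen-video` shows whether the container is crash-looping and what it printed before exiting.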
Thanks, I'll give it a try.
That is not right; your container looks incomplete. Compare your images against these:
guiji2025/fish-speech-ziming   latest   552bcd0807da   5 weeks ago    31.3GB
guiji2025/fun-asr              latest   d0b0ac2466d1   5 weeks ago    26.9GB
guiji2025/heygem.ai            latest   987755f90312   6 months ago   13.5GB
Normal output should look like this:
docker exec -i heygem-gen-video nvidia-smi |head
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
The sizes seem to match:
(hg) ubuntu@VM-0-15-ubuntu:~/pingan/HeyGem.ai/deploy$ sudo docker images
REPOSITORY                     TAG      IMAGE ID       CREATED        SIZE
guiji2025/fish-speech-ziming   latest   552bcd0807da   5 weeks ago    31.3GB
guiji2025/fun-asr              latest   d0b0ac2466d1   5 weeks ago    26.9GB
guiji2025/heygem.ai            latest   987755f90312   6 months ago   13.5GB
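The comparison above was done by eye; it could also be scripted. The sketch below checks a local listing against the three reference rows quoted in this thread. The reference values are copied from the thread and the helper names are invented for illustration. It uses `docker images --format` with the `.Repository`, `.ID` and `.Size` fields.

```python
import subprocess

# Reference rows quoted in this thread: repository -> (image ID, size).
REFERENCE = {
    "guiji2025/fish-speech-ziming": ("552bcd0807da", "31.3GB"),
    "guiji2025/fun-asr": ("d0b0ac2466d1", "26.9GB"),
    "guiji2025/heygem.ai": ("987755f90312", "13.5GB"),
}


def parse_listing(text: str) -> dict[str, tuple[str, str]]:
    """Parse lines of 'repository image_id size' into a dict."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            repo, image_id, size = parts
            rows[repo] = (image_id, size)
    return rows


def mismatches(local: dict[str, tuple[str, str]]) -> list[str]:
    """List reference images that are missing locally or differ."""
    problems = []
    for repo, expected in REFERENCE.items():
        found = local.get(repo)
        if found is None:
            problems.append(f"{repo}: missing")
        elif found != expected:
            problems.append(f"{repo}: expected {expected}, found {found}")
    return problems


def local_listing() -> str:
    """Run `docker images` with a three-column output format."""
    return subprocess.run(
        ["docker", "images", "--format", "{{.Repository}} {{.ID}} {{.Size}}"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `mismatches(parse_listing(local_listing()))` returns an empty list when all three images match. A match suggests the same images were pulled, but it says nothing about whether the running container itself is healthy.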