
[Solved, for reference] heygem-f2f keeps restarting with "RuntimeError: Found no NVIDIA driver on your system."

Open HMyaoyuan opened this issue 9 months ago • 19 comments

2025-03-12 17:55:36 2025-03-12 17:55:36,250 - cv2box - INFO - Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process
2025-03-12 17:55:36 [2025-03-12 17:55:36] [cv_logging.py[line:27]] [INFO] [Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process]
2025-03-12 17:55:37 [2025-03-12 17:55:37] [app_local.py[line:230]] [WARNING] [ -> 服务不进行注册]
2025-03-12 17:55:37 [2025-03-12 17:55:37] [app_local.py[line:231]] [INFO] [TransDhTask init]
2025-03-12 17:55:38 Traceback (most recent call last):
2025-03-12 17:55:38   File "/code/app_local.py", line 231, in <module>
2025-03-12 17:55:38     TransDhTask.instance()
2025-03-12 17:55:38   File "trans_dh_service.py", line 1207, in trans_dh_service.TransDhTask.instance
2025-03-12 17:55:38   File "trans_dh_service.py", line 1189, in trans_dh_service.TransDhTask.__init__
2025-03-12 17:55:38   File "compute_ctc_att_bnf.py", line 130, in compute_ctc_att_bnf.load_ppg_model
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1152, in to
2025-03-12 17:55:38     return self._apply(convert)
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
2025-03-12 17:55:38     module._apply(fn)
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
2025-03-12 17:55:38     module._apply(fn)
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
2025-03-12 17:55:38     module._apply(fn)
2025-03-12 17:55:38   [Previous line repeated 1 more time]
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 825, in _apply
2025-03-12 17:55:38     param_applied = fn(param)
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1150, in convert
2025-03-12 17:55:38     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2025-03-12 17:55:38   File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
2025-03-12 17:55:38     torch._C._cuda_init()
2025-03-12 17:55:38 RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

HMyaoyuan avatar Mar 12 '25 09:03 HMyaoyuan
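The traceback above fails inside torch's lazy CUDA initialisation, reached from a `.to(device)` call in `load_ppg_model`. As a diagnostic aid, here is a minimal sketch of checking GPU availability before asking for a CUDA device. `pick_device` is a hypothetical helper, not part of the HeyGem code, and the sketch assumes it is run with the container's own Python interpreter.

```python
def pick_device() -> str:
    """Return "cuda" only if PyTorch is importable and reports a usable GPU."""
    try:
        import torch  # imported lazily so the check also runs where torch is absent
    except ImportError:
        return "cpu"
    # is_available() returns False instead of raising when no driver is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("selected device:", pick_device())
```

If this prints "cpu" inside the heygem-f2f container while nvidia-smi works on the host, the container is probably not being given the GPU, which points at the container configuration rather than at the host driver.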

C:\Users\RunChen>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:36:24_Pacific_Standard_Time_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

C:\Users\RunChen>nvidia-smi
Wed Mar 12 17:57:26 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.70                 Driver Version: 572.70         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  |   00000000:01:00.0  On |                  Off |
|  0%   44C    P8             24W /  450W |    8471MiB /  24564MiB |     13%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2132    C+G   ...yb3d8bbwe\WindowsTerminal.exe          N/A        |
|    0   N/A  N/A      3564    C+G   C:\Windows\explorer.exe                   N/A        |
|    0   N/A  N/A      4088    C+G   ...IA app\CEF\NVIDIA Overlay.exe          N/A        |
|    0   N/A  N/A      4332    C+G   ...acted\runtime\WeChatAppEx.exe          N/A        |
|    0   N/A  N/A      7900    C+G   ...xyewy\ShellExperienceHost.exe          N/A        |
|    0   N/A  N/A      9700    C+G   ...ouryDevice\asus_framework.exe          N/A        |
|    0   N/A  N/A     11860    C+G   ...em32\Kinect\KinectService.exe          N/A        |
|    0   N/A  N/A     15180    C+G   ...ntrolPanel\SystemSettings.exe          N/A        |
|    0   N/A  N/A     15308    C+G   C:\Windows\explorer.exe                   N/A        |
|    0   N/A  N/A     16228    C+G   ...lus\logioptionsplus_agent.exe          N/A        |
|    0   N/A  N/A     18284    C+G   ...es (x86)\Epic Pen\EpicPen.exe          N/A        |
|    0   N/A  N/A     18424    C+G   ...logioptionsplus_logivoice.exe          N/A        |
|    0   N/A  N/A     18768    C+G   ..._cw5n1h2txyewy\SearchHost.exe          N/A        |
|    0   N/A  N/A     18792    C+G   ...y\StartMenuExperienceHost.exe          N/A        |
|    0   N/A  N/A     19688    C+G   ...8bbwe\PhoneExperienceHost.exe          N/A        |
|    0   N/A  N/A     20840    C+G   ...ogram Files\ToDesk\ToDesk.exe          N/A        |
|    0   N/A  N/A     21072    C+G   ...desk\Autodesk AdSSO\AdSSO.exe          N/A        |
|    0   N/A  N/A     23740    C+G   ...IA app\CEF\NVIDIA Overlay.exe          N/A        |
|    0   N/A  N/A     24328    C+G   ...5n1h2txyewy\TextInputHost.exe          N/A        |
|    0   N/A  N/A     30000    C+G   ...s\Win64\EpicGamesLauncher.exe          N/A        |
|    0   N/A  N/A     30372    C+G   ...launcher\AdskAccessUIHost.exe          N/A        |
|    0   N/A  N/A     30960    C+G   ...aries\Win64\EpicWebHelper.exe          N/A        |
|    0   N/A  N/A     33592    C+G   ...NVIDIA Omniverse Launcher.exe          N/A        |
|    0   N/A  N/A     34848    C+G   ...rEngine\BaiduNetdiskUnite.exe          N/A        |
|    0   N/A  N/A     38224    C+G   ...Chrome\Application\chrome.exe          N/A        |
|    0   N/A  N/A     39448    C+G   ...Chrome\Application\chrome.exe          N/A        |
|    0   N/A  N/A     42184    C+G   ...r\frontend\Docker Desktop.exe          N/A        |
|    0   N/A  N/A     44952    C+G   C:\Windows\explorer.exe                   N/A        |
+-----------------------------------------------------------------------------------------+

HMyaoyuan avatar Mar 12 '25 09:03 HMyaoyuan

C:\Users\RunChen>docker run --rm --runtime=nvidia --gpus all guiji2025/heygem.ai nvidia-smi
Wed Mar 12 23:59:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 572.70         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
|  0%   44C    P8             21W /  450W |    8151MiB /  24564MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                 N/A        |
+-----------------------------------------------------------------------------------------+

HMyaoyuan avatar Mar 12 '25 16:03 HMyaoyuan

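To sum up: the GPU driver is installed on the host, and running docker run --rm --runtime=nvidia --gpus all guiji2025/heygem.ai nvidia-smi also works and can reach the GPU, so I don't understand why f2f fails when it runs.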

HMyaoyuan avatar Mar 12 '25 16:03 HMyaoyuan
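Since the one-off docker run sees the GPU but the long-running f2f container does not, it can help to look at what the failing container itself is given. Below is a rough sketch, using only the Python standard library, that collects a few hints from inside a container. `probe_gpu_visibility` is a hypothetical helper. The check for /dev/dxg assumes a Docker Desktop setup on the WSL2 backend, where the GPU is usually exposed through that device rather than through /dev/nvidia* nodes.

```python
import os
import shutil
from pathlib import Path


def probe_gpu_visibility() -> dict:
    """Collect hints about whether this process can see an NVIDIA GPU."""
    dev = Path("/dev")
    # Native Linux hosts usually expose /dev/nvidia0, /dev/nvidiactl, and so on.
    nvidia_nodes = sorted(p.name for p in dev.glob("nvidia*")) if dev.is_dir() else []
    return {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "dev_nvidia_nodes": nvidia_nodes,
        # Assumption: under WSL2 the GPU appears as /dev/dxg instead.
        "dev_dxg_present": (dev / "dxg").exists(),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": os.environ.get("NVIDIA_DRIVER_CAPABILITIES"),
    }


if __name__ == "__main__":
    for key, value in probe_gpu_visibility().items():
        print(f"{key}: {value}")
```

Running it once in the working docker run container and once in heygem-f2f and comparing the two outputs should show whether the compose-started container is missing the device or the NVIDIA environment variables.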

After a lot of tinkering I edited docker-compose.yaml, and f2f finally stopped restarting over and over. It now runs locally.

networks:
  ai_network:
    driver: bridge

services:
  heygem-tts:
    image: guiji2025/fish-speech-ziming
    container_name: heygem-tts
    restart: always
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility,video,display
    ports:
      - '18180:8080'
    volumes:
      - d:/heygem_data/voice/data:/code/data
    command: /bin/bash -c "/opt/conda/envs/python310/bin/python3 tools/api_server.py --listen 0.0.0.0:8080"
    networks:
      - ai_network
  heygem-asr:
    image: guiji2025/fun-asr
    container_name: heygem-asr
    restart: always
    runtime: nvidia
    privileged: true
    working_dir: /workspace/FunASR/runtime
    ports:
      - '10095:10095'
    command: sh /run.sh
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - ai_network
  heygem-f2f:
    image: guiji2025/heygem.ai
    container_name: heygem-f2f
    restart: always
    runtime: nvidia
    privileged: true
    volumes:
      - d:/heygem_data/face2face:/code/data
    environment:
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
      - NVIDIA_VISIBLE_DEVICES=0
      - NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility,video,display
    shm_size: '8g'
    ports:
      - '8383:8383'
    command: python /code/app_local.py
    networks:
      - ai_network

HMyaoyuan avatar Mar 12 '25 16:03 HMyaoyuan
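The compose file above requests the GPU in two different ways: heygem-tts and heygem-f2f rely on runtime: nvidia plus the NVIDIA_* environment variables, while heygem-asr also carries a deploy device reservation. A small sketch that summarises this per service may make it easier to compare against your own file. `gpu_settings` is a hypothetical helper, and the two dicts are hand-copied excerpts of the services above, so no YAML parser is needed.

```python
def gpu_settings(service: dict) -> dict:
    """Summarise how one compose service (already parsed into a dict) asks for a GPU."""
    env = service.get("environment", [])
    if isinstance(env, dict):  # compose accepts both the list and the mapping form
        env = [f"{k}={v}" for k, v in env.items()]
    env_map = dict(item.split("=", 1) for item in env if "=" in item)
    devices = (
        service.get("deploy", {})
        .get("resources", {})
        .get("reservations", {})
        .get("devices", [])
    )
    return {
        "runtime_nvidia": service.get("runtime") == "nvidia",
        "deploy_gpu_reservation": any(
            "gpu" in d.get("capabilities", []) for d in devices
        ),
        "visible_devices": env_map.get("NVIDIA_VISIBLE_DEVICES"),
    }


# Excerpts of two services from the compose file in this thread, written as dicts.
heygem_f2f = {
    "image": "guiji2025/heygem.ai",
    "runtime": "nvidia",
    "privileged": True,
    "environment": [
        "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512",
        "NVIDIA_VISIBLE_DEVICES=0",
        "NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility,video,display",
    ],
}
heygem_asr = {
    "image": "guiji2025/fun-asr",
    "runtime": "nvidia",
    "privileged": True,
    "deploy": {"resources": {"reservations": {"devices": [
        {"driver": "nvidia", "count": "all", "capabilities": ["gpu"]},
    ]}}},
}

if __name__ == "__main__":
    print("heygem-f2f:", gpu_settings(heygem_f2f))
    print("heygem-asr:", gpu_settings(heygem_asr))
```

The thread does not say exactly which lines were removed from the stock file, so the summary only describes the working configuration as posted; it is not a statement of what the fix was.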

How well does the local deployment work, bro?

tmhulw avatar Mar 13 '25 01:03 tmhulw

How well does the local deployment work, bro?

It works very well, but the cloned voice is quite mediocre (from a 1-minute video).

HMyaoyuan avatar Mar 13 '25 02:03 HMyaoyuan

thanks

whl88 avatar Mar 13 '25 02:03 whl88

Hi, I hit this error on Ubuntu as well. I see you ran into it on Windows, where it works fine for me. Did you manage to start it successfully on Linux?

wangce1998 avatar Mar 13 '25 03:03 wangce1998

docker run --rm --runtime=nvidia --gpus all guiji2025/heygem.ai nvidia-smi

I don't quite follow. Where should this command be run?

taotaoccc avatar Mar 14 '25 09:03 taotaoccc

thanks

I deleted it in Docker and pulled it again, but the problem is still there: tts restarts endlessly.

taotaoccc avatar Mar 14 '25 09:03 taotaoccc

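After a lot of tinkering I edited docker-compose.yaml, and f2f finally stopped restarting over and over. It now runs locally. networks: ai_network: driver: bridge

Did you fix it by adding the two lines below under - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512? I added them and it still doesn't work. My machine has eight 4090 cards and runs Ubuntu.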

  • NVIDIA_VISIBLE_DEVICES=0
  • NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility,video,display

gavid0124 avatar Mar 24 '25 03:03 gavid0124

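Hi, I hit this error on Ubuntu as well. I see you ran into it on Windows, where it works fine for me. Did you manage to start it successfully on Linux?

I'm seeing this problem on Ubuntu too. Did you solve it?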

gavid0124 avatar Mar 25 '25 01:03 gavid0124

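Hi, I hit this error on Ubuntu as well. I see you ran into it on Windows, where it works fine for me. Did you manage to start it successfully on Linux?

I'm seeing this problem on Ubuntu too. Did you solve it?

Bro, did you get it solved?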

ssp-seven avatar Apr 11 '25 09:04 ssp-seven

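Hi, I hit this error on Ubuntu as well. I see you ran into it on Windows, where it works fine for me. Did you manage to start it successfully on Linux?

I'm seeing this problem on Ubuntu too. Did you solve it?

Bro, did you get it solved?

I stopped looking into it. I originally wanted to use Ubuntu as the server side, but the docs say that is not supported and that it is currently a standalone version. If you want that setup you can call the API, but you have to build the front-end UI yourself.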

gavid0124 avatar Apr 14 '25 02:04 gavid0124

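After a lot of tinkering I edited docker-compose.yaml, and f2f finally stopped restarting over and over. It now runs locally. networks: ai_network: driver: bridge

Did you fix it by adding the two lines below under - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512? I added them and it still doesn't work. My machine has eight 4090 cards and runs Ubuntu.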

  • NVIDIA_VISIBLE_DEVICES=0
  • NVIDIA_DRIVER_CAPABILITIES=compute,graphics,utility,video,display

It wasn't only additions; I removed some lines too. I have only tried this on Windows, with a single 4090.

HMyaoyuan avatar Apr 15 '25 04:04 HMyaoyuan

docker run --rm --runtime=nvidia --gpus all guiji2025/heygem.ai nvidia-smi

I don't quite follow. Where should this command be run?

In cmd.

HMyaoyuan avatar Apr 15 '25 04:04 HMyaoyuan
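For anyone who would rather script the check than type it into cmd, here is a minimal sketch that wraps the same smoke-test command with Python's subprocess module. `run_smoke_test` and its `timeout_s` parameter are hypothetical names; the command itself is the one quoted in this thread. The sketch assumes docker is on PATH on the host and that the image is already pulled (otherwise the first run will download it).

```python
import shutil
import subprocess

# The smoke-test command from this thread, split into an argv list.
SMOKE_TEST = [
    "docker", "run", "--rm", "--runtime=nvidia", "--gpus", "all",
    "guiji2025/heygem.ai", "nvidia-smi",
]


def run_smoke_test(timeout_s: int = 600):
    """Run the smoke test on the host.

    Returns (exit_code, combined_output), or None when docker is not on PATH.
    """
    if shutil.which("docker") is None:
        return None
    done = subprocess.run(
        SMOKE_TEST, capture_output=True, text=True, timeout=timeout_s
    )
    return done.returncode, done.stdout + done.stderr
```

Call run_smoke_test() from a host shell (cmd, PowerShell, or a Linux terminal). An exit code of 0 together with the usual nvidia-smi table means a one-off container can see the GPU. As the original poster found, that alone does not prove the compose-started services can.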

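Hi, I hit this error on Ubuntu as well. I see you ran into it on Windows, where it works fine for me. Did you manage to start it successfully on Linux?

I have only used Windows, never Linux.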

HMyaoyuan avatar Apr 15 '25 04:04 HMyaoyuan

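Hi, I hit this error on Ubuntu as well. I see you ran into it on Windows, where it works fine for me. Did you manage to start it successfully on Linux?

I'm seeing this problem on Ubuntu too. Did you solve it?

Bro, did you get it solved?

I stopped looking into it. I originally wanted to use Ubuntu as the server side, but the docs say that is not supported and that it is currently a standalone version. If you want that setup you can call the API, but you have to build the front-end UI yourself.

Did you manage to start all three containers? I'm seeing the problem below; have you run into it? Problem: heygem-gen-video | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx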

heygem-gen-video | 2025-04-16 10:51:13,256 - cv2box - INFO - Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process

heygem-gen-video | [2025-04-16 10:51:13] [cv_logging.py[line:27]] [INFO] [Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process]

heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:230]] [WARNING] [ -> 服务不进行注册]

heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:231]] [INFO] [TransDhTask init]

ssp-seven avatar Apr 16 '25 02:04 ssp-seven
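When a container restarts in a loop, the same few startup lines repeat and the actual error is easy to miss. Here is a small sketch that scans saved compose logs for the missing-driver error and reports which service raised it. `find_driver_errors` is a hypothetical helper, and the "service | " prefix pattern is an assumption based on the log lines quoted in this thread.

```python
import re

DRIVER_ERROR = "Found no NVIDIA driver on your system"
# Optional "service-name | " prefix, as seen in the compose log lines above.
COMPOSE_PREFIX = re.compile(r"^(?P<service>[\w.-]+)\s+\|\s+(?P<rest>.*)$")


def find_driver_errors(log_text: str) -> list:
    """Return (service, line) pairs for every line reporting the missing-driver error."""
    hits = []
    for raw in log_text.splitlines():
        m = COMPOSE_PREFIX.match(raw)
        service, line = (m["service"], m["rest"]) if m else (None, raw)
        if DRIVER_ERROR in line:
            hits.append((service, line.strip()))
    return hits


# Two lines copied from the log posted in this thread.
sample = (
    "heygem-gen-video | RuntimeError: Found no NVIDIA driver on your system. "
    "Please check that you have an NVIDIA GPU and installed a driver from "
    "http://www.nvidia.com/Download/index.aspx\n"
    "heygem-gen-video | [2025-04-16 10:51:14] [app_local.py[line:231]] [INFO] [TransDhTask init]\n"
)

if __name__ == "__main__":
    for service, line in find_driver_errors(sample):
        print(f"{service}: {line}")
```

Knowing which service reports the error narrows down which block of docker-compose.yaml to compare against the working configuration posted earlier in the thread.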

Same problem here. I tried it and it really did fix it. Before, it kept getting stuck at 5% and would not move. Thanks for sharing.

beastq7777 avatar Sep 05 '25 06:09 beastq7777