heygem-gen-video reports that no GPU driver can be found at startup

[Open] xuzhhua opened this issue 8 months ago • 10 comments

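Hi, as the title says: I have installed the NVIDIA GPU and driver correctly, but it still reports that no driver can be found. Is there any way to fix this? Thanks. heygem-gen-video keeps restarting, and the log shows the following: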

2025-04-15 21:01:15 Traceback (most recent call last):
2025-04-15 21:01:15   File "/code/app_local.py", line 231, in <module>
2025-04-15 21:01:15     TransDhTask.instance()
2025-04-15 21:01:15   File "trans_dh_service.py", line 1207, in trans_dh_service.TransDhTask.instance
2025-04-15 21:01:15   File "trans_dh_service.py", line 1189, in trans_dh_service.TransDhTask.__init__
2025-04-15 21:01:15   File "compute_ctc_att_bnf.py", line 130, in compute_ctc_att_bnf.load_ppg_model
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1152, in to
2025-04-15 21:01:15     return self._apply(convert)
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
2025-04-15 21:01:15     module._apply(fn)
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
2025-04-15 21:01:15     module._apply(fn)
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
2025-04-15 21:01:15     module._apply(fn)
2025-04-15 21:01:15   [Previous line repeated 1 more time]
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 825, in _apply
2025-04-15 21:01:15     param_applied = fn(param)
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1150, in convert
2025-04-15 21:01:15     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2025-04-15 21:01:15   File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
2025-04-15 21:01:15     torch._C._cuda_init()
2025-04-15 21:01:15 RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
2025-04-15 21:01:18 2025-04-15 21:01:18,098 - cv2box - INFO - Use default multi mode: multi-thread, or you can set env 'CV_MULTI_MODE' to multi-process/torch-process
C:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

C:\>nvidia-smi
Tue Apr 15 20:51:35 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.83                 Driver Version: 572.83         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
| 27%   35C    P8             14W /  250W |    1183MiB /  22528MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |

I have checked all existing issues and am using the latest docker-compose.yml.
Windows: Windows 11 24H2
Docker Desktop: 4.35.1
Node.js: 18.20.8
WSL: 2.4.13

xuzhhua, Apr 15 '25 13:04
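
The host-side nvidia-smi output above shows that the driver is installed on Windows. That suggests the RuntimeError raised from torch._C._cuda_init() more likely means the GPU is not being exposed inside the container, rather than that it is missing on the host. The sketch below collects a few hints from inside the container. It checks the NVIDIA_VISIBLE_DEVICES variable and the /dev/nvidia* device nodes a native Linux host would typically expose. It also checks the /dev/dxg node that WSL2 typically uses and, if PyTorch is importable, what torch.cuda.is_available() reports. The helper name gpu_visibility_report is hypothetical and not part of HeyGem. The device paths are assumptions about common setups, not something confirmed in this thread.

```python
import glob
import os


def gpu_visibility_report() -> dict:
    """Collect hints about whether an NVIDIA GPU is visible in the current environment."""
    report = {
        # Normally set by the compose file / NVIDIA Container Toolkit; None if unset.
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Device nodes typically present when a GPU is passed through on native Linux (assumption).
        "nvidia_device_nodes": sorted(glob.glob("/dev/nvidia*")),
        # Under WSL2 (e.g. Docker Desktop on Windows) the GPU is typically exposed as /dev/dxg (assumption).
        "wsl_dxg_present": os.path.exists("/dev/dxg"),
        "torch_cuda_version": None,
        "torch_cuda_available": None,
    }
    try:
        import torch  # optional: present inside the heygem image, maybe not elsewhere
    except ImportError:
        return report
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["torch_cuda_version"] = torch.version.cuda
    # Returns False (instead of raising) when no usable driver or device is found.
    report["torch_cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

One way to run it is to save it in the data directory mounted at /code/data and temporarily point the service's command at it. This approach is an assumption, and the file name check_gpu.py is hypothetical. If torch_cuda_available is False and neither /dev/nvidia* nor /dev/dxg shows up, the GPU was most likely never passed into the container.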

Addendum: Stable Diffusion WebUI and Ollama both work normally on this machine.

xuzhhua, Apr 15 '25 13:04

The Linux version has the same problem. Is there a fix?

ssp-seven, Apr 16 '25 02:04

Follow this guide; most likely your CUDA and driver versions are too new. https://www.bilibili.com/video/BV1T4dBYNEA3/

ops120, Apr 16 '25 05:04
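
On the "too new" theory: the CUDA Version field in the nvidia-smi header is the highest CUDA version the installed driver supports. Binaries built against an older CUDA runtime generally run on a newer driver. A conservative check is therefore to compare that field with the CUDA runtime the container actually uses, for example whatever torch.version.cuda reports inside it. The helpers parse_smi_header and driver_supports_runtime are hypothetical names for a minimal sketch of that comparison. The sketch deliberately ignores finer points such as CUDA minor-version compatibility.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Driver Version: 572.83         CUDA Version: 12.8" in the nvidia-smi header.
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\d+)\.(\d+)")


def parse_smi_header(text: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (driver_version, (cuda_major, cuda_minor)) from nvidia-smi output, or None."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return match.group(1), (int(match.group(2)), int(match.group(3)))


def driver_supports_runtime(driver_cuda: Tuple[int, int], runtime_cuda: str) -> bool:
    """Conservative rule: a runtime like '11.8' is fine if it is not newer than the driver's CUDA version."""
    major, minor = (int(part) for part in runtime_cuda.split(".")[:2])
    return (major, minor) <= driver_cuda


if __name__ == "__main__":
    header = "| NVIDIA-SMI 572.83                 Driver Version: 572.83         CUDA Version: 12.8     |"
    driver, driver_cuda = parse_smi_header(header)
    print(driver, driver_cuda)                          # 572.83 (12, 8)
    print(driver_supports_runtime(driver_cuda, "12.6"))  # True
    print(driver_supports_runtime(driver_cuda, "13.0"))  # False
```

With the header posted above, a driver reporting CUDA 12.8 passes this check for any runtime up to 12.8. Under these assumptions, a newer driver alone would not explain the error.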

Change heygem-gen-video in docker-compose-linux.yml to the following and it works. The key part is under devices: add the entry with driver: nvidia and count: all.

  heygem-gen-video:
    image: guiji2025/heygem.ai
    container_name: heygem-gen-video
    restart: always
    runtime: nvidia
    privileged: true
    volumes:
      - ~/heygem_data/face2face:/code/data
    environment:
      # Each variable needs its own "- " entry; otherwise YAML folds both into one string
      - NVIDIA_VISIBLE_DEVICES=all
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
    deploy:
      resources:
        reservations:
          # The key addition: reserve all NVIDIA GPUs for this container
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    shm_size: '8g'
    ports:
      - '8383:8383'
    command: python /code/app_local.py
    networks:
      - ai_network

eggggi, Apr 17 '25 01:04
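
The deploy.resources.reservations.devices block is the Compose-file way of requesting GPUs, roughly comparable to docker run --gpus all. That is likely why adding it helped even though runtime: nvidia was already present. If you keep several compose files (Linux and Windows), a small check can catch a missing reservation. The sketch below operates on a service mapping as loaded by any YAML parser. requests_nvidia_gpu and folded_env_entries are hypothetical helper names. The second helper flags a YAML pitfall: a continuation line in an environment: list that lacks its own "- " is folded into the previous entry, so two variables end up in one string.

```python
import re
from typing import Any, Dict, List

# Whitespace followed by something that looks like a second KEY= assignment.
_EXTRA_ASSIGNMENT = re.compile(r"\s[A-Za-z_][A-Za-z0-9_]*=")


def requests_nvidia_gpu(service: Dict[str, Any]) -> bool:
    """True if the service reserves an NVIDIA GPU under deploy.resources.reservations.devices."""
    deploy = service.get("deploy") or {}
    resources = deploy.get("resources") or {}
    reservations = resources.get("reservations") or {}
    for device in reservations.get("devices") or []:
        if device.get("driver") == "nvidia" and "gpu" in (device.get("capabilities") or []):
            return True
    return False


def folded_env_entries(service: Dict[str, Any]) -> List[str]:
    """List-style environment entries that appear to contain two KEY=value pairs."""
    env = service.get("environment") or []
    if isinstance(env, dict):  # mapping syntax keeps each variable under its own key
        return []
    return [str(entry) for entry in env if _EXTRA_ASSIGNMENT.search(str(entry))]
```

For example, with PyYAML installed you could load yaml.safe_load(open("docker-compose-linux.yml"))["services"]["heygem-gen-video"] and pass the result to both helpers. A service with only runtime: nvidia and no devices entry returns False from requests_nvidia_gpu.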

@ops120 Thanks a lot, I'll give it a try.

xuzhhua, Apr 20 '25 05:04

@eggggi Thanks a lot, I'll give it a try.

xuzhhua, Apr 20 '25 05:04

I also tested this on a friend's machine with a 2080 and CUDA 12.8, and it passed. Try this config first; CUDA is not the problem here either.

ops120, Apr 20 '25 14:04

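Thanks a lot. Sorry for the late reply, I was busy, but your method does work. I made the same change to the Windows config and it works there too. Thank you so much.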

xuzhhua, May 02 '25 03:05

On Windows 10: I don't really understand why this devices entry has to be added, but it does work.

lewisxiao, Jun 25 '25 14:06

The official heygem.ai image on Harbor no longer includes CUDA. If you run it on its own, I recommend this older version: guiji2025/heygem.ai:0.0.7_sdk_slim

Otherwise, run it in the same pod as the TTS service.

willchen0729, Jul 25 '25 12:07