Every time, the progress gets stuck at 5%, and after 611 seconds the error message below appears.
The Ubuntu server information is as follows:
nvidia-container-toolkit Version: 1.17.7-1
NVIDIA-SMI 550.144.03
Driver Version: 550.144.03
CUDA Version: 12.4
onnxruntime-gpu 1.22.0
Cuda compilation tools, release 11.5
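(For reference, the same version information can be printed from inside the container with a few lines of Python; a minimal sketch using only public torch/onnxruntime APIs, not project code:)

import onnxruntime as ort
import torch

# Sketch: print the GPU/runtime versions relevant to this report
# from inside the container that runs the service.
print("onnxruntime:", ort.__version__)
print("available providers:", ort.get_available_providers())  # should list CUDAExecutionProvider
print("torch built against CUDA:", torch.version.cuda)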
The error message is as follows:

172.18.0.1 - - [29/May/2025 15:37:56] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:37:58] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:38:00] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:38:02] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
2025-05-29 15:38:04,638-[service.self_logger]-threading.py[line:870]-ERROR: 5b2bfaf7-d875-4cd9-984e-4dd0763f3499 -> 任务执行失败,异常信息:[no final result]
2025-05-29 15:38:04,638-[service.self_logger]-threading.py[line:870]-INFO: system ->start to release queue...
172.18.0.1 - - [29/May/2025 15:38:04] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
2025-05-29 15:38:04,838-[service.self_logger]-threading.py[line:870]-INFO: system ->audio_feature_queue_output release end.
2025-05-29 15:38:05,039-[service.self_logger]-threading.py[line:870]-INFO: system ->output_queue release end.
2025-05-29 15:38:05,240-[service.self_logger]-threading.py[line:870]-INFO: system ->trans_queue release end.
2025-05-29 15:38:05,440-[service.self_logger]-threading.py[line:870]-INFO: system ->init_wh_queue release end.
2025-05-29 15:38:05,641-[service.self_logger]-threading.py[line:870]-INFO: system ->init_wh_queue_output release end.
2025-05-29 15:38:05,641-[service.self_logger]-threading.py[line:870]-INFO: system ->trans_queue release end.
2025-05-29 15:38:05,641-[service.self_logger]-threading.py[line:870]-INFO: 5b2bfaf7-d875-4cd9-984e-4dd0763f3499 ->kill all process
2025-05-29 15:38:05,645-[service.self_logger]-threading.py[line:870]-INFO: 5b2bfaf7-d875-4cd9-984e-4dd0763f3499 ->all process killed and restart
2025-05-29 15:38:05,660-[service.self_logger]-process.py[line:108]-INFO: system ->init_wh_process start...
2025-05-29 15:38:05,680-[service.self_logger]-threading.py[line:870]-INFO: system ->get_audio_p[201] transfer_p[202] start
检测人脸使用GPU
*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32521 ; hostname=63429f75fe5f ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
when using ['CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
Process Process-10:
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in init self._create_inference_session(providers, provider_options, disabled_optimizers) File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32521 ; hostname=63429f75fe5f ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 108, in run self.target(self._args, self._kwargs) File "trans_dh_service.py", line 113, in trans_dh_service.init_wh_process File "face_detect.py", line 18, in face_detect.FaceDetect.init File "scrfd.py", line 90, in scrfd.SCRFD.init File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in init raise fallback_error from e File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in init self._create_inference_session(self._fallback_providers, None) File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=0 ; hostname=63429f75fe5f ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
Process Process-11:
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "trans_dh_service.py", line 55, in trans_dh_service.get_aud_feat1
File "compute_ctc_att_bnf.py", line 130, in compute_ctc_att_bnf.load_ppg_model
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1152, in to
return self._apply(convert)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 825, in _apply
param_applied = fn(param)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1150, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/init.py", line 302, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
----------------- Options ---------------
aspect_ratio: 1.0
audio_feature: 3dmm
batch_size: 16
checkpoints_dir: ./landmark2face_wy/checkpoints
crop_size: 256
dataroot: ./data
dataset_mode: Facereala3dmm
direction: AtoB
display_winsize: 256
distributed: False
epoch: latest
eval: False
feat_num: 3
feature_path: ../AnnI_deep3dface_256_contains_id/
fp16: False
gpu_ids: 0
img_size: 256
init_gain: 0.02
Process Process-12:
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "ai_service.py", line 105, in ai_service.av_transfer
File "digitalhuman_interface.py", line 18, in digitalhuman_interface.DigitalHumanModel.__init__
File "base_options.py", line 210, in base_options.BaseOptions.parse
File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 408, in set_device
torch._C._cuda_setDevice(device)
File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
init_type: normal
input_nc: 3
instance_feat: False
isTrain: False [default: None]
label_feat: False
lan_size: 1
load_features: False
load_iter: 0 [default: 0]
load_size: 286
local_rank: -1
max_dataset_size: inf
mfcc0_rate: 0.2
model: pirender_3dmm_mouth_hd
model_path: ./landmark2face_wy/checkpoints/anylang/dinet_v1_20240131.pth
n_blocks: 9
n_blocks_global: 9
n_blocks_local: 3
n_clusters: 10
n_downsample_E: 4
n_downsample_global: 4
n_layers_D: 3
n_local_enhancers: 1
name: test
ndf: 64
nef: 16
netD: basic
netG: pirender
ngf: 64
niter_fix_global: 0
no_dropout: True
no_flip: False
no_ganFeat_loss: False
no_instance: False
norm: instance
ntest: inf
num_D: 2
num_test: 50
num_threads: 4
output_nc: 3
perceptual_layers: ['relu_1_1', 'relu_2_1', 'relu_3_1', 'relu_4_1', 'relu_5_1']
perceptual_network: vgg19
perceptual_num_scales: 4
perceptual_use_style_loss: True
perceptual_weights: [4, 4, 4, 4, 4]
phase: test
preprocess: resize_and_crop
resize_size: 512
results_dir: ./results/
serial_batches: False
suffix:
test_audio_path: None
test_muban: None
verbose: False
weight_style_to_perceptual: 250
----------------- End -------------------
172.18.0.1 - - [29/May/2025 15:38:06] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:38:08] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:38:10] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:38:12] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
172.18.0.1 - - [29/May/2025 15:38:14] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
2025-05-29 15:38:15,877-[service.self_logger]-threading.py[line:870]-INFO: 5b2bfaf7-d875-4cd9-984e-4dd0763f3499 -> 耗时:611.458s
172.18.0.1 - - [29/May/2025 15:38:16] "GET /easy/query?code=5b2bfaf7-d875-4cd9-984e-4dd0763f3499 HTTP/1.1" 200 -
Sorry for this problem! Judging from the error message, there is a problem with the service startup. Could you please take a screenshot of the output of the command "nvidia-smi"?
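For what it's worth, a quick way to confirm whether the container can see a driver at all is to run this check inside the same container that produced the log (a minimal sketch, standard APIs only):

import subprocess
import torch

# If the NVIDIA runtime did not mount the host driver into the container,
# nvidia-smi fails here even though it works on the host.
result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
print(result.stdout or result.stderr)

# False would match the "Found no NVIDIA driver on your system" error above.
print("torch.cuda.is_available():", torch.cuda.is_available())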
The nvidia-smi output is as follows:
Judging from the log, it seems to be a problem with the graphics card driver. You can try upgrading to the latest graphics card driver (manual driver search).
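One more thing that may be worth checking besides the driver itself: hostname=63429f75fe5f in the log looks like a Docker container ID, and inside a container the driver always comes from the host through nvidia-container-toolkit (the container has to be started with GPU access, e.g. --gpus all), so upgrading anything inside the image does not change what cudaSetDevice sees. Below is a minimal sketch of the session-creation pattern that fails in scrfd.py; the model path is hypothetical. It also shows why the automatic retry in the log still fails: the fallback list ['CUDAExecutionProvider', 'CPUExecutionProvider'] attempts CUDA again before CPU.

import onnxruntime as ort

MODEL = "scrfd.onnx"  # hypothetical path; the project loads its own SCRFD model

try:
    # Mirrors the log: a CUDA-only provider list makes onnxruntime retry with
    # ['CUDAExecutionProvider', 'CPUExecutionProvider'], which hits CUDA again.
    sess = ort.InferenceSession(MODEL, providers=["CUDAExecutionProvider"])
except RuntimeError as err:
    print("CUDA EP failed:", err)
    # A CPU-only session initializes even when no driver is visible.
    sess = ort.InferenceSession(MODEL, providers=["CPUExecutionProvider"])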
@LegendaryM Hello. The NVIDIA graphics driver has already been upgraded to the latest version.
Video generation is still stuck at 5% and errors out after 611 seconds. The error message is as follows:

172.18.0.1 - - [29/May/2025 17:56:08] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 -
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
2025-05-29 17:56:09,164-[service.self_logger]-threading.py[line:870]-ERROR: 66dfed3c-1752-4f23-92ff-f0a5cbc9c211 -> 任务执行失败,异常信息:[no final result]
2025-05-29 17:56:09,164-[service.self_logger]-threading.py[line:870]-INFO: system ->start to release queue...
2025-05-29 17:56:09,365-[service.self_logger]-threading.py[line:870]-INFO: system ->audio_feature_queue_output release end.
2025-05-29 17:56:09,565-[service.self_logger]-threading.py[line:870]-INFO: system ->output_queue release end.
2025-05-29 17:56:09,766-[service.self_logger]-threading.py[line:870]-INFO: system ->trans_queue release end.
2025-05-29 17:56:09,966-[service.self_logger]-threading.py[line:870]-INFO: system ->init_wh_queue release end.
2025-05-29 17:56:10,167-[service.self_logger]-threading.py[line:870]-INFO: system ->init_wh_queue_output release end.
2025-05-29 17:56:10,167-[service.self_logger]-threading.py[line:870]-INFO: system ->trans_queue release end.
2025-05-29 17:56:10,167-[service.self_logger]-threading.py[line:870]-INFO: 66dfed3c-1752-4f23-92ff-f0a5cbc9c211 ->kill all process
2025-05-29 17:56:10,172-[service.self_logger]-threading.py[line:870]-INFO: 66dfed3c-1752-4f23-92ff-f0a5cbc9c211 ->all process killed and restart
2025-05-29 17:56:10,186-[service.self_logger]-process.py[line:108]-INFO: system ->init_wh_process start...
2025-05-29 17:56:10,200-[service.self_logger]-threading.py[line:870]-INFO: system ->get_audio_p[105] transfer_p[106] start
检测人脸使用GPU
*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void]
CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32528 ; hostname=63429f75fe5f ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
when using ['CUDAExecutionProvider']
Process Process-4:
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in init self._create_inference_session(providers, provider_options, disabled_optimizers) File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32528 ; hostname=63429f75fe5f ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 108, in run self.target(self._args, self._kwargs) File "trans_dh_service.py", line 113, in trans_dh_service.init_wh_process File "face_detect.py", line 18, in face_detect.FaceDetect.init File "scrfd.py", line 90, in scrfd.SCRFD.init File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in init raise fallback_error from e File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in init self._create_inference_session(self._fallback_providers, None) File "/usr/local/python3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=0 ; hostname=63429f75fe5f ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
Process Process-5: Traceback (most recent call last): File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get raise Empty _queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "trans_dh_service.py", line 55, in trans_dh_service.get_aud_feat1
File "compute_ctc_att_bnf.py", line 130, in compute_ctc_att_bnf.load_ppg_model
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1152, in to
return self._apply(convert)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 825, in _apply
param_applied = fn(param)
File "/usr/local/python3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1150, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/init.py", line 302, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
172.18.0.1 - - [29/May/2025 17:56:10] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 -
----------------- Options ---------------
aspect_ratio: 1.0
audio_feature: 3dmm
batch_size: 16
checkpoints_dir: ./landmark2face_wy/checkpoints
crop_size: 256
dataroot: ./data
dataset_mode: Facereala3dmm
direction: AtoB
display_winsize: 256
distributed: False
epoch: latest
eval: False
feat_num: 3
feature_path: ../AnnI_deep3dface_256_contains_id/
fp16: False
gpu_ids: 0
img_size: 256
init_gain: 0.02
init_type: normal
input_nc: 3
instance_feat: False
isTrain: False [default: None]
label_feat: False
lan_size: 1
load_features: False
load_iter: 0 [default: 0]
load_size: 286
local_rank: -1
max_dataset_size: inf
mfcc0_rate: 0.2
model: pirender_3dmm_mouth_hd
model_path: ./landmark2face_wy/checkpoints/anylang/dinet_v1_20240131.pth
n_blocks: 9
n_blocks_global: 9
n_blocks_local: 3
n_clusters: 10
n_downsample_E: 4
n_downsample_global: 4
n_layers_D: 3
n_local_enhancers: 1
name: test
ndf: 64
nef: 16
netD: basic
netG: pirender
ngf: 64
niter_fix_global: 0
no_dropout: True
no_flip: False
no_ganFeat_loss: False
no_instance: False
norm: instance
ntest: inf
num_D: 2
num_test: 50
num_threads: 4
output_nc: 3
perceptual_layers: ['relu_1_1', 'relu_2_1', 'relu_3_1', 'relu_4_1', 'relu_5_1']
perceptual_network: vgg19
perceptual_num_scales: 4
perceptual_use_style_loss: True
perceptual_weights: [4, 4, 4, 4, 4]
phase: test
preprocess: resize_and_crop
resize_size: 512
results_dir: ./results/
serial_batches: False
suffix:
test_audio_path: None
test_muban: None
verbose: False
weight_style_to_perceptual: 250
----------------- End -------------------
Process Process-6:
Traceback (most recent call last):
File "trans_dh_service.py", line 414, in trans_dh_service.TransDhTask.work
File "/usr/local/python3/lib/python3.8/multiprocessing/queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/usr/local/python3/lib/python3.8/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "ai_service.py", line 105, in ai_service.av_transfer File "digitalhuman_interface.py", line 18, in digitalhuman_interface.DigitalHumanModel.__init__ File "base_options.py", line 210, in base_options.BaseOptions.parse File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 408, in set_device torch._C._cuda_setDevice(device) File "/usr/local/python3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init torch._C._cuda_init() RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx 172.18.0.1 - - [29/May/2025 17:56:12] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 - 172.18.0.1 - - [29/May/2025 17:56:14] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 - 172.18.0.1 - - [29/May/2025 17:56:16] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 - 172.18.0.1 - - [29/May/2025 17:56:18] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 - 2025-05-29 17:56:20,341-[service.self_logger]-threading.py[line:870]-INFO: 66dfed3c-1752-4f23-92ff-f0a5cbc9c211 -> 耗时:611.521s 172.18.0.1 - - [29/May/2025 17:56:20] "GET /easy/query?code=66dfed3c-1752-4f23-92ff-f0a5cbc9c211 HTTP/1.1" 200 -
@thunder1218
Judging from the error messages, this doesn't seem to be related to the model. You can run the following command to verify that the graphics card and Docker are working together correctly.
docker run -itd --name test_gpu --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04
Then run the docker ps command to check whether the test_gpu container has started.
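For reference, a minimal verification sequence might look like the following; the `docker exec ... nvidia-smi` step is an extra check beyond what was asked above:

```bash
# Start a throwaway GPU test container from NVIDIA's base CUDA image
docker run -itd --name test_gpu --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04
# Check that the container is still running
docker ps --filter name=test_gpu
# Extra check (not requested above): confirm the driver is visible inside it
docker exec test_gpu nvidia-smi
```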
@LegendaryM The test_gpu container did not start successfully; the screenshot and error log are below. What does this mean? Also, my server is a rented cloud host.
The image "nvidia/cuda:12.0.0-base-ubuntu22.04" is the basic version of the graphics card provided by nvidia on the docker official website. It is mainly used to verify whether there are any issues with the system version, graphics card version, and environment. Judging from the results of test_gpu, there might be some problems with the environment. Could you please provide screenshots of the following commands?
uname -a
cat /proc/version
@LegendaryM My server cannot pull images directly, so all the images (including the three heygem images) were downloaded on my local Mac and then uploaded to the server. Based on the warning "WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested", I suspected the image architecture was wrong, so I re-pulled the image with "docker pull --platform linux/amd64 nvidia/cuda:12.0.0-base-ubuntu22.04", and this time the test_gpu container ran successfully.
The results of uname -a and cat /proc/version are below.
So could the problem I'm hitting be caused by a mismatch between the heygem image architecture and the machine architecture?
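Since the mismatch warning points at the image platform, a quick way to confirm an image's architecture against the host is sketched below (using the guiji2025/heygem.ai tag that appears later in this thread):

```bash
# Print the OS/architecture an image was built for
docker inspect --format '{{.Os}}/{{.Architecture}}' guiji2025/heygem.ai
# Compare with the host: x86_64 corresponds to linux/amd64, aarch64 to linux/arm64
uname -m
```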
It shouldn't be an architecture issue. As your screenshot shows, the heygem-gen-video service also started correctly; the problem is that the graphics card is not being used correctly inside the container. Could you enter the heygem-gen-video container and run the nvidia-smi command to confirm whether the graphics card inside the container is normal?
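Assuming the container name matches the service name (check `docker ps` for the real name), the check can be run non-interactively:

```bash
# Run nvidia-smi inside the running container; the container name is an assumption
docker exec -it heygem-gen-video nvidia-smi
```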
@LegendaryM Running nvidia-smi inside the heygem-gen-video container produces no output at all, while the other two containers print the graphics card information normally. Screenshots below.
Two questions:
- Was the heygem-gen-video image pulled through docker-compose.yml or through docker-compose-5090.yml?
- How do you start heygem-gen-video? Could you send the startup command?
1. The heygem-gen-video image was downloaded on a Windows machine, packaged, uploaded, and loaded onto the server:
docker pull guiji2025/heygem.ai
docker save -o heygem-gen-video.tar guiji2025/heygem.ai
docker load -i heygem-gen-video.tar
2. heygem-gen-video is started with: docker-compose -f docker-compose-linux.yml up -d
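For reference, this pull/save/load round trip is one place where a platform mismatch can creep in when the downloading machine defaults to a different architecture than the server. Pinning the platform during the pull, as was done for the CUDA base image earlier, avoids it; a sketch:

```bash
# Pull explicitly for the server's architecture, then ship the tarball as before
docker pull --platform linux/amd64 guiji2025/heygem.ai
docker save -o heygem-gen-video.tar guiji2025/heygem.ai
# ...upload heygem-gen-video.tar to the server, then:
docker load -i heygem-gen-video.tar
```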
These all look fine. It may be necessary to connect to the machine to locate the problem.
Could I send the server account details to your email address so you could log in to the server and help troubleshoot the problem?
@thunder1218 OK, that would be more convenient.
Where can I find your email address?
Have you joined the company's WeChat group? My colleagues there can put you in contact with me.
When I sent the email, it said the mailbox does not exist. My server account is 117.50.27.58, port 22, account root, password Yilieyun@517.
OK
@thunder1218 I started a container named test_gpu using docker run. Try initiating a request.
@thunder1218 It should be a docker-compose yml configuration issue. That will need to be tracked down later; you can use the test_gpu container for now.
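The exact `docker run` command used here isn't shown in the thread; a plausible form, assuming the guiji2025/heygem.ai image from earlier, would be:

```bash
# Assumption: image name from earlier in the thread; add the port mappings and
# volumes from docker-compose-linux.yml as needed for your setup
docker run -itd --name test_gpu --gpus all guiji2025/heygem.ai
```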
The server rental is about to expire.
Last weekend I checked the heygem-gen-video configuration in the official docker-compose-linux.yml file and found it to be correct. Has the docker-compose-linux.yml on your side been modified?
2025-06-06 10:10:38,930-[service.self_logger]-threading.py[line:870]-INFO: 71e6b361-29b1-47e1-bfc0-032dd8bbe56e ->audio_url:3fbe2484-799a-44f7-a805-1ce4d2a13e93.wav, video_url:20250605182805758.mp4, chaofen:0, pn:1 2025-06-06 10:10:39,123-[service.self_logger]-threading.py[line:870]-INFO: 71e6b361-29b1-47e1-bfc0-032dd8bbe56e -> video_info:[30.0/720/1280/9.666666666666666], audio_info:[10.63] 2025-06-06 10:20:39,222-[service.self_logger]-threading.py[line:870]-ERROR: 71e6b361-29b1-47e1-bfc0-032dd8bbe56e -> 任务执行失败,异常信息:[no final result] 2025-06-06 10:20:39,222-[service.self_logger]-threading.py[line:870]-INFO: system ->start to release queue... 2025-06-06 10:20:39,422-[service.self_logger]-threading.py[line:870]-INFO: system ->audio_feature_queue_output release end. 2025-06-06 10:20:39,623-[service.self_logger]-threading.py[line:870]-INFO: system ->output_queue release end. 2025-06-06 10:20:39,823-[service.self_logger]-threading.py[line:870]-INFO: system ->trans_queue release end. 2025-06-06 10:20:40,024-[service.self_logger]-threading.py[line:870]-INFO: system ->init_wh_queue release end. 2025-06-06 10:20:40,225-[service.self_logger]-threading.py[line:870]-INFO: system ->init_wh_queue_output release end. 2025-06-06 10:20:40,225-[service.self_logger]-threading.py[line:870]-INFO: system ->trans_queue release end. 2025-06-06 10:20:40,225-[service.self_logger]-threading.py[line:870]-INFO: 71e6b361-29b1-47e1-bfc0-032dd8bbe56e ->kill all process 2025-06-06 10:20:40,228-[service.self_logger]-threading.py[line:870]-INFO: 71e6b361-29b1-47e1-bfc0-032dd8bbe56e ->all process killed and restart 2025-06-06 10:20:40,239-[service.self_logger]-process.py[line:108]-INFO: system ->init_wh_process start... 2025-06-06 10:20:40,253-[service.self_logger]-threading.py[line:870]-INFO: system ->get_audio_p[145] transfer_p[146] start 2025-06-06 10:20:50,374-[service.self_logger]-threading.py[line:870]-INFO: 71e6b361-29b1-47e1-bfc0-032dd8bbe56e -> 耗时:611.352s
Partial contents of mian.log:
2025-06-06 10:20:47 [debug] [SQL Run]: UPDATE video SET status = ?, message = ?, progress = ?, file_path = ? WHERE id = ? [ 'pending', '文件下载完成', 5, '', 4 ] 2025-06-06 10:20:49 [debug] ~ getVideoStatus ~ res: {"code":10000,"data":{"code":"71e6b361-29b1-47e1-bfc0-032dd8bbe56e","msg":"文件下载完成","progress":5,"result":"","status":1},"msg":"","success":true} 2025-06-06 10:20:49 [debug] [SQL Run]: UPDATE video SET status = ?, message = ?, progress = ?, file_path = ? WHERE id = ? [ 'pending', '文件下载完成', 5, '', 4 ] 2025-06-06 10:20:51 [debug] ~ getVideoStatus ~ res: {"code":10000,"data":{"code":"71e6b361-29b1-47e1-bfc0-032dd8bbe56e","msg":"","progress":0,"result":"","status":3},"msg":"","success":true} 2025-06-06 10:20:51 [debug] [SQL Run]: UPDATE video SET status = ?, message = ?, progress = ?, file_path = ? WHERE id = ? [ 'failed', '', 0, '', 4 ]
Same here: it either gets stuck at 5%, stays busy, fails outright, or can't find the task. Could the Windows support be sorted out first?
My problem was here: just add these two lines, then check whether your configuration file is the same.
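The thread never quotes which two lines were added, so the following is only a guess at a common fix of this kind: exposing the GPU to the service in the compose file. Whether this matches the actual change is an assumption.

```yaml
# Hypothetical docker-compose snippet; the actual "two lines" are not shown
# in the thread. One common way to hand the GPU to a service:
services:
  heygem-gen-video:
    runtime: nvidia                      # candidate line 1 (assumption)
    environment:
      - NVIDIA_VISIBLE_DEVICES=all       # candidate line 2 (assumption)
```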
It works. Thank you!
@kenandaoer Still stuck at 5% after changing the configuration this way. I'm on Windows; does that mean the corresponding file is docker-compose.yml, and changing that one is enough?
OK, I re-pulled heygem-gen-video, and after a restart it worked.