vcuda-controller

Need update for CUDA 11.4?

Open difenbei opened this issue 4 years ago • 3 comments

I tried to use vcuda with Driver Version 470.57.02, and the program may fail without warning. Does it need to be updated for CUDA 11.4? Thanks!

difenbei, Oct 20 '21 08:10

Please provide the vcuda-controller log. For how to dump the log, see the gpu-manager FAQ.

mYmNeo, Dec 20 '21 09:12

@mYmNeo Hi, I followed the FAQ to set the env, but I still can't get the vcuda-controller log. Should the env be set in the pod that uses the GPU card?

hzliangbin, Oct 11 '22 03:10

@difenbei I ran into the same problem. Did you solve it? Logs are below.

/tmp/cuda-control/src/loader.c:1102 config file: /etc/vcuda/7eadf10c1933050f72f33123c4013720907258d292e4695bbcc0732b2afa2405/vcuda.config
/tmp/cuda-control/src/loader.c:1103 pid file: /etc/vcuda/7eadf10c1933050f72f33123c4013720907258d292e4695bbcc0732b2afa2405/pids.config
/tmp/cuda-control/src/loader.c:1107 register to remote: pod uid: ad51fa3f-4b64-11ed-98e3-00163e144b97, cont id: 7eadf10c1933050f72f33123c4013720907258d292e4695bbcc0732b2afa2405
/tmp/cuda-control/src/loader.c:1205 pod uid : ad51fa3f-4b64-11ed-98e3-00163e144b97
/tmp/cuda-control/src/loader.c:1206 limit : 0
/tmp/cuda-control/src/loader.c:1207 container name : tensorflow-test
/tmp/cuda-control/src/loader.c:1208 total utilization: 30
/tmp/cuda-control/src/loader.c:1209 total gpu memory : 4294967296
/tmp/cuda-control/src/loader.c:1210 driver version : 470.57.02
/tmp/cuda-control/src/loader.c:1211 hard limit mode : 1
/tmp/cuda-control/src/loader.c:1212 enable mode : 1
/tmp/cuda-control/src/loader.c:913 Start hijacking
/tmp/cuda-control/src/loader.c:929 can't find function cuEGLInit in libcuda.so.470.57.02
/tmp/cuda-control/src/loader.c:876 can't find function nvmlDeviceGetBusType in libnvidia-ml.so.470.57.02
/tmp/cuda-control/src/loader.c:876 can't find function nvmlDeviceGetIrqNum in libnvidia-ml.so.470.57.02
/tmp/cuda-control/src/loader.c:876 can't find function nvmlVgpuInstanceGetLicenseInfo in libnvidia-ml.so.470.57.02
/tmp/cuda-control/src/loader.c:883 Hijacking nvmlInit

/tmp/cuda-control/src/hijack_call.c:466 cuInit error unknown error

But it tested OK with driver version 460.32.03.

hzliangbin, Oct 14 '22 06:10

Did you reboot your machine after upgrading your driver?

mYmNeo, Oct 19 '22 01:10

> Did you reboot your machine after upgrading your driver?

Thanks, that was the point. After I rebooted the machine, it works.

hzliangbin, Oct 19 '22 03:10

#30

mYmNeo, Nov 15 '22 01:11
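
Note for readers hitting the same `cuInit error unknown error`: the reboot fix in this thread is consistent with a common situation where the user-space libraries (for example `libcuda.so.470.57.02`) have been upgraded but the kernel module loaded in memory still belongs to the older driver. The sketch below is one way to check for that before rebooting. It is not part of vcuda-controller. The library directories in `LIB_DIRS` and the helper names are assumptions made for illustration; adjust them for your distribution.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional

# Standard procfs entry exposed by the NVIDIA kernel module when it is loaded.
PROC_VERSION = Path("/proc/driver/nvidia/version")
# Assumed library locations (hypothetical defaults); adjust for your distribution.
LIB_DIRS = [Path("/usr/lib/x86_64-linux-gnu"), Path("/usr/lib64")]


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Return the version on the 'NVRM version:' line, or None if absent."""
    for line in proc_text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"(\d+\.\d+(?:\.\d+)?)", line)
            return match.group(1) if match else None
    return None


def library_version(filename: str) -> Optional[str]:
    """Parse 'libcuda.so.470.57.02' into '470.57.02'; ignore 'libcuda.so.1'."""
    match = re.fullmatch(r"libcuda\.so\.(\d+\.\d+(?:\.\d+)?)", filename)
    return match.group(1) if match else None


def find_mismatches(kernel: str, filenames: Iterable[str]) -> List[str]:
    """Return versioned libcuda file names that differ from the kernel module."""
    mismatched = []
    for name in filenames:
        version = library_version(name)
        if version is not None and version != kernel:
            mismatched.append(name)
    return mismatched


def main() -> None:
    if not PROC_VERSION.exists():
        print("NVIDIA kernel module does not appear to be loaded")
        return
    kernel = kernel_module_version(PROC_VERSION.read_text())
    if kernel is None:
        print("could not parse", PROC_VERSION)
        return
    names = [p.name for d in LIB_DIRS if d.is_dir() for p in d.glob("libcuda.so.*")]
    stale = find_mismatches(kernel, names)
    if stale:
        print(f"kernel module is {kernel} but found {stale}; a reboot may be needed")
    else:
        print(f"no mismatch found for kernel module {kernel}")


if __name__ == "__main__":
    main()
```

If the script reports a mismatch, rebooting (or otherwise reloading the NVIDIA kernel module) so that both sides come from the same driver release is the fix that worked in this thread.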