Can't find library libcuda.so.xxx.xx.xx?

Open • a378603 opened this issue 3 years ago • 5 comments

When I run 'nvidia-smi', I get the following error:

/tmp/cuda-control/src/loader.c:865 can't find library libcuda.so.418.87.01

What is the reason for this?

a378603 • Dec 21 '21 14:12

See #109?

mYmNeo • Dec 21 '21 23:12

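@mYmNeo I updated to the latest version yesterday, but the error is still there. #109 reports line 856, while mine reports line 865; I don't know what difference that makes. I also noticed something strange: I installed SSH in the pod and set up port mapping, and when I connect to the pod from VS Code and use the GPU, I get the error above. But when I exec into the pod directly from the node, everything works fine.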

a378603 • Dec 23 '21 01:12

sshd starts a new session with a clean environment, so the environment variables set on the container are gone, whereas kubectl exec runs a command that inherits its parent's environment variables.

mYmNeo • Dec 23 '21 06:12
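A minimal Python sketch of the inheritance behaviour described in the previous reply: a child process normally inherits its parent's environment (as with kubectl exec), but when the environment is replaced with an empty one, variables set on the parent are no longer visible (roughly what a fresh SSH login session looks like). The variable name DEMO_GPU_VAR is hypothetical and only stands in for whatever variables the container relies on.

```python
import os
import subprocess
import sys

# Hypothetical variable standing in for the container's GPU-related settings.
VAR = "DEMO_GPU_VAR"


def child_sees(env):
    """Run a child Python process with `env` (None means inherit the parent's
    environment) and return the value of VAR it sees, or None if unset."""
    code = f"import os; print(os.environ.get({VAR!r}, ''))"
    out = subprocess.run(
        [sys.executable, "-c", code],  # absolute path, so PATH is not needed
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return out or None


if __name__ == "__main__":
    os.environ[VAR] = "set-by-parent"
    # Inherited environment, like a command started via kubectl exec.
    print("inherited:", child_sees(None))
    # Empty environment, roughly like a new SSH login session.
    print("clean:", child_sees({}))
```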

@mYmNeo Thank you very much, that's the reason. I checked the environment variables and they are different. I'd like to add all of them to the SSH environment, but I'm worried about variables that change dynamically. Can you give me some advice?

a378603 • Dec 24 '21 02:12
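To see exactly which variables differ between the two kinds of session, one option is to dump the environment in each (for example with `env -0` from GNU coreutils, which separates entries with NUL bytes so values containing newlines survive) and compare the dumps. Below is a minimal Python sketch of the comparison; the function names are illustrative and not part of gpu-manager.

```python
def parse_env_dump(data: bytes) -> dict[str, str]:
    """Parse NUL-separated KEY=VALUE entries, as written by `env -0`."""
    env = {}
    for entry in data.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        key, sep, value = entry.partition(b"=")  # split on the first '=' only
        if sep:
            env[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def diff_envs(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (value_in_a, value_in_b)} for every variable that is
    missing on one side or has a different value; None means 'unset'."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }
```

For example, after running `env -0 > exec.env` in a kubectl exec shell and `env -0 > ssh.env` in an SSH shell (both file names hypothetical), `diff_envs(parse_env_dump(open("exec.env", "rb").read()), parse_env_dump(open("ssh.env", "rb").read()))` lists the variables the SSH session is missing or has changed.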

You need to dump the environment variables into a separate file and use .bashrc (or something similar) to load them into your shell.

mYmNeo • Dec 24 '21 02:12
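A minimal sketch of that suggestion in Python, assuming the dump runs once in the container's original environment (for example from the entrypoint, before sshd starts). It writes each variable as a shell-quoted export line so the file can be sourced from ~/.bashrc. To address the concern about dynamic variables, it skips per-session variables (HOME, PWD, SHLVL and similar) so the SSH login keeps its own values; that skip list, the script name dump_env.py and the output path used below are assumptions to adjust for your image.

```python
import os
import re
import shlex
import sys

# Per-session variables that should come from the SSH login itself, not from
# the container snapshot. This list is an assumption; adjust it as needed.
SKIP = {"HOME", "PWD", "OLDPWD", "SHLVL", "_", "HOSTNAME",
        "TERM", "USER", "LOGNAME", "SHELL"}

# Only names that are valid shell identifiers can be exported safely.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def render_exports(env):
    """Return shell source text with one quoted export line per variable."""
    lines = []
    for name in sorted(env):
        if name in SKIP or not VALID_NAME.match(name):
            continue
        lines.append(f"export {name}={shlex.quote(env[name])}")
    return "\n".join(lines) + "\n"


def dump_env(path):
    """Write the current process environment to `path` as a sourceable file."""
    with open(path, "w") as f:
        f.write(render_exports(os.environ))


if __name__ == "__main__":
    # Usage: python3 dump_env.py [OUTPUT_PATH]; prints to stdout if no path.
    if len(sys.argv) > 1:
        dump_env(sys.argv[1])
    else:
        sys.stdout.write(render_exports(os.environ))
```

For instance, the entrypoint could run `python3 dump_env.py /etc/container-env.sh` before starting sshd, and ~/.bashrc could then contain `[ -f /etc/container-env.sh ] && . /etc/container-env.sh`. Because the snapshot is taken once at container start, it holds the values the container was launched with; anything that legitimately changes per login stays in SKIP.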