gpu-manager can't find library libcuda.so.xxx.xx.xx?

When I run 'nvidia-smi', I get the following output:

/tmp/cuda-control/src/loader.c:865 can't find library libcuda.so.418.87.01

What is the reason for this?

Dec 21 '21 14:12 a378603

See #109 ?

Dec 21 '21 23:12 mYmNeo

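@mYmNeo I updated to the latest version yesterday, but the error is still there. #109 reports the error at line 856, while mine is at line 865; I don't know whether that difference matters. I also noticed something odd: I installed ssh in the pod and set up port mapping, and when I connect to the pod from VS Code and use the GPU, I get the error above. If I exec into the same pod directly from the node, everything works fine.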

Dec 23 '21 01:12 a378603

sshd starts each new session with a clean environment, so it clears all the environment variables, whereas kubectl exec runs a command that inherits its parent's environment variables.
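
One way to confirm this from inside an SSH session is to list the variables that the container's main process has but the current shell does not. The following is only a sketch: the function name missing_env_names is made up for illustration, and it assumes that PID 1 in the pod is the container's main process, that /proc/1/environ is readable by the SSH user, and that no value contains a newline.

```shell
# Hypothetical helper: print the names of variables that appear in a
# NUL-separated environ file but are not set in the current shell.
# Assumes no value contains a newline.
missing_env_names() {
    # $1: environ file of the reference process, e.g. /proc/1/environ
    tr '\0' '\n' < "$1" | while IFS= read -r entry; do
        # ignore anything that is not of the form NAME=value
        case "$entry" in *=*) ;; *) continue ;; esac
        name=${entry%%=*}
        # ignore names that are not valid shell identifiers
        case "$name" in
            ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        # the name was validated above, so using it in eval is safe
        eval "[ -n \"\${$name+set}\" ]" || printf '%s\n' "$name"
    done
}
```

Running missing_env_names /proc/1/environ in the SSH session should print the variables that were dropped; in a shell opened with kubectl exec it should print little or nothing.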

Dec 23 '21 06:12 mYmNeo

@mYmNeo Thank you very much. That's the reason. I checked the environment variables, and they are different. I wanted to add all the environment variables to the SSH environment, but I was worried about dynamic variables. Can you give me some advice?

Dec 24 '21 02:12 a378603

You need to dump the environment variables into a separate file and use .bashrc or something similar to reload them into your shell.
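
A minimal sketch of that approach, under stated assumptions: the function name dump_env and the file path used in the note below are made up for illustration, the list of skipped session variables is a guess, and values containing newlines are not handled. It turns a NUL-separated environ file into export statements that a shell can source.

```shell
# Hypothetical helper: convert a NUL-separated environ file into
# "export NAME='value'" lines on stdout.
# Assumes no value contains a newline.
dump_env() {
    # $1: environ file to read, e.g. /proc/1/environ
    tr '\0' '\n' < "$1" | while IFS= read -r entry; do
        # ignore anything that is not of the form NAME=value
        case "$entry" in *=*) ;; *) continue ;; esac
        name=${entry%%=*}
        value=${entry#*=}
        case "$name" in
            # skip names that are not valid shell identifiers
            ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
            # skip variables that belong to the login session itself
            HOME|USER|LOGNAME|SHELL|PWD|OLDPWD|SHLVL|TERM|_) continue ;;
        esac
        # escape single quotes so the value survives single-quoting
        escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")
        printf "export %s='%s'\n" "$name" "$escaped"
    done
}
```

For example, the container's start command could run dump_env /proc/self/environ > /etc/container_env.sh, and .bashrc could then source that file if it exists. Regenerating the file on every container start is one way to keep dynamic variables current, since the values are captured fresh each time the pod is scheduled.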

Dec 24 '21 02:12 mYmNeo
