gpu-manager
Can't find library libcuda.so.xxx.xx.xx?
When I run 'nvidia-smi', I get the following message:
/tmp/cuda-control/src/loader.c:865 can't find library libcuda.so.418.87.01
What is the reason for this?
See #109 ?
@mYmNeo I updated to the latest version yesterday, but the error is still there. The error in #109 is at line 856, while mine is at line 865; I don't know what the difference is. I also noticed something odd: I installed SSH in the pod and set up port mapping, and when I access the pod from VS Code and use the GPU, I get the error above. But if I exec directly into the pod from the node, everything works fine.
sshd starts a new session with a clean environment, so it clears all the environment variables, whereas kubectl exec runs a command that inherits its parent's environment variables.
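One way to confirm this is to capture the output of 'env' from both kinds of session and list what the SSH session is missing. The following is only a minimal sketch: the function names are hypothetical, and it assumes each variable fits on one line of 'env' output (multi-line values would be parsed incorrectly).

```python
def parse_env(text):
    """Parse the output of `env` into a dict, one NAME=value pair per line."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            result[name] = value
    return result


def missing_in_ssh(exec_env_text, ssh_env_text):
    """Return the variables present in the kubectl exec session but absent over SSH."""
    exec_env = parse_env(exec_env_text)
    ssh_env = parse_env(ssh_env_text)
    return {name: value for name, value in exec_env.items() if name not in ssh_env}
```

Feeding it the two captured outputs should show which variables the SSH session lost.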
@mYmNeo Thank you very much. That's the reason. I checked the environment variables, and they are different. I wanted to add all the environment variables to the SSH environment, but I was worried about dynamic variables. Can you give me some advice?
You need to dump the environment variables into a separate file, then use .bashrc or something similar to reload them into your shell.
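A rough sketch of that approach, under some assumptions: the helper names and the skip list below are hypothetical, and which variables count as "dynamic" depends on your image. The idea is to write the container's environment as shell export lines while leaving out per-session variables, so that sourcing the file does not overwrite them.

```python
import os
import re
import shlex

# Per-session variables that should not be frozen into the dump.
# This list is an assumption; adjust it for your own container.
SKIP = {"HOME", "PWD", "OLDPWD", "SHLVL", "TERM", "USER", "LOGNAME", "SHELL", "_"}

# Only names that are valid shell identifiers can be exported.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_exports(env, skip=SKIP):
    """Return shell `export` lines for env, skipping per-session names."""
    lines = []
    for name, value in sorted(env.items()):
        if name in skip or not VALID_NAME.match(name):
            continue
        # shlex.quote makes the value safe to source from a POSIX shell.
        lines.append(f"export {name}={shlex.quote(value)}")
    return "\n".join(lines) + "\n"


def dump_env(path, env=None):
    """Write the export lines for env (default: the current environment) to path."""
    env = dict(os.environ) if env is None else env
    with open(path, "w") as f:
        f.write(render_exports(env))
```

If dump_env is called from the container's startup process before sshd is launched, a line in .bashrc that sources the resulting file would then restore the variables in each SSH session.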