Shengbo Song

54 comments by Shengbo Song

Please check whether you have overridden the environment variable LD_LIBRARY_PATH.

You need to check whether the script that launches your program overrides the environment variable.

There's no guarantee, and gpu-manager will validate the allocation result.

> @mYmNeo how does gpu-manager validate the result, in `preStartContainer`? > > I checked the logic in `preStartContainer`: it gets the pod uid from the checkpoint, and then gets vcores and vmems from...

gpu-manager's allocation mechanism doesn't depend on the deviceID string, only on the size of deviceID. So in your situation, pods that have the same `vcore` and `vmem` resources can be...

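> @mYmNeo I updated to the latest version yesterday, but the error is still there. 109 reports error 856, while mine reports 865; I don't know what the difference is. > I also noticed something odd: I installed ssh in the pod and set up port mapping, then accessed the pod from vscode to use the GPU, and got the error above. > But if I exec directly into the pod from the node, everything works fine.

SSHD starts a new session with a clean environment, which clears all environment variables, but kubectl exec runs a command which inherits...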
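One way to confirm the difference between the two sessions is to compare the environment of the container's first process with the environment of the current shell. The following is only a minimal sketch, assuming a Linux container where PID 1 still carries the environment the container was started with and `/proc/1/environ` is readable by the current user; the helper names `parse_environ`, `env_diff`, and `report` are made up for illustration.

```python
import os


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE records used by /proc/<pid>/environ."""
    env = {}
    for record in raw.split(b"\0"):
        if b"=" not in record:
            continue  # skip the empty record after the trailing NUL
        key, _, value = record.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def env_diff(reference: dict, session: dict) -> dict:
    """Map each variable that is missing or different in `session`
    to a (reference value, session value) pair."""
    return {
        key: (value, session.get(key))
        for key, value in reference.items()
        if session.get(key) != value
    }


def report(pid: int = 1) -> None:
    """Print variables of process `pid` that the current session lacks or changes.

    Assumption: /proc/<pid>/environ is readable (same user or root).
    """
    with open(f"/proc/{pid}/environ", "rb") as fh:
        reference = parse_environ(fh.read())
    for key, (expected, actual) in sorted(env_diff(reference, dict(os.environ)).items()):
        print(f"{key}: container={expected!r} session={actual!r}")
```

Calling `report()` once from an SSH session and once from a `kubectl exec` session should show whether variables such as `LD_LIBRARY_PATH` are missing in the SSH case.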

> @mYmNeo Thank you very much. That's the reason. I checked the environment variables, and they are different. I wanted to add all the environment variables to the SSH environment,...

1. Check LD_LIBRARY_PATH.
2. nvidia-docker isn't supported, only runc.

Please confirm these two points.
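For the first point, a quick check is to list which entries of LD_LIBRARY_PATH actually contain the NVML library. This is a rough sketch, not part of gpu-manager; the helper name `dirs_providing` is hypothetical, and the library file name is passed in as a parameter because the exact name on disk may differ between setups.

```python
import os


def dirs_providing(library: str, search_path: str) -> list:
    """Return the entries of a colon-separated search path that contain `library`."""
    hits = []
    for entry in search_path.split(":"):
        # Empty entries come from leading, trailing, or doubled colons; skip them.
        if entry and os.path.isfile(os.path.join(entry, library)):
            hits.append(entry)
    return hits
```

For example, `dirs_providing("libnvidia-ml.so", os.environ.get("LD_LIBRARY_PATH", ""))` returns an empty list when no entry of the current path provides that file.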

> > 1. Check LD_LIBRARY_PATH > > 2. Doesn't support nvidia-docker, only runc > > Please confirm the two questions > > 1. LD_LIBRARY_PATH contains the path where libnvidia-ml.so is located; echo $LD_LIBRARY_PATH: > > ``` > /usr/local/nvidia/lib64:/usr/local/cuda/lib64/stubs:/usr/local/nvidia/lib...