
elastic-gpu-agent fails to start

Open caiyuanji opened this issue 1 year ago • 0 comments

kubectl -n kube-system logs elastic-gpu-agent-vjx7h
Defaulted container "elastic-gpu-agent" out of: elastic-gpu-agent, elastic-gpu-installer (init)
panic: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

goroutine 1 [running]:
github.com/NVIDIA/go-nvml/pkg/nvml.Init()
	/go/src/elastic-gpu-agent/vendor/github.com/NVIDIA/go-nvml/pkg/nvml/init.go:41 +0x109
elasticgpu.io/elastic-gpu-agent/pkg/operator.(*baseOperator).devices(0xc00007e250, 0xc000195c30)
	/go/src/elastic-gpu-agent/pkg/operator/base.go:48 +0x50
elasticgpu.io/elastic-gpu-agent/pkg/operator.(*baseOperator).Devices(0x0)
	/go/src/elastic-gpu-agent/pkg/operator/base.go:20 +0x66
elasticgpu.io/elastic-gpu-agent/pkg/plugins.NewGPUShareMemoryDevicePlugin(0xc0003c2600)
	/go/src/elastic-gpu-agent/pkg/plugins/gpushare.go:155 +0x3f
elasticgpu.io/elastic-gpu-agent/pkg/plugins.NewGPUSharePlugin(0xc0003c2600)
	/go/src/elastic-gpu-agent/pkg/plugins/base.go:210 +0x7a
elasticgpu.io/elastic-gpu-agent/pkg/plugins.PluginFactory(0xc0003c2600)
	/go/src/elastic-gpu-agent/pkg/plugins/base.go:59 +0x1c5
elasticgpu.io/elastic-gpu-agent/pkg/manager.NewGPUManager({0xc000195f48, 0x4, 0x289de00})
	/go/src/elastic-gpu-agent/pkg/manager/manager.go:138 +0x2dd
main.main()
	/go/src/elastic-gpu-agent/cmd/main.go:26 +0x399

caiyuanji, Sep 12 '24 05:09