zos
Failed to prepare requested gpu device(s): GPU is disabled on this node.
It happens while deploying a VM.
Env: mainnet, node ID: 5727
The issue was originally reported here: https://github.com/threefoldtech/tfgrid-sdk-ts/issues/2247
The node returns this error only if the farmer explicitly disabled the GPU via the special disable-gpu kernel param.
The thing is that the user never tried to attach a GPU to the deployed VM. We saw another example of this recently, and I was able to reproduce with these steps:
- Boot node with disable-gpu set
- Try to deploy a regular micro VM via the Dashboard
I think the issue is pretty clear in the code:
https://github.com/threefoldtech/zos/blob/9cd81f3ec8049a224b3c819e3b329a28835ae92c/pkg/primitives/vm/vm.go#L150-L154
The call to expandGPUs happens for every VM, even if config.GPU is empty.
And then:
https://github.com/threefoldtech/zos/blob/9cd81f3ec8049a224b3c819e3b329a28835ae92c/pkg/primitives/vm/gpu.go#L133-L135
Error is returned immediately if the kernel param is set.
So I think deployment to a node with disable-gpu is always gonna fail.
nice catch 👍