
Failed to prepare requested gpu device(s): GPU is disabled on this node.

Open AhmedHanafy725 opened this issue 1 year ago • 3 comments

It happens while deploying a VM.

env: mainnet, node ID: 5727

The issue was originally reported here: https://github.com/threefoldtech/tfgrid-sdk-ts/issues/2247

AhmedHanafy725 avatar Mar 06 '24 14:03 AhmedHanafy725

The node returns this error only if the GPU was explicitly disabled by the farmer via the special disable-gpu kernel param.

muhamadazmy avatar Mar 18 '24 12:03 muhamadazmy

The thing is that the user never tried to attach a GPU to the deployed VM. We saw another example of this recently, and I was able to reproduce with these steps:

  1. Boot node with disable-gpu set
  2. Try to deploy a regular micro VM via the Dashboard

I think the issue is pretty clear in the code:

https://github.com/threefoldtech/zos/blob/9cd81f3ec8049a224b3c819e3b329a28835ae92c/pkg/primitives/vm/vm.go#L150-L154

The call to expandGPUs happens for every VM, even if config.GPU is empty.

And then:

https://github.com/threefoldtech/zos/blob/9cd81f3ec8049a224b3c819e3b329a28835ae92c/pkg/primitives/vm/gpu.go#L133-L135

Error is returned immediately if the kernel param is set.

So I think deployment to a node with disable-gpu is always gonna fail.
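
To illustrate the kind of guard that would avoid this, here is a minimal sketch. It is not the actual zos code: `vmConfig`, `prepareGPUs`, the `gpuDisabled` flag, and this simplified `expandGPUs` are stand-ins I made up for the example. The only thing taken from the linked code is the behavior described above: the expansion fails right away when disable-gpu is set.

```go
package main

import (
	"errors"
	"fmt"
)

// errGPUDisabled mirrors the message from the issue title.
var errGPUDisabled = errors.New("GPU is disabled on this node")

// vmConfig is a hypothetical stand-in for the VM workload config.
// Only the requested GPU list matters for this sketch.
type vmConfig struct {
	GPU []string
}

// expandGPUs is a simplified stand-in (assumption, not the zos signature):
// it fails immediately whenever the disable-gpu kernel param is set.
func expandGPUs(gpuDisabled bool, gpus []string) ([]string, error) {
	if gpuDisabled {
		return nil, errGPUDisabled
	}
	return gpus, nil
}

// prepareGPUs only calls expandGPUs when the workload actually requests
// a GPU, so a regular VM never hits the disable-gpu check.
func prepareGPUs(gpuDisabled bool, cfg vmConfig) ([]string, error) {
	if len(cfg.GPU) == 0 {
		return nil, nil
	}
	devices, err := expandGPUs(gpuDisabled, cfg.GPU)
	if err != nil {
		return nil, fmt.Errorf("failed to prepare requested gpu device(s): %w", err)
	}
	return devices, nil
}

func main() {
	// Regular VM on a node booted with disable-gpu: no error expected.
	_, err := prepareGPUs(true, vmConfig{})
	fmt.Println("no GPU requested, error:", err)

	// VM that requests a GPU on the same node: the error is still reported.
	// "example-gpu-id" is a made-up placeholder, not a real device ID.
	_, err = prepareGPUs(true, vmConfig{GPU: []string{"example-gpu-id"}})
	fmt.Println("GPU requested, is disabled error:", errors.Is(err, errGPUDisabled))
}
```

With a check like this, nodes booted with disable-gpu would still reject workloads that ask for a GPU, while regular VMs would deploy normally.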

scottyeager avatar Aug 21 '24 20:08 scottyeager

> Error is returned immediately if the kernel param is set.
>
> So I think deployment to a node with disable-gpu is always gonna fail.

nice catch 👍

iwanbk avatar Aug 22 '24 14:08 iwanbk