zos
client times out when deploying without giving any errors
While testing some scripts on my local zos node, I got the following error:
Error: Error: Deployment with contract_id: 7807 failed to be ready after 10 minutes
at TwinDeploymentHandler.handle (/home/abom/projects/jumpscale/grid3_client_ts/src/high_level/twinDeploymentHandler.ts:459:19)
at async MachinesModule.deploy (/home/abom/projects/jumpscale/grid3_client_ts/src/modules/machines.ts:77:27)
at async MachinesModule.descriptor.value (/home/abom/projects/jumpscale/grid3_client_ts/src/modules/utils.ts:14:16)
at async MachinesModule.descriptor.value (/home/abom/projects/jumpscale/grid3_client_ts/src/helpers/validator.ts:23:16)
at async main (/home/abom/projects/jumpscale/grid3_client_ts/scripts/vm_with_qsfs.ts:103:20)
After inspecting the node logs, I found some errors:
[+] provisiond: 2022-09-07T07:12:47Z debug incrmenting capacity +{CRU:0 SRU:10737418240 HRU:0 MRU:0 IPV4U:0} id=42-7807-wed2710d144 type=zmount
[+] provisiond: 2022-09-07T07:12:47Z debug provisioning deployment=7807 name=wed2710d33 twin=42 type=qsfs
[+] provisiond: 2022-09-07T07:12:47Z error failed to install workload error="failed to satisfy required capacity: cannot fulfil required memory size 1073741824 bytes out of usable 0 bytes" id=42-7807-wed2710d33
[+] provisiond: 2022-09-07T07:12:47Z debug provisioning deployment=7807 name=wed2710n44 twin=42 type=network
[+] provisiond: 2022-09-07T07:12:47Z debug provision network network="{NetworkIPRange:10.201.0.0/16 Subnet:10.201.2.0/24 WGPrivateKey:D3Luw0OtaeeZ98jKd1k5H57oeNErWCv9Ukbsvyq0g2Y= WGListenPort:3077 Peers:[]}"
...
...
...
[+] provisiond: 2022-09-07T06:19:37Z debug incrmenting capacity +{CRU:0 SRU:0 HRU:0 MRU:0 IPV4U:0} id=42-7795-wed2710n33 type=network
[+] provisiond: 2022-09-07T06:19:37Z debug provisioning deployment=7795 name=wed2710v33 twin=42 type=zmachine
[+] provisiond: 2022-09-07T06:19:37Z error failed to install workload error="failed to satisfy required capacity: cannot fulfil required memory size 563714457 bytes out of usable 0 bytes" id=42-7795-wed2710v33
[+] provisiond: 2022-09-07T06:19:37Z debug connecting url=wss://tfchain.dev.grid.tf/
[+] provisiond: 2022/09/07 06:19:37 Connecting to wss://tfchain.dev.grid.tf/...
I expected these errors to be returned to the grid client, but I don't know for sure whether the node itself sent the result and whether it was dropped at some layer (rmb, the proxy, or even the client itself).
This message is returned when the workload's state didn't become ok within the timeout period; it also wasn't turned to error, so there is no error message to display.
I think it's not a "timeout": the node already set the deployment to the correct "error" status. It's up to the client to make sure the deployment has reached a final state (ok or error).
The client should not keep waiting on a deployment that has already gone into the error state.
Note that the node does not "return" a state. It basically goes like this (see the sketch after this list):
- the client sends the deployment
- the client polls the deployment object until it reaches a final state
- the final state can be either ok or error (for each workload in the deployment individually)
- a deployment can partially succeed, for example: the network and disks are fine, but the VM failed because there was not enough memory
- the user can then decide to delete everything and start over, OR update the deployment with the right memory requirements
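A minimal sketch of that polling loop (not the actual grid3_client_ts implementation; getDeployment is a hypothetical helper that fetches the deployment from the node over rmb):

```typescript
type WorkloadState = "init" | "ok" | "error";

interface Workload {
  name: string;
  result: { state: WorkloadState; message: string };
}

interface Deployment {
  contract_id: number;
  workloads: Workload[];
}

async function waitForFinalState(
  getDeployment: (contractId: number) => Promise<Deployment>, // hypothetical fetch helper
  contractId: number,
  timeoutMs = 10 * 60 * 1000,
  intervalMs = 5000,
): Promise<Deployment> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const deployment = await getDeployment(contractId);
    const states = deployment.workloads.map(w => w.result.state);
    // A deployment is final once every workload is either ok or error,
    // not only when everything has become ok.
    if (states.every(s => s === "ok" || s === "error")) {
      return deployment;
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Deployment with contract_id: ${contractId} did not reach a final state in time`);
}
```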
I thought the grid client does check the deployment status. I'll try to dump the deployment in case of a timeout.
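Something along these lines, reusing the hypothetical getDeployment and waitForFinalState helpers from the sketch above (again just a sketch, not the actual client code):

```typescript
// Re-fetch and log the deployment when the wait times out, so the per-workload
// states (init / ok / error) are visible next to the timeout error.
async function waitAndDumpOnTimeout(
  getDeployment: (contractId: number) => Promise<Deployment>,
  contractId: number,
): Promise<Deployment> {
  try {
    return await waitForFinalState(getDeployment, contractId);
  } catch (err) {
    const deployment = await getDeployment(contractId);
    console.error("deployment at timeout:", JSON.stringify(deployment, null, 2));
    throw err;
  }
}
```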
After the grid client timed out, here's the deployment:
{
"version": 0,
"twin_id": 42,
"contract_id": 8955,
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"expiration": 0,
"signature_requirement": {
"requests": [
{
"twin_id": 42,
"required": false,
"weight": 1
}
],
"weight_required": 1,
"signatures": [
{
"twin_id": 42,
"signature": "20ba1a0d48495da45e108628c2d18146efba54e40b04a4afdafc39398be15f2bbd22552fb06ec1d1d8a97cedd6fcac5c144f878cc527c78801f8ed8fbf7ec989",
"signature_type": "sr25519"
}
],
"signature_style": ""
},
"workloads": [
{
"version": 0,
"name": "wedtest",
"type": "network",
"data": {
"subnet": "10.249.2.0/24",
"ip_range": "10.249.0.0/16",
"wireguard_private_key": "SD11YIWgN5yBt0cwqRHgKC/XPdEOShyryO5Z9gmQD3w=",
"wireguard_listen_port": 2278,
"peers": [],
"node_id": 36
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": null
}
},
{
"version": 0,
"name": "wedDisk",
"type": "zmount",
"data": {
"size": 8589934592
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": {
"volume_id": "42-8955-wedDisk"
}
}
},
{
"version": 0,
"name": "testvm",
"type": "zmachine",
"data": {
"flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-20.04.flist",
"network": {
"public_ip": "",
"interfaces": [
{
"network": "wedtest",
"ip": "10.249.2.2"
}
],
"planetary": true
},
"size": 0,
"compute_capacity": {
"cpu": 1,
"memory": 268435456
},
"mounts": [
{
"name": "wedDisk",
"mountpoint": "/testdisk"
}
],
"entrypoint": "/init.sh",
"env": {
"SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDYgwkuKTdfjwl22/987oY3k9yQXv92L3I7KfOgtgtT//EEPhH3PrY15KsIwznizwnufgnjdeeUsk4WchxoPQp3L9T8hE9D7JusGwm7kr/lli368kY7jpAi0vaD3ZdfmO+bZX+78j6q8Y3gQTCUp/pRb3wmY87lt9r+7uRPi3XyJnlBHMco67x+KtNwiFhrKgkHdf1sgYoj5iJ7ZzZpBvrRZ+HCBgHhDbm1hbJ2bCSzDpO4AUTOs8oY4ryAUL0kWHP7cXubieeDtZjnGFU8Y19ilXOldA/qon7BIl9LAA75roIJ1XFJvIVREJZowj5kNpZxSMJL8QYPaebE+JH3z81Bkm/BpCq4h2ITNLUEzYkBR2TQL6afKcT9JIII9qeN+Caoigw9ysnYu+vm1HM3sxRb8RiWrklSSR93B2Caopp13BUF1pRo9iBajJMNRIZf78+V7yNTZLUzADoW/Tte5S4tsbtcJk7jKWekEWiAwybpP5b7754V2AYk06eqnfNf19jrT8XHrdjTcg9MTCw7DCOLPtalcLuq5FG4FKKvtRo5mPAiHDu7a7UCbjKVBLvM642IcMbnUY2qcQGVQxzViQ6mAqhM+7RsPbcUt2YDSs4FNxm2jaxwZzvj/rvKkKGP08L4gMEhaH5Y6LpvtG2j3X6uOmf4ZNgNUrpccVe70oBPrw=="
},
"corex": false
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224910,
"state": "init",
"message": "",
"data": null
}
}
]
}
Node logs:
[+] provisiond: 2022-09-15T06:55:12Z debug incrmenting capacity +{CRU:0 SRU:0 HRU:0 MRU:0 IPV4U:0} id=42-8955-wedtest type=network
[+] provisiond: 2022-09-15T06:55:12Z debug provisioning deployment=8955 name=testvm twin=42 type=zmachine
[+] provisiond: 2022-09-15T06:55:12Z error failed to install workload error="failed to satisfy required capacity: cannot fulfil required memory size 281857228 bytes out of usable 0 bytes" id=42-8955-testvm
[+] provisiond: 2022/09/15 06:55:12 Connecting to wss://tfchain.dev.grid.tf/...
[+] provisiond: 2022-09-15T06:55:12Z debug connecting url=wss://tfchain.dev.grid.tf/
It looks like the failure to create the VM was not reflected in the deployment status? Or you timed out too soon, before the VM status was actually set.
Can you try to do a Get again from the same node with the same contract id and see if the status is now "error"? If not, then that's a bug, and you can start looking into the code in that specific path to see what happens.
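A quick sketch of such a check, again using the hypothetical getDeployment helper from the earlier sketch (not a real grid3_client_ts call):

```typescript
// Fetch the deployment again and print each workload's state, to see whether
// the zmachine has moved from init to error.
async function printWorkloadStates(
  getDeployment: (contractId: number) => Promise<Deployment>,
  contractId: number,
): Promise<void> {
  const deployment = await getDeployment(contractId);
  for (const workload of deployment.workloads) {
    console.log(`${workload.name}: ${workload.result.state} ${workload.result.message}`);
  }
}
```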
Just tried to get the deployment again:
{
"version": 0,
"twin_id": 42,
"contract_id": 8955,
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"expiration": 0,
"signature_requirement": {
"requests": [
{
"twin_id": 42,
"required": false,
"weight": 1
}
],
"weight_required": 1,
"signatures": [
{
"twin_id": 42,
"signature": "20ba1a0d48495da45e108628c2d18146efba54e40b04a4afdafc39398be15f2bbd22552fb06ec1d1d8a97cedd6fcac5c144f878cc527c78801f8ed8fbf7ec989",
"signature_type": "sr25519"
}
],
"signature_style": ""
},
"workloads": [
{
"version": 0,
"name": "testvm",
"type": "zmachine",
"data": {
"flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-20.04.flist",
"network": {
"public_ip": "",
"interfaces": [
{
"network": "wedtest",
"ip": "10.249.2.2"
}
],
"planetary": true
},
"size": 0,
"compute_capacity": {
"cpu": 1,
"memory": 268435456
},
"mounts": [
{
"name": "wedDisk",
"mountpoint": "/testdisk"
}
],
"entrypoint": "/init.sh",
"env": {
"SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDYgwkuKTdfjwl22/987oY3k9yQXv92L3I7KfOgtgtT//EEPhH3PrY15KsIwznizwnufgnjdeeUsk4WchxoPQp3L9T8hE9D7JusGwm7kr/lli368kY7jpAi0vaD3ZdfmO+bZX+78j6q8Y3gQTCUp/pRb3wmY87lt9r+7uRPi3XyJnlBHMco67x+KtNwiFhrKgkHdf1sgYoj5iJ7ZzZpBvrRZ+HCBgHhDbm1hbJ2bCSzDpO4AUTOs8oY4ryAUL0kWHP7cXubieeDtZjnGFU8Y19ilXOldA/qon7BIl9LAA75roIJ1XFJvIVREJZowj5kNpZxSMJL8QYPaebE+JH3z81Bkm/BpCq4h2ITNLUEzYkBR2TQL6afKcT9JIII9qeN+Caoigw9ysnYu+vm1HM3sxRb8RiWrklSSR93B2Caopp13BUF1pRo9iBajJMNRIZf78+V7yNTZLUzADoW/Tte5S4tsbtcJk7jKWekEWiAwybpP5b7754V2AYk06eqnfNf19jrT8XHrdjTcg9MTCw7DCOLPtalcLuq5FG4FKKvtRo5mPAiHDu7a7UCbjKVBLvM642IcMbnUY2qcQGVQxzViQ6mAqhM+7RsPbcUt2YDSs4FNxm2jaxwZzvj/rvKkKGP08L4gMEhaH5Y6LpvtG2j3X6uOmf4ZNgNUrpccVe70oBPrw== [email protected]"
},
"corex": false
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224910,
"state": "init",
"message": "",
"data": null
}
},
{
"version": 0,
"name": "wedtest",
"type": "network",
"data": {
"subnet": "10.249.2.0/24",
"ip_range": "10.249.0.0/16",
"wireguard_private_key": "SD11YIWgN5yBt0cwqRHgKC/XPdEOShyryO5Z9gmQD3w=",
"wireguard_listen_port": 2278,
"peers": [],
"node_id": 36
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": null
}
},
{
"version": 0,
"name": "wedDisk",
"type": "zmount",
"data": {
"size": 8589934592
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": {
"volume_id": "42-8955-wedDisk"
}
}
}
]
}
The VM workload is still in the init state.
I think this one also relates to #1786.
I pushed a fix for the issue to devnet. Give it a couple of hours before retesting.
Fixed on devnet