zos
client times out when deploying without giving any errors
While testing some scripts on my local zos node, I got the following error:
Error: Error: Deployment with contract_id: 7807 failed to be ready after 10 minutes
at TwinDeploymentHandler.handle (/home/abom/projects/jumpscale/grid3_client_ts/src/high_level/twinDeploymentHandler.ts:459:19)
at async MachinesModule.deploy (/home/abom/projects/jumpscale/grid3_client_ts/src/modules/machines.ts:77:27)
at async MachinesModule.descriptor.value (/home/abom/projects/jumpscale/grid3_client_ts/src/modules/utils.ts:14:16)
at async MachinesModule.descriptor.value (/home/abom/projects/jumpscale/grid3_client_ts/src/helpers/validator.ts:23:16)
at async main (/home/abom/projects/jumpscale/grid3_client_ts/scripts/vm_with_qsfs.ts:103:20)
After inspecting the node logs, I found some errors:
[+] provisiond: 2022-09-07T07:12:47Z debug incrmenting capacity +{CRU:0 SRU:10737418240 HRU:0 MRU:0 IPV4U:0} id=42-7807-wed2710d144 type=zmount
[+] provisiond: 2022-09-07T07:12:47Z debug provisioning deployment=7807 name=wed2710d33 twin=42 type=qsfs
[+] provisiond: 2022-09-07T07:12:47Z error failed to install workload error="failed to satisfy required capacity: cannot fulfil required memory size 1073741824 bytes out of usable 0 bytes" id=42-7807-wed2710d33
[+] provisiond: 2022-09-07T07:12:47Z debug provisioning deployment=7807 name=wed2710n44 twin=42 type=network
[+] provisiond: 2022-09-07T07:12:47Z debug provision network network="{NetworkIPRange:10.201.0.0/16 Subnet:10.201.2.0/24 WGPrivateKey:D3Luw0OtaeeZ98jKd1k5H57oeNErWCv9Ukbsvyq0g2Y= WGListenPort:3077 Peers:[]}"
...
...
...
[+] provisiond: 2022-09-07T06:19:37Z debug incrmenting capacity +{CRU:0 SRU:0 HRU:0 MRU:0 IPV4U:0} id=42-7795-wed2710n33 type=network
[+] provisiond: 2022-09-07T06:19:37Z debug provisioning deployment=7795 name=wed2710v33 twin=42 type=zmachine
[+] provisiond: 2022-09-07T06:19:37Z error failed to install workload error="failed to satisfy required capacity: cannot fulfil required memory size 563714457 bytes out of usable 0 bytes" id=42-7795-wed2710v33
[+] provisiond: 2022-09-07T06:19:37Z debug connecting url=wss://tfchain.dev.grid.tf/
[+] provisiond: 2022/09/07 06:19:37 Connecting to wss://tfchain.dev.grid.tf/...
I expected these errors to be returned to the grid client, but I don't know for sure whether the node itself sent the result and whether it was dropped at some layer (rmb, the proxy, or even the client itself).
This message is returned when the workload's state didn't become ok within the timeout period; it also wasn't turned to error, so there is no error message to display.
I think it's not a "timeout": the node already set the deployment to the correct "error" status. It's up to the client to make sure the deployment has reached a final state (ok or error).
The client should not keep waiting on a deployment that has already gone into the error state.
Note that the node does not "return" a state. It basically goes like this (see the sketch after this list):
- the client sends the deployment
- the client polls the deployment object until it reaches a final state
- the final state can be either ok or error (for each workload in the deployment individually)
- a deployment can partially succeed, for example: the network and disks are fine, but the VM failed because there was not enough memory
- the user can then decide to delete everything and start over, OR update the deployment with the right memory requirements
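A minimal sketch of that polling loop (not the actual grid3_client_ts implementation; getDeployment is a hypothetical helper that fetches the deployment from the node over rmb):

```typescript
type WorkloadState = "init" | "ok" | "error";

interface Workload {
  name: string;
  result: { state: WorkloadState; message: string };
}

interface Deployment {
  contract_id: number;
  workloads: Workload[];
}

async function waitForFinalState(
  getDeployment: (contractId: number) => Promise<Deployment>, // hypothetical fetch helper
  contractId: number,
  timeoutMs = 10 * 60 * 1000,
  intervalMs = 5000,
): Promise<Deployment> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const deployment = await getDeployment(contractId);
    const states = deployment.workloads.map(w => w.result.state);
    // A deployment is final once every workload is either ok or error,
    // not only when everything has become ok.
    if (states.every(s => s === "ok" || s === "error")) {
      return deployment;
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Deployment with contract_id: ${contractId} did not reach a final state in time`);
}
```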
I thought the grid client does check the deployment status. I'll try to dump the deployment in case of a timeout.
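Something along these lines, reusing the hypothetical getDeployment and waitForFinalState helpers from the sketch above (again just a sketch, not the actual client code):

```typescript
// Re-fetch and log the deployment when the wait times out, so the per-workload
// states (init / ok / error) are visible next to the timeout error.
async function waitAndDumpOnTimeout(
  getDeployment: (contractId: number) => Promise<Deployment>,
  contractId: number,
): Promise<Deployment> {
  try {
    return await waitForFinalState(getDeployment, contractId);
  } catch (err) {
    const deployment = await getDeployment(contractId);
    console.error("deployment at timeout:", JSON.stringify(deployment, null, 2));
    throw err;
  }
}
```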
After the grid client timed out, here's the deployment:
{
"version": 0,
"twin_id": 42,
"contract_id": 8955,
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"expiration": 0,
"signature_requirement": {
"requests": [
{
"twin_id": 42,
"required": false,
"weight": 1
}
],
"weight_required": 1,
"signatures": [
{
"twin_id": 42,
"signature": "20ba1a0d48495da45e108628c2d18146efba54e40b04a4afdafc39398be15f2bbd22552fb06ec1d1d8a97cedd6fcac5c144f878cc527c78801f8ed8fbf7ec989",
"signature_type": "sr25519"
}
],
"signature_style": ""
},
"workloads": [
{
"version": 0,
"name": "wedtest",
"type": "network",
"data": {
"subnet": "10.249.2.0/24",
"ip_range": "10.249.0.0/16",
"wireguard_private_key": "SD11YIWgN5yBt0cwqRHgKC/XPdEOShyryO5Z9gmQD3w=",
"wireguard_listen_port": 2278,
"peers": [],
"node_id": 36
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": null
}
},
{
"version": 0,
"name": "wedDisk",
"type": "zmount",
"data": {
"size": 8589934592
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": {
"volume_id": "42-8955-wedDisk"
}
}
},
{
"version": 0,
"name": "testvm",
"type": "zmachine",
"data": {
"flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-20.04.flist",
"network": {
"public_ip": "",
"interfaces": [
{
"network": "wedtest",
"ip": "10.249.2.2"
}
],
"planetary": true
},
"size": 0,
"compute_capacity": {
"cpu": 1,
"memory": 268435456
},
"mounts": [
{
"name": "wedDisk",
"mountpoint": "/testdisk"
}
],
"entrypoint": "/init.sh",
"env": {
"SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDYgwkuKTdfjwl22/987oY3k9yQXv92L3I7KfOgtgtT//EEPhH3PrY15KsIwznizwnufgnjdeeUsk4WchxoPQp3L9T8hE9D7JusGwm7kr/lli368kY7jpAi0vaD3ZdfmO+bZX+78j6q8Y3gQTCUp/pRb3wmY87lt9r+7uRPi3XyJnlBHMco67x+KtNwiFhrKgkHdf1sgYoj5iJ7ZzZpBvrRZ+HCBgHhDbm1hbJ2bCSzDpO4AUTOs8oY4ryAUL0kWHP7cXubieeDtZjnGFU8Y19ilXOldA/qon7BIl9LAA75roIJ1XFJvIVREJZowj5kNpZxSMJL8QYPaebE+JH3z81Bkm/BpCq4h2ITNLUEzYkBR2TQL6afKcT9JIII9qeN+Caoigw9ysnYu+vm1HM3sxRb8RiWrklSSR93B2Caopp13BUF1pRo9iBajJMNRIZf78+V7yNTZLUzADoW/Tte5S4tsbtcJk7jKWekEWiAwybpP5b7754V2AYk06eqnfNf19jrT8XHrdjTcg9MTCw7DCOLPtalcLuq5FG4FKKvtRo5mPAiHDu7a7UCbjKVBLvM642IcMbnUY2qcQGVQxzViQ6mAqhM+7RsPbcUt2YDSs4FNxm2jaxwZzvj/rvKkKGP08L4gMEhaH5Y6LpvtG2j3X6uOmf4ZNgNUrpccVe70oBPrw=="
},
"corex": false
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224910,
"state": "init",
"message": "",
"data": null
}
}
]
}
Node logs:
[+] provisiond: 2022-09-15T06:55:12Z debug incrmenting capacity +{CRU:0 SRU:0 HRU:0 MRU:0 IPV4U:0} id=42-8955-wedtest type=network
[+] provisiond: 2022-09-15T06:55:12Z debug provisioning deployment=8955 name=testvm twin=42 type=zmachine
[+] provisiond: 2022-09-15T06:55:12Z error failed to install workload error="failed to satisfy required capacity: cannot fulfil required memory size 281857228 bytes out of usable 0 bytes" id=42-8955-testvm
[+] provisiond: 2022/09/15 06:55:12 Connecting to wss://tfchain.dev.grid.tf/...
[+] provisiond: 2022-09-15T06:55:12Z debug connecting url=wss://tfchain.dev.grid.tf/
It looks like the failure to create the VM was not reflected in the deployment status? Or you timed out too soon, before the VM status was actually set.
Can you try to do a Get again from the same node with the same contract id and see if the status is now "error"? If not, then that's a bug, and you can start looking into the code in that specific path to see what happens.
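A quick sketch of such a check, again using the hypothetical getDeployment helper from the earlier sketch (not a real grid3_client_ts call):

```typescript
// Fetch the deployment again and print each workload's state, to see whether
// the zmachine has moved from init to error.
async function printWorkloadStates(
  getDeployment: (contractId: number) => Promise<Deployment>,
  contractId: number,
): Promise<void> {
  const deployment = await getDeployment(contractId);
  for (const workload of deployment.workloads) {
    console.log(`${workload.name}: ${workload.result.state} ${workload.result.message}`);
  }
}
```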
Just tried to get the deployment again:
{
"version": 0,
"twin_id": 42,
"contract_id": 8955,
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"expiration": 0,
"signature_requirement": {
"requests": [
{
"twin_id": 42,
"required": false,
"weight": 1
}
],
"weight_required": 1,
"signatures": [
{
"twin_id": 42,
"signature": "20ba1a0d48495da45e108628c2d18146efba54e40b04a4afdafc39398be15f2bbd22552fb06ec1d1d8a97cedd6fcac5c144f878cc527c78801f8ed8fbf7ec989",
"signature_type": "sr25519"
}
],
"signature_style": ""
},
"workloads": [
{
"version": 0,
"name": "testvm",
"type": "zmachine",
"data": {
"flist": "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-20.04.flist",
"network": {
"public_ip": "",
"interfaces": [
{
"network": "wedtest",
"ip": "10.249.2.2"
}
],
"planetary": true
},
"size": 0,
"compute_capacity": {
"cpu": 1,
"memory": 268435456
},
"mounts": [
{
"name": "wedDisk",
"mountpoint": "/testdisk"
}
],
"entrypoint": "/init.sh",
"env": {
"SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDYgwkuKTdfjwl22/987oY3k9yQXv92L3I7KfOgtgtT//EEPhH3PrY15KsIwznizwnufgnjdeeUsk4WchxoPQp3L9T8hE9D7JusGwm7kr/lli368kY7jpAi0vaD3ZdfmO+bZX+78j6q8Y3gQTCUp/pRb3wmY87lt9r+7uRPi3XyJnlBHMco67x+KtNwiFhrKgkHdf1sgYoj5iJ7ZzZpBvrRZ+HCBgHhDbm1hbJ2bCSzDpO4AUTOs8oY4ryAUL0kWHP7cXubieeDtZjnGFU8Y19ilXOldA/qon7BIl9LAA75roIJ1XFJvIVREJZowj5kNpZxSMJL8QYPaebE+JH3z81Bkm/BpCq4h2ITNLUEzYkBR2TQL6afKcT9JIII9qeN+Caoigw9ysnYu+vm1HM3sxRb8RiWrklSSR93B2Caopp13BUF1pRo9iBajJMNRIZf78+V7yNTZLUzADoW/Tte5S4tsbtcJk7jKWekEWiAwybpP5b7754V2AYk06eqnfNf19jrT8XHrdjTcg9MTCw7DCOLPtalcLuq5FG4FKKvtRo5mPAiHDu7a7UCbjKVBLvM642IcMbnUY2qcQGVQxzViQ6mAqhM+7RsPbcUt2YDSs4FNxm2jaxwZzvj/rvKkKGP08L4gMEhaH5Y6LpvtG2j3X6uOmf4ZNgNUrpccVe70oBPrw== [email protected]"
},
"corex": false
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224910,
"state": "init",
"message": "",
"data": null
}
},
{
"version": 0,
"name": "wedtest",
"type": "network",
"data": {
"subnet": "10.249.2.0/24",
"ip_range": "10.249.0.0/16",
"wireguard_private_key": "SD11YIWgN5yBt0cwqRHgKC/XPdEOShyryO5Z9gmQD3w=",
"wireguard_listen_port": 2278,
"peers": [],
"node_id": 36
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": null
}
},
{
"version": 0,
"name": "wedDisk",
"type": "zmount",
"data": {
"size": 8589934592
},
"metadata": "{'testVMs': true}",
"description": "test deploying VMs via ts grid3 client",
"result": {
"created": 1663224912,
"state": "ok",
"message": "",
"data": {
"volume_id": "42-8955-wedDisk"
}
}
}
]
}
The VM workload is still in the init state.
I think this one also relates to #1786.
I pushed a fix for the issue to devnet. Give it a couple of hours before retesting.
Fixed on devnet