xen-orchestra
Rolling pool update does not resume after reboot.
Are you using XOA or XO from the sources?
XO from the sources
Which release channel?
None
Provide your commit number
0794a
Describe the bug
When performing a "Rolling Update" on an HA cluster, XO migrates all VMs off the primary node to the other nodes, as expected. The primary node then reboots, but when it comes back online, the other nodes in the HA cluster do not resume downloading and applying patches.
Error message
Text From Settings > Logs:
server.enable
{
"id": "0bce7468-93e5-4376-93c1-c75082f8f436"
}
{
"name": "ConnectTimeoutError",
"code": "UND_ERR_CONNECT_TIMEOUT",
"call": {
"method": "session.login_with_password",
"params": "* obfuscated *"
},
"message": "Connect Timeout Error",
"stack": "ConnectTimeoutError: Connect Timeout Error
at onConnectTimeout (/opt/xen-orchestra/node_modules/undici/lib/core/connect.js:190:24)
at /opt/xen-orchestra/node_modules/undici/lib/core/connect.js:133:46
at Immediate._onImmediate (/opt/xen-orchestra/node_modules/undici/lib/core/connect.js:174:9)
at processImmediate (node:internal/timers:476:21)
at process.callbackTrampoline (node:internal/async_hooks:128:17)"
}
pool.rollingUpdate
{
"pool": "62d8471c-e515-0d7a-d77f-5ac38a945507"
}
{
"message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
"name": "Error",
"stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart
at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:127:9)
at Xapi.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:501:5)
at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:689:5)
at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:231:3)
at Api.#callApiMethod (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:366:20)"
}
To reproduce
- Go to 'Home > Pools > Select HA Pool'
- Click on 'Patches > Rolling pool Update'
- See the error (not displayed in the UI; review the logs)
Expected behavior
Once the primary node has rebooted, the VMs should be migrated back, and the process should move on to the next host in the pool and repeat.
Screenshots
No response
Node
18.20.0
Hypervisor
8.2.1
Additional context
It appears that the HA master correctly has its VMs migrated off and its patches applied first. All systems have 10GB dedicated storage and a 1GB interface for VM access and management.
commit number 0794a
You are about a month behind on updates. Also, have you seen the latest revisions to the documentation where it explains how to increase the timeout period? https://xen-orchestra.com/docs/manage_infrastructure.html#rolling-pool-updates-rpu
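For context on the "took too long to restart" error in the log above, here is a minimal sketch of the general wait-with-timeout pattern that produces this kind of failure. It is not the actual xo-server code: `waitForHostRestart`, `isHostUp`, `timeoutMs`, and `intervalMs` are hypothetical names chosen for illustration, and the real setting to adjust is the one described in the linked documentation.

```javascript
// Hypothetical sketch, NOT the xo-server implementation.
// `isHostUp` is an assumed async probe supplied by the caller;
// `timeoutMs` and `intervalMs` are illustrative option names.
async function waitForHostRestart(hostId, isHostUp, { timeoutMs, intervalMs }) {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    try {
      // A probe that throws (for example on a connect timeout)
      // is treated as "host not back yet".
      if (await isHostUp(hostId)) {
        return
      }
    } catch (error) {
      // ignore the failure and retry until the deadline
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  // The deadline passed before the host answered, so the whole
  // rolling operation is aborted with this error.
  throw new Error(`Host ${hostId} took too long to restart`)
}
```

With a pattern like this, a host that simply needs longer to boot than the allowed window makes the whole update fail even though nothing else is wrong, which is why raising the timeout can help.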
Oh wow, that far behind already? It seems like I updated just a few weeks ago. I did not see the timeout update. I will update and review. It is odd, because I have two clusters: one updates fine with no problem, while the other has this issue (I just started testing the other cluster).
[Rolling Pool Update/Reboot] Use XO tasks for better reportability (PR #7578)
This was merged earlier today, which will make monitoring the RPU much easier.
We've recently made some changes to the RPU, including a fix for a bug introduced by the release earlier this month. Can you update to the latest version and test if the problem is still present? (and provide us with the XO task logs)