xen-orchestra Rolling pool update does not resume after reboot.

Are you using XOA or XO from the sources?

XO from the sources

Which release channel?

None

Provide your commit number

0794a

Describe the bug

When performing a "Rolling Update" on an HA cluster, XO migrates all VMs off the primary node to the other nodes (good). The primary node then reboots; however, when it comes back online, the other nodes in the HA cluster do not resume downloading and applying patches.

Error message

Text From Settings > Logs:

server.enable
{
  "id": "0bce7468-93e5-4376-93c1-c75082f8f436"
}
{
  "name": "ConnectTimeoutError",
  "code": "UND_ERR_CONNECT_TIMEOUT",
  "call": {
    "method": "session.login_with_password",
    "params": "* obfuscated *"
  },
  "message": "Connect Timeout Error",
  "stack": "ConnectTimeoutError: Connect Timeout Error
    at onConnectTimeout (/opt/xen-orchestra/node_modules/undici/lib/core/connect.js:190:24)
    at /opt/xen-orchestra/node_modules/undici/lib/core/connect.js:133:46
    at Immediate._onImmediate (/opt/xen-orchestra/node_modules/undici/lib/core/connect.js:174:9)
    at processImmediate (node:internal/timers:476:21)
    at process.callbackTrampoline (node:internal/async_hooks:128:17)"
}

pool.rollingUpdate
{
  "pool": "62d8471c-e515-0d7a-d77f-5ac38a945507"
}
{
  "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
  "name": "Error",
  "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart
    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:127:9)
    at Xapi.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:501:5)
    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:689:5)
    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:231:3)
    at Api.#callApiMethod (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:366:20)"
}

To reproduce

Go to 'Home > Pools > Select HA Pool'
Click on 'Patches > Rolling pool Update'
See error (none is displayed; review the logs)

Expected behavior

On reboot of the primary node, the migration of VMs back should resume, and the process should go on to the next pool and repeat.

Screenshots

No response

Node

18.20.0

Hypervisor

8.2.1

Additional context

It appears that the HA master correctly has its VMs migrated and patches applied first. All systems have 10GB dedicated storage and a 1GB interface for VM access and management.

Apr 29 '24 22:04 tuxpowered

commit number 0794a

You are about a month behind on updates. Also, have you seen the latest revisions to the documentation where it explains how to increase the timeout period? https://xen-orchestra.com/docs/manage_infrastructure.html#rolling-pool-updates-rpu
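For context on why the timeout matters: the "took too long to restart" error in the log above is what you would expect from a wait loop that gives up after a fixed deadline. Below is a minimal sketch of that pattern, not the actual xo-server code. The names `waitForHostRestart`, `isHostUp`, `timeoutMs` and `intervalMs` are hypothetical, and the default values are arbitrary placeholders; check the linked documentation for the real setting.

```javascript
// Sketch only: poll a host until it answers again or a deadline passes.
// `isHostUp` is a hypothetical async probe supplied by the caller; it
// should resolve to true once the host is reachable again.
async function waitForHostRestart(
  hostId,
  isHostUp,
  { timeoutMs = 20 * 60 * 1000, intervalMs = 5000 } = {} // placeholder defaults
) {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    let up = false
    try {
      up = await isHostUp(hostId)
    } catch (error) {
      // Connection errors are expected while the host is rebooting,
      // so they are treated as "not up yet" rather than as failures.
    }
    if (up) {
      return
    }
    // Wait before the next probe.
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  // Deadline reached without a successful probe.
  throw new Error(`Host ${hostId} took too long to restart`)
}
```

With a loop like this, a host that needs longer to boot than the deadline allows will abort the whole run even though it eventually comes back, which would match one cluster updating fine while a slower one fails.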

Apr 29 '24 23:04 Danp2

Oh wow, that far behind already? Seems like it was just a few weeks ago that I updated. I did not see the timeout update; I will update and review. It is odd because I have two clusters: one updates fine with no problem, and the other has an issue (I just started testing the other cluster).

Apr 30 '24 00:04 tuxpowered

[Rolling Pool Update/Reboot] Use XO tasks for better reportability (PR #7578)

This was merged earlier today, which will make monitoring the RPU much easier.

Apr 30 '24 10:04 Danp2

We've recently made some changes to the RPU, including a fix for a bug introduced by the release earlier this month. Can you update to the latest version and test if the problem is still present? (and provide us with the XO task logs)

May 21 '24 09:05 b-Nollet
