XO no longer reconnects automatically to XCP server after a failed connection (timeout)

Open bogdantomasciuc opened this issue 3 years ago • 11 comments

XOA or XO from the sources? XO commit 379e4

If XO from the sources:

Describe the bug XO no longer tries to reconnect to XCP server after a failed connection (timeout)

To Reproduce Steps to reproduce the behavior:

  1. Go to Settings -> Servers
  2. Add a new XCP server
  3. Block access to said server through a firewall rule or unplug cable for more than 5 minutes
  4. See connection is marked with an error icon in Settings -> Servers
  5. Reconnect cable / disable blocking rule
  6. See connection to server still marked as unavailable. I left it like that for hours and it does not revert to available.

Expected behavior The connection used to be retried every minute, but that no longer seems to happen. I have been seeing this for quite some time (at least 3 months), but I thought it was due to my setup. I have since reinstalled XO and it behaves the same way.

Screenshots Connection marked as failed even though the server is available now:
FailedServerConnection

Error message: FailedServerConnectionErr

Proof server is available: FailedServerConnectionActuallyAvailable

Environment (please provide the following information):

  • Node: [e.g. 16.14.2]
  • xo-server 5.95.0
  • xo-web 5.97.1
  • hypervisor: the latest server I had the issue with is XCP-Ng 7.6.0 but it also came up with 8.2 hosts

Other information If I click the "Enabled" button to disable the connection and then click it again to re-enable it, everything starts working again. It should, however, recover automatically without someone cycling the connection manually. Restarting the XO VM or the related services has the same effect.

If you reached this line: Thank you! :)

bogdantomasciuc avatar Jun 03 '22 08:06 bogdantomasciuc

Thank you for your detailed report, we'll investigate :)

julien-f avatar Jun 03 '22 08:06 julien-f

Thanks!

bogdantomasciuc avatar Jun 03 '22 08:06 bogdantomasciuc

I cannot reproduce on my side.

I've used sudo iptables -I OUTPUT -d <address> -j DROP to block access to my host, and XO correctly detects the disconnection and removes the objects from the UI.

Then I removed the rule (sudo iptables -D OUTPUT -d <address> -j DROP), and after a few minutes, the objects reappeared in the UI.
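To confirm that the rule is really blocking traffic, and that access is back once it is deleted, a quick TCP probe from the XO machine can help. This is a minimal sketch, assuming the host's API is reachable on TCP port 443; `is_reachable` is a hypothetical helper and not part of XO.

```python
import socket


def is_reachable(host: str, port: int = 443, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection raises OSError (or a subclass) on timeout,
        # refusal, or an unreachable/unresolvable host.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

With the DROP rule in place the call should return False after roughly `timeout_s` seconds, and True again shortly after the rule is removed.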

julien-f avatar Jun 03 '22 09:06 julien-f

That is curious. I will make more tests and come back with the results.

bogdantomasciuc avatar Jun 03 '22 10:06 bogdantomasciuc

OK, I managed to replicate it again. Run sudo iptables -I OUTPUT -d <address> -j DROP, then go to the Settings page and cycle the Enable/Disable button. When you enable the connection, XO tries to connect for some time and you will see the spinner animating. Leave it like that for a few minutes until the attention icon appears; if you click it you see "connect ETIMEDOUT [...]". Afterwards, delete the blocking rule and leave it alone. XO will not reconnect by itself.

Nobody cycles the connection at night when the link goes down, yet we somehow end up in the same state. The steps above are just a way to mimic the problem.

bogdantomasciuc avatar Jun 03 '22 12:06 bogdantomasciuc

It's possible that if XO cannot connect when enabling it, it will not keep retrying.

It will only retry if the connection is lost, not when the host is not available initially.
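To illustrate the distinction, here is a minimal sketch of a loop that also keeps retrying when the very first attempt fails. It is not XO's actual code (XO is written in JavaScript); `connect_with_retry`, `connect` and `sleep` are hypothetical names, and the 60-second delay is only taken from the "every 1 minute" behavior described in the report.

```python
from typing import Callable, Optional


def connect_with_retry(
    connect: Callable[[], None],
    sleep: Callable[[float], None],
    retry_delay_s: float = 60.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Call connect() until it succeeds and return the number of attempts.

    The initial attempt is treated like any later one: a failure schedules
    another try instead of leaving the connection in an error state.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            connect()
            return attempts
        except OSError:  # TimeoutError and ConnectionRefusedError are subclasses
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(retry_delay_s)


if __name__ == "__main__":
    # Simulated host that is unreachable for the first three attempts.
    state = {"calls": 0}

    def fake_connect() -> None:
        state["calls"] += 1
        if state["calls"] <= 3:
            raise TimeoutError("connect ETIMEDOUT")

    waits = []  # record the delays instead of actually sleeping
    print(connect_with_retry(fake_connect, waits.append), waits)
```

Passing `sleep` in as a parameter keeps the sketch testable without real waiting; a real implementation would also need a way to cancel the loop when the server is disabled in Settings.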

julien-f avatar Sep 02 '22 09:09 julien-f

This issue happened again. The VPN tunnel went down and, even though it came back up at some point, the XO connection stayed down over the weekend. We disabled and re-enabled the connection manually on Monday to reconnect it. XO vm details are: image

bogdantomasciuc avatar May 15 '23 13:05 bogdantomasciuc

We should try to reproduce (spike then).

olivierlambert avatar Sep 14 '23 13:09 olivierlambert

A rewrite of the xenapi lib is planned to improve that. Work started by @julien-f.

olivierlambert avatar Sep 18 '23 09:09 olivierlambert

See #6947

julien-f avatar Sep 18 '23 09:09 julien-f

Great news! Looking forward to testing! 🥳

bogdantomasciuc avatar Sep 18 '23 17:09 bogdantomasciuc