Suggestion: When a Global Setting such as network.loadbalancer.haproxy.max.conn is changed, mark the VR as 'Requires Upgrade' instead of reporting a failed health check.

Open btzq opened this issue 1 year ago • 1 comments

ISSUE TYPE
  • Improvement Request
COMPONENT NAME
Virtual Router, HA Proxy
CLOUDSTACK VERSION
4.19.1
CONFIGURATION
OS / ENVIRONMENT
SUMMARY

One of our customers required larger HA Proxy Max Connections as they have many users connecting at the same time.

So we changed the default value of the following parameter in Global Settings:

  • network.loadbalancer.haproxy.max.conn = 500,000 (previously 4096, the default value)

After making the change and restarting the CloudStack server, we got a large number of health check failures.

Screenshots below:

[Screenshot 2024-10-15 at 10 29 55 PM]

[Screenshot 2024-10-15 at 10 31 06 PM]

In this case, I don't think this should be counted as a health check failure, because the service seems to be working fine.

I think a better experience for the operator would be to mark the router as 'Requires Upgrade'.

The VR does not need to be re-created; it only needs a forced reboot. (FYI, a normal reboot doesn't seem to make the VR load the new maxconn value.)

As an operator, we rely on the 'Alert' section to ensure all customer VRs are working normally. The current behavior creates a lot of noise.

Even better would be for each customer to be able to set their own network.loadbalancer.haproxy.max.conn value and additional settings, because not all customers require such large values.

STEPS TO REPRODUCE
Refer to the summary above.
EXPECTED RESULTS
Mark the router as 'Requires Upgrade', when a Global Setting is changed, such as network.loadbalancer.haproxy.max.conn
ACTUAL RESULTS
Bombarded with health check failures for all VRs created, each of which requires a manual forced reboot or a VR cleanup (a normal reboot doesn't work).
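
For illustration, the mismatch behind these failures can be pictured as a comparison between the value configured in Global Settings and the `maxconn` value the router's HAProxy configuration still carries. The following is a minimal sketch under that assumption, not CloudStack's actual health check; the function names and the sample configuration text are hypothetical.

```python
from typing import Optional

# Keywords that start a new section in an HAProxy configuration file.
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}


def read_global_maxconn(cfg_text: str) -> Optional[int]:
    """Return the maxconn value from the `global` section, or None if absent."""
    section = None
    for raw in cfg_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated tokens.
        tokens = raw.split("#", 1)[0].split()
        if not tokens:
            continue
        if tokens[0] in SECTION_KEYWORDS:
            section = tokens[0]
        elif section == "global" and tokens[0] == "maxconn" and len(tokens) > 1:
            return int(tokens[1])
    return None


def maxconn_matches(cfg_text: str, expected: int) -> bool:
    """True when the router's global maxconn equals the configured setting."""
    return read_global_maxconn(cfg_text) == expected


# Hypothetical config as it might look on a VR that has not picked up the change.
SAMPLE_CFG = """\
global
    maxconn 4096
defaults
    maxconn 2000
"""

if __name__ == "__main__":
    # The global setting was raised to 500000, but the router still has 4096.
    print(maxconn_matches(SAMPLE_CFG, 500000))  # False
```

In this sketch the service itself is healthy; only the stale value differs, which is why reporting it as a failure is noisy.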

btzq avatar Oct 15 '24 14:10 btzq

seems to be a valid bug

weizhouapache avatar Oct 16 '24 06:10 weizhouapache

@btzq (cc @weizhouapache ) it makes sense to mark the VR for "Requires Upgrade", but as long as that is not done the health check failure is genuine, isn't it?

DaanHoogland avatar Apr 14 '25 10:04 DaanHoogland

In fact, is "upgrade" really the "required" thing? I would think it requires a restart/cleanup. I am not sure whether it makes sense to add that as a flag, but at first sight it seems more appropriate.

DaanHoogland avatar Apr 14 '25 10:04 DaanHoogland

+1, maybe create another flag.

Alternatively, we could introduce a new level of health check result?

  • Success (everything is good)
  • Failure (service is down, VM config is missing, etc.)
  • Alert (some value mismatch that does not really cause a failure)
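
A rough sketch of how such a three-level result could be modelled follows. It is an illustration only and not the implementation in any CloudStack pull request; the enum, the function names, and the two boolean inputs are hypothetical.

```python
from enum import Enum
from typing import Iterable


class CheckResult(Enum):
    """Three-level health check result, ordered by severity."""
    SUCCESS = 0   # everything is good
    ALERT = 1     # some value mismatch, but the service still works
    FAILURE = 2   # service is down, VM config is missing, etc.


def classify(service_running: bool, config_matches: bool) -> CheckResult:
    """Classify a single check from two hypothetical observations."""
    if not service_running:
        return CheckResult.FAILURE
    if not config_matches:
        return CheckResult.ALERT
    return CheckResult.SUCCESS


def overall(results: Iterable[CheckResult]) -> CheckResult:
    """The router's overall state is the most severe individual result."""
    return max(results, key=lambda r: r.value, default=CheckResult.SUCCESS)


if __name__ == "__main__":
    # HAProxy is running, but its maxconn differs from the global setting.
    print(classify(service_running=True, config_matches=False).name)  # ALERT
```

Taking the most severe result as the overall state means a mere mismatch never escalates to a failure on its own.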

weizhouapache avatar Apr 14 '25 10:04 weizhouapache

> Alert (some value mismatch but do not really cause a failure)

with a yellow point (as opposed to green or red)

DaanHoogland avatar Apr 14 '25 11:04 DaanHoogland

> > Alert (some value mismatch but do not really cause a failure)
>
> with a yellow point (as opposed to green or red)

yeah, makes sense ?

weizhouapache avatar Apr 14 '25 11:04 weizhouapache

@weizhouapache @DaanHoogland yup, I think it makes sense. Less misleading, clearer, fewer questions.

btzq avatar Apr 14 '25 13:04 btzq

@btzq , I have the basic mechanisms in place

  • upgrade still needs some attention
  • I am not entirely sure which result each test should produce. Could you think that through with me? Please see #10710 for the results so far. (cc @weizhouapache )

DaanHoogland avatar Apr 25 '25 15:04 DaanHoogland

Hey @DaanHoogland ,

The 'Warning' state for when any health check needs attention (but is not in an error/failed state) makes sense. The UI in the attached ticket makes sense, but I see the screenshot only covers the health check page.

I believe there are a few more screens to account for:

  • Alerts Page - What will the alert message look like if a health check is in a warning state? What about webhooks?
  • Events Page - I believe the message output will be the same as what's written on the Health Check or Alerts page?
  • Virtual Router Page - Will the 'Requires Upgrade' field turn to 'Yes' when any health check is in a warning state?

btzq avatar Apr 25 '25 15:04 btzq

@btzq , it all makes sense, but I will not take this into scope for now. I am fine with this evolving as we go along, though. For now I will focus on the points I mentioned (in https://github.com/apache/cloudstack/issues/9800#issuecomment-2830756094).

DaanHoogland avatar Apr 26 '25 15:04 DaanHoogland

@btzq , would you have a chance to test #10710?

DaanHoogland avatar Sep 16 '25 13:09 DaanHoogland

@btzq , with #10710 merged this might be solved, but we did not run through your scenario. Please keep us updated here.

DaanHoogland avatar Sep 22 '25 06:09 DaanHoogland