Load Balance HealthCheck Fail when Making Changes to HA Proxy Setting

Open btzq opened this issue 1 year ago • 7 comments

ISSUE TYPE
  • Bug Report
COMPONENT NAME
Virtual Router
CLOUDSTACK VERSION
4.19.1.1
CONFIGURATION
OS / ENVIRONMENT
SUMMARY

We have 50 Autoscale Groups within the same Virtual Router.

The default settings for the VR load balancer are:

  • Timeout Client = 50 Seconds
  • Timeout Server = 50 Seconds

We have a use case where many of our customers have very long-lived TCP connections. So we went into the VR and made the following changes:

  • Timeout Client = 50 Seconds -> 30 Minutes
  • Timeout Server = 50 Seconds -> 30 Minutes
  • Added TCP Keep Alive (Option tcpka)
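
For illustration, the manual edit listed above can be sketched in Python. This is a minimal sketch and not CloudStack code: `SAMPLE_CFG` is an assumed stand-in for the `defaults` section of a generated haproxy.cfg (the real file on a VR may look different), and `apply_long_lived_settings` is a hypothetical helper name. `30m` uses HAProxy's time-unit suffix for minutes.

```python
import re

# Assumed sample of a generated "defaults" section; the real file may differ.
SAMPLE_CFG = """\
defaults
    mode tcp
    timeout connect 5000
    timeout client 50000
    timeout server 50000
"""

# HAProxy section keywords, used to detect where "defaults" begins and ends.
SECTIONS = {"global", "defaults", "frontend", "backend", "listen"}
TIMEOUT_RE = re.compile(r"^(\s*timeout\s+(?:client|server)\s+)\S+\s*$")


def apply_long_lived_settings(cfg, timeout="30m"):
    """Raise client/server timeouts in 'defaults' and add 'option tcpka' once."""
    out = []
    in_defaults = False
    has_tcpka = False
    for line in cfg.splitlines():
        words = line.split()
        if words and words[0] in SECTIONS:
            # Leaving a defaults section: add the keep-alive option if missing.
            if in_defaults and not has_tcpka:
                out.append("    option tcpka")
            in_defaults = words[0] == "defaults"
            has_tcpka = False
        elif in_defaults:
            if words[:2] == ["option", "tcpka"]:
                has_tcpka = True
            # Only the client and server timeouts are rewritten.
            line = TIMEOUT_RE.sub(lambda m: m.group(1) + timeout, line)
        out.append(line)
    if in_defaults and not has_tcpka:
        out.append("    option tcpka")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(apply_long_lived_settings(SAMPLE_CFG))
```

Any such edit only changes the file as it exists on the VR at that moment; it does not change how the file is produced.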

[Image: WhatsApp Image 2024-08-23 at 7 16 17 PM]

After that, we tried accessing services through the Autoscale Group load balancer. It worked for 3-5 minutes, and then we were suddenly cut off. We could no longer ping or Telnet to the VMs under Autoscale.

We checked and saw that the VR had started reporting health check failures: 'Missing Load Balancer For XXXXXX'
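
A Telnet-style reachability check like the one described here can be scripted so it is easy to repeat while reproducing the problem. The following is a minimal sketch using only the Python standard library; `tcp_port_open` is a hypothetical helper name, and the host and port in the usage lines are placeholders, not values from this report.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake, like `telnet host port`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder address and port; substitute the LB public IP and rule port.
    print(tcp_port_open("192.0.2.10", 80))
```

This only tests whether a TCP connection can be established; it says nothing about why a connection fails.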

To resolve the issue temporarily, we added a new LB Rule and removed it, to force HA Proxy to get the new info.

[Image: WhatsApp Image 2024-08-23 at 8 07 30 PM]

As a result, the HA Proxy config was reloaded and all our settings were overridden.

Strangely, when we repeated the steps above, the load balancing issue no longer occurred. It seems to be intermittent.

STEPS TO REPRODUCE
See above.
EXPECTED RESULTS
To be able to change the TCP Timeout Setting without any issue. 
ACTUAL RESULTS
Health checks fail, and services suddenly stop working.

btzq avatar Aug 23 '24 12:08 btzq

Each time an LB rule is updated, haproxy.cfg is regenerated by the ACS management server or agent. When that happens, all your manual haproxy changes are lost.

I do not think your timeout changes caused the issue, but I am not sure about the other settings.
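
To see which manual settings are lost when the file is regenerated, one option is to save a copy of the edited haproxy.cfg and compare it with the file after the next LB update. Below is a minimal sketch of that comparison as a simple line-set check; `lost_settings` is a hypothetical helper and the two sample strings are assumed contents, not output captured from a real VR.

```python
def lost_settings(edited_cfg, regenerated_cfg):
    """Return lines that exist in the hand-edited config but not in the regenerated one."""
    regenerated_lines = {line.strip() for line in regenerated_cfg.splitlines()}
    lost = []
    for line in edited_cfg.splitlines():
        stripped = line.strip()
        # Skip blank lines; compare the rest ignoring indentation.
        if stripped and stripped not in regenerated_lines:
            lost.append(stripped)
    return lost


if __name__ == "__main__":
    # Assumed sample contents for illustration only.
    edited = "defaults\n    timeout client 30m\n    timeout server 30m\n    option tcpka\n"
    regenerated = "defaults\n    timeout client 50000\n    timeout server 50000\n"
    for setting in lost_settings(edited, regenerated):
        print("lost:", setting)
```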

weizhouapache avatar Aug 26 '24 07:08 weizhouapache

@weizhouapache I think there are two issues here:

  • Why would the VR suddenly have health check fail related to LB when we performed the above action?
  • How can we change the HA Proxy Default Setting to support the desired settings above? (Timeout Client/Server 50s -> 30Min)

btzq avatar Aug 26 '24 10:08 btzq

@weizhouapache I think there are two issues here:

  • Why would the VR suddenly have health check fail related to LB when we performed the above action?

It might be related to your changes. Have you faced a similar issue before?

  • How can we change the HA Proxy Default Setting to support the desired settings above? (Timeout Client/Server 50s -> 30Min)

AFAIK, you cannot. You can find the supported custom LB config options by searching for haproxy in the global settings.

weizhouapache avatar Aug 26 '24 10:08 weizhouapache

We went through the HA Proxy options in Global Settings but couldn't find anything that helps us overcome this issue.

Where is haproxy.cfg stored? Is it in the systemVM template? If so, would customizing the systemVM template and changing our VPC Network Offering work?

As for the health check failure: no, we haven't encountered this issue before. This is new to us.

btzq avatar Aug 26 '24 11:08 btzq

The /etc/haproxy/haproxy.cfg in the VR is generated by the ACS class https://github.com/apache/cloudstack/blob/main/core/src/main/java/com/cloud/network/HAProxyConfigurator.java

The timeout settings are hard-coded. You can:

  • change the values in the code
  • build the core sub-project
  • replace cloud-core-XXX.jar on the KVM hosts and restart cloudstack-agent

weizhouapache avatar Aug 26 '24 11:08 weizhouapache

I see, @weizhouapache. I think we have no choice.

How do we build this component? We've never done this before. Is there any guide we can refer to?

btzq avatar Aug 26 '24 11:08 btzq

You can refer to https://github.com/shapeblue/hackerbook/blob/main/2-dev.md#building-cloudstack

weizhouapache avatar Aug 26 '24 11:08 weizhouapache

@btzq, this issue was closed with #10710. Can you verify that it is indeed gone?

DaanHoogland avatar Sep 22 '25 06:09 DaanHoogland