terraform-aws-eks-blueprints

[Bug]: Issue initialising ArgoCD haproxy: layer7 timeout

Open tomarrell opened this issue 3 years ago • 3 comments

Welcome to Amazon EKS Blueprints!

  • [X] Yes, I've searched similar issues on GitHub and didn't find any.

Amazon EKS Blueprints Release version

4.0.2

What is your environment, configuration and the example used?

Terraform Version: 1.1.7

main.tf

...

module "eks" {
  source = "github.com/aws-ia/terraform-aws-eks-blueprints?ref=v4.0.2"
  
  cluster_name       = local.name
  cluster_version    = "1.21"
  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnets

  # IPV6
  cluster_ip_family = "ipv6"

  # EKS MANAGED NODE GROUPS
  managed_node_groups = {
    mg_5 = {
      node_group_name = "managed-ondemand"
      instance_types  = ["m5.large"]
      min_size        = "2"
      subnet_ids      = module.vpc.private_subnets
    }
  }
}

module "eks-blueprints-kubernetes-addons" {
  source         = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons?ref=v4.0.2"
  eks_cluster_id = module.eks.eks_cluster_id
  enable_ipv6    = true # Enable Ipv6 network. Attaches new VPC CNI policy to the IRSA role

  # EKS Managed Add-ons
  enable_amazon_eks_vpc_cni    = true
  enable_amazon_eks_coredns    = true
  enable_amazon_eks_kube_proxy = true

  # K8s Add-ons
  enable_argocd                       = true
  enable_aws_load_balancer_controller = true
  enable_prometheus                   = true

  depends_on = [module.eks.managed_node_groups]
}

As you can see, the cluster is set up to support IPv6 networking internally.

What did you do and What did you see instead?

Applied the infra as expected. All resources are created successfully; however, all the haproxy pods go into a persistent CrashLoopBackOff and will not initialise.

Below are the logs from one of the pods in backoff.

$ kubectl -n argocd logs argo-cd-redis-ha-haproxy-75fb577466-24qzl
[NOTICE] 115/094553 (1) : New worker #1 (7) forked
[WARNING] 115/094554 (7) : Server check_if_redis_is_master_1/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'fdf8:cee4:5e1::f893')", check duration: 1001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 115/094554 (7) : Server check_if_redis_is_master_1/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'fdf8:cee4:5e1::f893')", check duration: 1001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 115/094554 (7) : Server check_if_redis_is_master_1/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'fdf8:cee4:5e1::f893')", check duration: 1000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 115/094554 (7) : backend 'check_if_redis_is_master_1' has no server available!
[WARNING] 115/094554 (7) : Server check_if_redis_is_master_2/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'fdf8:cee4:5e1::92d')", check duration: 1000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 115/094554 (7) : Server check_if_redis_is_master_2/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'fdf8:cee4:5e1::92d')", check duration: 1000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 115/094554 (7) : Server check_if_redis_is_master_2/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'fdf8:cee4:5e1::92d')", check duration: 1001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 115/094554 (7) : backend 'check_if_redis_is_master_2' has no server available!
[WARNING] 115/094554 (7) : Server bk_redis_master/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'role:master')", check duration: 1000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 115/094554 (7) : Server bk_redis_master/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'role:master')", check duration: 1000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 115/094604 (1) : Exiting Master process...
[WARNING] 115/094604 (7) : Stopping proxy health_check_http_url in 0 ms.
[WARNING] 115/094604 (7) : Stopping backend check_if_redis_is_master_0 in 0 ms.
[WARNING] 115/094604 (7) : Stopping backend check_if_redis_is_master_1 in 0 ms.
[WARNING] 115/094604 (7) : Stopping backend check_if_redis_is_master_2 in 0 ms.
[WARNING] 115/094604 (7) : Stopping frontend ft_redis_master in 0 ms.
[WARNING] 115/094604 (7) : Stopping backend bk_redis_master in 0 ms.
[WARNING] 115/094604 (7) : Stopping frontend metrics in 0 ms.
[WARNING] 115/094604 (7) : Stopping frontend GLOBAL in 0 ms.
[WARNING] 115/094604 (7) : Proxy health_check_http_url stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy check_if_redis_is_master_0 stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy check_if_redis_is_master_1 stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy check_if_redis_is_master_2 stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy ft_redis_master stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy bk_redis_master stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy metrics stopped (FE: 0 conns, BE: 0 conns).
[WARNING] 115/094604 (7) : Proxy GLOBAL stopped (FE: 0 conns, BE: 0 conns).
[ALERT] 115/094604 (1) : Current worker #1 (7) exited with code 0 (Exit)
[WARNING] 115/094604 (1) : All workers exited. Exiting... (0)

Additional Information

No response

tomarrell, Apr 26 '22 09:04

As a temporary workaround, I have disabled Redis HA.

argocd-values.yaml

redis-ha:
  enabled: false
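
One way to feed this values file to the chart from Terraform is sketched below. It assumes the kubernetes-addons module exposes an `argocd_helm_config` input whose `values` list is passed through to the underlying Helm release; check that against the module's variables for the release in use. The file path is illustrative.

```hcl
module "eks-blueprints-kubernetes-addons" {
  source         = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons?ref=v4.0.2"
  eks_cluster_id = module.eks.eks_cluster_id
  enable_ipv6    = true

  enable_argocd = true

  # Assumption: argocd_helm_config is forwarded to the Helm release,
  # so `values` takes a list of YAML documents as strings.
  argocd_helm_config = {
    # Hypothetical path: argocd-values.yaml sits next to main.tf
    values = [file("${path.module}/argocd-values.yaml")]
  }

  depends_on = [module.eks.managed_node_groups]
}
```

With Redis HA disabled, the chart should fall back to its single-instance Redis, so no haproxy pods are scheduled; this sidesteps the failing health checks rather than fixing them.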

tomarrell, Apr 26 '22 13:04

Confirmed reproducible; this may be related to https://github.com/argoproj/argo-helm/issues/1203

Zvikan, May 03 '22 22:05

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot], Jul 19 '22 00:07

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot], Aug 19 '22 00:08

Issue closed due to inactivity.

github-actions[bot], Aug 30 '22 00:08