
wait-for-db check won't ever pass with internal mongodb

Open cflanny opened this issue 2 years ago • 10 comments

There is a service name mismatch between the 'wait-for-db' check helper and the service name for mongodb created by the chart, specifically right here: https://github.com/StackStorm/stackstorm-k8s/blob/b6419e68a8f7235e03b1b878681370a29ba65839/templates/_helpers.tpl#L112

The helper waits for {{ $.Release.Name }}-mongodb-headless, but the service actually created is {{ $.Release.Name }}-mongodb. I would have opened a pull request with the fix, but I was denied permission to push a branch.
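The one-line change being described would look roughly like this in the helper (a sketch against the line linked above, not the exact chart source):

```
# templates/_helpers.tpl -- wait-for-db sketch (hypothetical)
# Before (never resolves when only <release>-mongodb exists):
#   until nc -z -w 2 {{ $.Release.Name }}-mongodb-headless {{ $mongodb_port }} && echo mongodb ok;
# After:
until nc -z -w 2 {{ $.Release.Name }}-mongodb {{ $mongodb_port }} && echo mongodb ok;
```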

Here is proof of the fix from my test environment:

root@stackstorm-st2client-5fff65dbfc-c7z8c:/opt/stackstorm# nc -z -w 2 stackstorm-mongodb-headless 27017 && echo 'broken'
nc: getaddrinfo for host "stackstorm-mongodb-headless" port 27017: Name or service not known
root@stackstorm-st2client-5fff65dbfc-c7z8c:/opt/stackstorm# nc -z -w 2 stackstorm-mongodb 27017 && echo 'working'
working
root@stackstorm-st2client-5fff65dbfc-c7z8c:/opt/stackstorm#

cflanny avatar May 27 '22 21:05 cflanny

Curious to know what your cluster shows for the command: kubectl get services

arms11 avatar May 27 '22 21:05 arms11

but was denied permission to push a branch

Can you please clarify?

arms11 avatar May 27 '22 21:05 arms11

@cflanny See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork for how to create a PR from a fork.

arm4b avatar May 27 '22 22:05 arm4b

Here is the terraform call creating the helm release so you have access to the (templated) values being used, in case you want that info as well:

module "stackstorm_cluster" {
  source      = "../helm"
  eks         = var.eks
  environment = var.environment
  charts = [
    {
      name       = "stackstorm"
      repository = "https://helm.stackstorm.com/"
      chart      = "stackstorm-ha"
      version    = "0.100.0"
      namespace  = "stackstorm"
      wait       = true

      route53 = {
        name    = "management"
        zone_id = var.route53.route53_zone.zone_id
        type    = "CNAME"
        target  = var.eks.private_alb.lb_dns_name
      }

      values_yaml = <<EOF
st2:
  username: rvadmin
  password: "${random_password.rvadmin.result}"

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
    ingress.kubernetes.io/secure-backends: "false"
  hosts:
    - host: management.${var.route53.route53_zone.name}
      paths:
       - path: /
         serviceName: stackstorm-st2web
         servicePort: 80
  tls:
    - hosts:
      - management.${var.route53.route53_zone.name}

volumes:
  enabled: true
  packs:
    nfs:
      server: "${module.st2_efs.dns_name}"
      path: "/st2/packs"
  virtualenvs:
    nfs:
      server: "${module.st2_efs.dns_name}"
      path: "/st2/venvs"
  configs:
    nfs:
      server: "${module.st2_efs.dns_name}"
      path: "/st2/configs"

st2web:
  replicas: 2
  service:
    type: "ClusterIP"
    hostname: "management.${var.route53.route53_zone.name}"

mongodb:
  architecture: standalone
  auth:
    username: st2-mongo-user
    password: "${random_password.mongodb_user.result}"
    rootPassword: "${random_password.mongodb_root.result}"
    replicaSetKey: "${random_password.mongodb_replica_key.result}"

rabbitmq:
  auth:
    username: st2-rabbit-user
    password: "${random_password.rabbit_user.result}"
    erlangCookie: "${random_password.rabbit_erlang_cookie.result}"
EOF
    }
  ]
}

and here is the requested output from kubectl on a fresh deployment:

❯ kubectl get services -n stackstorm
NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
stackstorm-mongodb             ClusterIP   10.100.199.110   <none>        27017/TCP                               15s
stackstorm-rabbitmq            ClusterIP   10.100.194.38    <none>        5672/TCP,4369/TCP,25672/TCP,15672/TCP   15s
stackstorm-rabbitmq-headless   ClusterIP   None             <none>        4369/TCP,5672/TCP,25672/TCP,15672/TCP   15s
stackstorm-redis               ClusterIP   10.100.86.77     <none>        6379/TCP,26379/TCP                      15s
stackstorm-redis-headless      ClusterIP   None             <none>        6379/TCP,26379/TCP                      15s
stackstorm-st2api              ClusterIP   10.100.122.164   <none>        9101/TCP                                15s
stackstorm-st2auth             ClusterIP   10.100.154.119   <none>        9100/TCP                                15s
stackstorm-st2stream           ClusterIP   10.100.202.30    <none>        9102/TCP                                15s
stackstorm-st2web              ClusterIP   10.100.235.221   <none>        80/TCP                                  15s

cflanny avatar May 27 '22 22:05 cflanny

@cflanny that's strange, as I do not see the mongodb-headless service that was created in earlier versions. I also do not see anything overridden in your configuration that could have caused this. As far as I found, the headless service follows the Bitnami Helm charts' convention. I am on version 0.80.0 of our Helm chart in production, and below is the output I am familiar with:

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
kubernetes                      ClusterIP   172.20.0.1       <none>        443/TCP                                 255d
st2-staging-mongodb-headless    ClusterIP   None             <none>        27017/TCP                               255d
st2-staging-rabbitmq            ClusterIP   172.20.227.144   <none>        5672/TCP,4369/TCP,25672/TCP,15672/TCP   255d
st2-staging-rabbitmq-headless   ClusterIP   None             <none>        4369/TCP,5672/TCP,25672/TCP,15672/TCP   255d
st2-staging-redis               ClusterIP   172.20.91.210    <none>        6379/TCP,26379/TCP                      255d
st2-staging-redis-headless      ClusterIP   None             <none>        6379/TCP,26379/TCP                      255d

I wonder if something has changed recently, but I have not played around with it. @armab @cognifloyd anything you can think of?

arms11 avatar May 27 '22 23:05 arms11

mongodb:
  architecture: standalone

That's probably the reason: the Helm chart defaults to the replicaset architecture, so the upstream MongoDB chart likely uses different templating and service naming for standalone.

For the upstream chart templating values we have the following: https://github.com/bitnami/charts/blob/504d12bf3fe0e1348f5b9d6c6a9d15cd0a60517e/bitnami/mongodb/templates/_helpers.tpl#L19-L27

and so here:

 until nc -z -w 2 {{ $.Release.Name }}-mongodb-headless {{ $mongodb_port }} && echo mongodb ok; 

Instead of hardcoding the headless name, we should use the mongodb.service.nameOverride var from the upstream chart to avoid corner cases with custom MongoDB service naming.

arm4b avatar May 28 '22 10:05 arm4b

Yeah nah, that won't work.

MongoDB uses different naming and relies on different variables for the different architectures (replicaset vs standalone):

  • name: {{ include "mongodb.fullname" . }} https://github.com/bitnami/charts/blob/504d12bf3fe0e1348f5b9d6c6a9d15cd0a60517e/bitnami/mongodb/templates/standalone/svc.yaml#L5
  • name: {{ include "mongodb.service.nameOverride" . }} https://github.com/bitnami/charts/blob/504d12bf3fe0e1348f5b9d6c6a9d15cd0a60517e/bitnami/mongodb/templates/replicaset/headless-svc.yaml#L5

arm4b avatar May 28 '22 10:05 arm4b

Thanks @armab for that. You are correct, and I definitely missed that. In a way this is working as designed, correct? Should we support non-HA mode like this for our K8s deployment model?

arms11 avatar May 28 '22 11:05 arms11

For what it's worth, putting Mongo back into high-availability mode seems to have taken care of the problem, and the reason I set it to standalone in the first place (a bug in the downstream Mongo chart causing a fresh container in a fresh namespace to CrashLoop) appears to have been resolved as well.

This can probably be fixed with an if on the service name, keyed on the value of .Values.mongodb.architecture.
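A minimal sketch of that conditional for `templates/_helpers.tpl` (hypothetical; the variable names follow the helper snippet quoted earlier, and the surrounding loop in the real chart may differ):

```
{{- /* Sketch: pick the MongoDB service name based on the deployment
       architecture. Assumes the default fullname "<release>-mongodb". */}}
{{- $mongodb_svc := printf "%s-mongodb" $.Release.Name }}
{{- if eq .Values.mongodb.architecture "replicaset" }}
  {{- $mongodb_svc = printf "%s-mongodb-headless" $.Release.Name }}
{{- end }}
until nc -z -w 2 {{ $mongodb_svc }} {{ $mongodb_port }} && echo mongodb ok;
```

Note that reassigning a variable with `=` (rather than `:=`) requires a reasonably recent Helm; a nested mongodb.service.nameOverride lookup could also work for the replicaset branch, per the upstream helpers linked above.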

cflanny avatar May 31 '22 15:05 cflanny

Yeah, a Pull Request is welcome to fix this issue :+1:

arm4b avatar May 31 '22 15:05 arm4b