stackstorm-k8s
wait-for-db check won't ever pass with internal mongodb
There is a service name mismatch between the 'wait-for-db' check helper and the name of the mongodb service actually created by the chart, specifically right here: https://github.com/StackStorm/stackstorm-k8s/blob/b6419e68a8f7235e03b1b878681370a29ba65839/templates/_helpers.tpl#L112
The helper looks for {{ $.Release.Name }}-mongodb-headless, while the service the chart creates is named {{ $.Release.Name }}-mongodb. I would have created a pull request to fix this, but was denied permission to push a branch.
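For reference, the check at that line reads roughly as follows ($mongodb_port is set earlier in the same helper), while the rendered Service name has no -headless suffix:
until nc -z -w 2 {{ $.Release.Name }}-mongodb-headless {{ $mongodb_port }} && echo mongodb ok;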
Here is proof of the fix from my test environment:
root@stackstorm-st2client-5fff65dbfc-c7z8c:/opt/stackstorm# nc -z -w 2 stackstorm-mongodb-headless 27017 && echo 'broken'
nc: getaddrinfo for host "stackstorm-mongodb-headless" port 27017: Name or service not known
root@stackstorm-st2client-5fff65dbfc-c7z8c:/opt/stackstorm# nc -z -w 2 stackstorm-mongodb 27017 && echo 'working'
working
root@stackstorm-st2client-5fff65dbfc-c7z8c:/opt/stackstorm#
Curious to know what your cluster shows for the command: kubectl get services
but was denied permission to push a branch
Can you please clarify?
@cflanny See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork for how to create a PR from a fork.
Here is the Terraform call that creates the Helm release, so you have access to the (templated) values being used, in case you want that info as well:
module "stackstorm_cluster" {
source = "../helm"
eks = var.eks
environment = var.environment
charts = [
{
name = "stackstorm"
repository = "https://helm.stackstorm.com/"
chart = "stackstorm-ha"
version = "0.100.0"
namespace = "stackstorm"
wait = true
route53 = {
name = "management"
zone_id = var.route53.route53_zone.zone_id
type = "CNAME"
target = var.eks.private_alb.lb_dns_name
}
values_yaml = <<EOF
st2:
username: rvadmin
password: "${random_password.rvadmin.result}"
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx
ingress.kubernetes.io/secure-backends: "false"
hosts:
- host: management.${var.route53.route53_zone.name}
paths:
- path: /
serviceName: stackstorm-st2web
servicePort: 80
tls:
- hosts:
- management.${var.route53.route53_zone.name}
volumes:
enabled: true
packs:
nfs:
server: "${module.st2_efs.dns_name}"
path: "/st2/packs"
virtualenvs:
nfs:
server: "${module.st2_efs.dns_name}"
path: "/st2/venvs"
configs:
nfs:
server: "${module.st2_efs.dns_name}"
path: "/st2/configs"
st2web:
replicas: 2
service:
type: "ClusterIP"
hostname: "management.${var.route53.route53_zone.name}"
mongodb:
architecture: standalone
auth:
username: st2-mongo-user
password: "${random_password.mongodb_user.result}"
rootPassword: "${random_password.mongodb_root.result}"
replicaSetKey: "${random_password.mongodb_replica_key.result}"
rabbitmq:
auth:
username: st2-rabbit-user
password: "${random_password.rabbit_user.result}"
erlangCookie: "${random_password.rabbit_erlang_cookie.result}"
EOF
}
]
}
and here is the requested output from kubectl get services on a fresh deployment:
❯ kubectl get services -n stackstorm
NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
stackstorm-mongodb             ClusterIP   10.100.199.110   <none>        27017/TCP                               15s
stackstorm-rabbitmq            ClusterIP   10.100.194.38    <none>        5672/TCP,4369/TCP,25672/TCP,15672/TCP   15s
stackstorm-rabbitmq-headless   ClusterIP   None             <none>        4369/TCP,5672/TCP,25672/TCP,15672/TCP   15s
stackstorm-redis               ClusterIP   10.100.86.77     <none>        6379/TCP,26379/TCP                      15s
stackstorm-redis-headless      ClusterIP   None             <none>        6379/TCP,26379/TCP                      15s
stackstorm-st2api              ClusterIP   10.100.122.164   <none>        9101/TCP                                15s
stackstorm-st2auth             ClusterIP   10.100.154.119   <none>        9100/TCP                                15s
stackstorm-st2stream           ClusterIP   10.100.202.30    <none>        9102/TCP                                15s
stackstorm-st2web              ClusterIP   10.100.235.221   <none>        80/TCP                                  15s
@cflanny that's strange, as I do not see the mongodb-headless service, which was the case earlier. I also do not see anything overridden in your configuration that could have caused this. The headless service is what I found to be the Bitnami Helm charts' convention. I am on version 0.80.0 of our Helm chart in production, and below is the output I am familiar with:
NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
kubernetes                      ClusterIP   172.20.0.1       <none>        443/TCP                                 255d
st2-staging-mongodb-headless    ClusterIP   None             <none>        27017/TCP                               255d
st2-staging-rabbitmq            ClusterIP   172.20.227.144   <none>        5672/TCP,4369/TCP,25672/TCP,15672/TCP   255d
st2-staging-rabbitmq-headless   ClusterIP   None             <none>        4369/TCP,5672/TCP,25672/TCP,15672/TCP   255d
st2-staging-redis               ClusterIP   172.20.91.210    <none>        6379/TCP,26379/TCP                      255d
st2-staging-redis-headless      ClusterIP   None             <none>        6379/TCP,26379/TCP                      255d
I wonder if something has changed recently, but I have not played around with it. @armab @cognifloyd anything you can think of?
mongodb:
  architecture: standalone
That's probably the reason here: the default for the Helm chart is the replicaset architecture, so the upstream MongoDB chart most likely uses different templating and naming for the standalone case.
For the upstream chart's templating of the service name, we have the following: https://github.com/bitnami/charts/blob/504d12bf3fe0e1348f5b9d6c6a9d15cd0a60517e/bitnami/mongodb/templates/_helpers.tpl#L19-L27
and so here:
until nc -z -w 2 {{ $.Release.Name }}-mongodb-headless {{ $mongodb_port }} && echo mongodb ok;
instead of hardcoding -headless, we should use the mongodb.service.nameOverride variable from the upstream chart to avoid corner cases with custom mongodb service naming.
Yeah nah, that won't work.
MongoDB uses different naming and relies on different variables for the different architectures (replicaset vs standalone):
- standalone: name: {{ include "mongodb.fullname" . }}
  https://github.com/bitnami/charts/blob/504d12bf3fe0e1348f5b9d6c6a9d15cd0a60517e/bitnami/mongodb/templates/standalone/svc.yaml#L5
- replicaset (headless): name: {{ include "mongodb.service.nameOverride" . }}
  https://github.com/bitnami/charts/blob/504d12bf3fe0e1348f5b9d6c6a9d15cd0a60517e/bitnami/mongodb/templates/replicaset/headless-svc.yaml#L5
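For illustration, with a release named stackstorm (as in the output above) and assuming default fullname settings, those two templates render to different Service names:
standalone:  {{ include "mongodb.fullname" . }}              ->  stackstorm-mongodb
replicaset:  {{ include "mongodb.service.nameOverride" . }}  ->  stackstorm-mongodb-headless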
Thanks @armab for that. You are correct and I definitely missed that. In a way this is as designed, correct? Should we support non-HA mode like this for our K8s deployment model?
For what it's worth, putting Mongo back into high-availability mode seems to have taken care of that problem, and it would appear that my reason for setting it to standalone has also been taken care of (a bug in the downstream Mongo chart causing a fresh container in a fresh namespace to CrashLoop).
This can probably be fixed with an if on the service name depending on the value of Values.mongodb.architecture.
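A minimal sketch of that conditional, assuming it sits inside the existing wait-for-db helper where $mongodb_port is already defined:
{{- /* sketch only: pick the MongoDB Service name based on the configured architecture */ -}}
{{- if eq $.Values.mongodb.architecture "standalone" }}
until nc -z -w 2 {{ $.Release.Name }}-mongodb {{ $mongodb_port }} && echo mongodb ok;
{{- else }}
until nc -z -w 2 {{ $.Release.Name }}-mongodb-headless {{ $mongodb_port }} && echo mongodb ok;
{{- end }}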
Yeah, a Pull Request is welcome to fix this issue :+1: