[bitnami/mongodb] suboptimal deployment setting
Name and Version
bitnami/mongodb
What architecture are you using?
None
What steps will reproduce the bug?
Just install MongoDB with a HelmRelease:
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: mongo
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: bitnami
  namespace: mongo
spec:
  interval: 5m0s
  timeout: 1m0s
  url: https://charts.bitnami.com/bitnami
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mongo
  namespace: mongo
spec:
  chart:
    spec:
      chart: mongodb
      sourceRef:
        kind: HelmRepository
        name: bitnami
      version: '*'
  interval: 5m0s
  values:
    architecture: standalone
    auth:
      enabled: true
      rootUser: root
      rootPassword: "e8L85239yHVZ2jwFVzaS"
```
MongoDB is installed and running, but then I see the following:
```console
$ kubectl get pods -n mongo
NAME                             READY   STATUS             RESTARTS           AGE
mongo-mongodb-5dc8c88457-4l4kb   1/1     Running            0                  15d
mongo-mongodb-c686c8bf8-dg7qj    0/1     CrashLoopBackOff   2598 (3m17s ago)   9d
```
The issue is the `useStatefulSet: false` default value. It simply does not work well, and no documentation warns about it.
I am asking you to change the default to `useStatefulSet: true`, which solves the issue of multiple MongoDB pods sharing the same volume.
What do you see instead?
n/a
Hi @gecube
Could you please share the logs of the MongoDB pod that's in CrashLoopBackOff status? When using the "standalone" mode, it should be possible to use a Deployment.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
@juan131 Hi! Unfortunately, no. I am using a PVC with RWO access mode, so two MongoDB Deployment pods effectively cannot co-exist. There are no logs: only in `kubectl describe` can one see that the PVC is already claimed and the second pod cannot start. Another option could be to make Recreate the default strategy for the Deployment.
That makes sense @gecube !
Regarding alternatives, there's no need to change the update strategy. You can keep the rollingUpdate strategy while tuning maxSurge & maxUnavailable instead:
```console
$ kubectl explain deployment.spec.strategy.rollingUpdate
GROUP:      apps
KIND:       Deployment
VERSION:    v1

FIELD: rollingUpdate <RollingUpdateDeployment>

DESCRIPTION:
    Rolling update config params. Present only if DeploymentStrategyType =
    RollingUpdate.
    Spec to control the desired behavior of rolling update.

FIELDS:
  maxSurge	<IntOrString>
    The maximum number of pods that can be scheduled above the desired number of
    pods. Value can be an absolute number (ex: 5) or a percentage of desired
    pods (ex: 10%). This can not be 0 if MaxUnavailable is 0. Absolute number is
    calculated from percentage by rounding up. Defaults to 25%. Example: when
    this is set to 30%, the new ReplicaSet can be scaled up immediately when the
    rolling update starts, such that the total number of old and new pods do not
    exceed 130% of desired pods. Once old pods have been killed, new ReplicaSet
    can be scaled up further, ensuring that total number of pods running at any
    time during the update is at most 130% of desired pods.

  maxUnavailable	<IntOrString>
    The maximum number of pods that can be unavailable during the update. Value
    can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
    Absolute number is calculated from percentage by rounding down. This can not
    be 0 if MaxSurge is 0. Defaults to 25%. Example: when this is set to 30%,
    the old ReplicaSet can be scaled down to 70% of desired pods immediately
    when the rolling update starts. Once new pods are ready, old ReplicaSet can
    be scaled down further, followed by scaling up the new ReplicaSet, ensuring
    that the total number of pods available at all times during the update is at
    least 70% of desired pods.
```
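For instance, a sketch (untested against the chart, and assuming the chart's `updateStrategy` value is passed through to the Deployment's `spec.strategy`): setting `maxSurge: 0` and `maxUnavailable: 1` forces Kubernetes to terminate the old pod before scheduling its replacement, so the RWO volume is already released when the new pod starts:

```yaml
# Hypothetical HelmRelease values fragment: scale the old ReplicaSet down
# before scaling the new one up, so the single RWO volume never has two
# pods claiming it at the same time.
values:
  architecture: standalone
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # never schedule a second pod above the desired count
      maxUnavailable: 1  # allow the old pod to be terminated first
```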
@juan131 Hi! Thanks for your suggestions, but I want to emphasise again that, as a DevOps engineer, I expect a Helm chart to work out of the box on a default cloud such as AWS (with RWO EBS volumes). That does not happen, which means the chart has suboptimal defaults. I am asking you to consider changing them. That's all.
Hi @gecube
Thanks for your feedback, we appreciate it.
We try to offer our users Helm charts that are flexible enough to adapt to a wide range of scenarios. Please note the default values are mainly intended for trying the charts in basic scenarios with simple architectures. It's your responsibility as a DevOps engineer to adapt the chart values to your specific environment and requirements.
> It's your responsibility as a DevOps engineer to adapt the chart values to your specific environment and requirements.
Completely disagree. When I take a solution from open source, I expect it to work in some default scenario. Bitnami's MongoDB chart offers a deployment approach that is proven to be wrong, and I explained why. That's all. It also completely breaks trust in Bitnami's solutions: if Bitnami ever provides enterprise or paid support, this won't create a proper relationship and trust with the client.
Hi @gecube
You're assuming that a multi-node cluster with RWO dynamic volume provisioning running on AWS is the default scenario, while many other users trying the chart may use a very different kind of cluster (e.g. a single-node local cluster using Minikube or kind). Please note there are countless ways to run and operate a k8s cluster, and it's hard to define which one is the default. Users are expected to read the chart documentation, understand the alternative parameters it offers, and adapt them to their specific requirements.
That said, there's a change we might need to introduce in the chart's default values to make it more consistent with the "standalone" architecture: switching the current defaults for podAffinityPreset & podAntiAffinityPreset, which would result in the change below:
```diff
 spec:
   automountServiceAccountToken: false
   serviceAccountName: mongodb
   affinity:
-    podAntiAffinity:
+    podAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
         - podAffinityTerm:
             labelSelector:
               matchLabels:
                 app.kubernetes.io/instance: mongodb
                 app.kubernetes.io/name: mongodb
                 app.kubernetes.io/component: mongodb
             topologyKey: kubernetes.io/hostname
           weight: 1
```
This would make the solution more consistent, since K8s would try to schedule new MongoDB pods on the same node during rolling updates, avoiding issues with RWO volumes.
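In terms of chart values, a hedged sketch (assuming these preset parameters follow the usual Bitnami convention, where `soft` renders a `preferredDuringSchedulingIgnoredDuringExecution` rule), the switch would amount to flipping the defaults:

```yaml
# Hypothetical new defaults: prefer co-scheduling on the same node
# instead of spreading pods across nodes.
podAffinityPreset: soft     # currently ""
podAntiAffinityPreset: ""   # currently "soft"
```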
@juan131 It is still a bad idea to run Mongo with multiple pods sharing the same volume. Why? Because running any stateful database (MongoDB, PostgreSQL) with transaction logs over a shared data directory leads, in the worst case, to data loss. In the optimistic scenario, the second pod will start, but the instance will see that somebody already holds a lock on the data files and will simply fail. I don't believe this lock is good protection either: if a pod stops unexpectedly, the lock file remains, which means Mongo has to remove it during its recovery process.
> while many other users trying the chart may use a very different kind of cluster (e.g. a single-node local cluster using Minikube or kind).
Totally agree: there are many different ways to run k8s, but they could all be classified and put into groups or buckets.
Hi @gecube
> It is still a bad idea to run Mongo with multiple pods sharing the same volume. Why? Because running any stateful database (MongoDB, PostgreSQL) with transaction logs over a shared data directory leads, in the worst case, to data loss.
The idea of the standalone architecture is to run a single MongoDB node; this architecture doesn't support horizontal scaling. To run N MongoDB instances, use the "replicaset" architecture instead, see:
- https://github.com/bitnami/charts/tree/main/bitnami/mongodb#architecture
@juan131 Thanks for your opinion and reply.
> This architecture doesn't support horizontal scaling.
If so, why not change the default update policy to Recreate? It would resolve the issue, and the change is trivial, a real low-hanging fruit. I could prepare a PR if you want :-) As I explained before, the MongoDB chart is unfortunately not ready for use as-is, which is a great pity, since Bitnami has a very recognisable name and is usually the first choice for PoCs.
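Concretely, a sketch of the proposed default (assuming `updateStrategy` is the chart value rendered into the Deployment's `spec.strategy`):

```yaml
# Proposed default for the standalone architecture: Recreate tears down
# the old pod (releasing its RWO PersistentVolume) before the new pod is
# created, so two pods can never claim the volume at the same time.
updateStrategy:
  type: Recreate
```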
Hi @gecube
I must admit I was a little bit stubborn on studying different alternatives but you convinced me. It makes sense to use Recreate with the standalone architecture so please go ahead and create the PR, I'll be glad to review it.
Small request: could we add a new validation at _helpers.tpl#L284-L300 to warn users who switch the architecture to "replicaset" that they must change the default update strategy type? Something like:
```yaml
{{/*
Validate values of MongoDB® - must provide a valid update strategy type
*/}}
{{- define "mongodb.validateValues.updateStrategy" -}}
{{- if eq .Values.updateStrategy.type "Recreate" }}
{{- if eq .Values.architecture "replicaset" -}}
mongodb: updateStrategy.type
    Only "RollingUpdate" and "OnDelete" update strategy types are supported
    using the "replicaset" architecture, since it is based on statefulsets.
{{- else if .Values.useStatefulSet -}}
mongodb: updateStrategy.type
    By specifying "useStatefulSet=true", only "RollingUpdate" and "OnDelete"
    update strategy types are supported.
{{- end -}}
{{- end -}}
{{- end -}}
```
@juan131 Hi! Thanks! I will do it and get back to you.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
I still haven't had a chance to work on the PR; I will try to reserve some time. I kindly ask you to remove the stale label.
Label removed. Don't worry @gecube, take all the time you need.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.