[Feature] Support for `startupProbe` provided by the user

Open taxilian opened this issue 2 years ago • 11 comments

Describe the bug

I'm a new user, trying to spin up my first cluster using replication but not using Galera. I'll include my config at the end.

The pods all start up correctly and give this output:

[Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.0.3+maria~ubu2204 started.
[Note] [Entrypoint]: Switching to dedicated user 'mysql'
[Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.0.3+maria~ubu2204 started.
[Note] [Entrypoint]: Initializing database files

However, before it moves on, the livenessProbe check fails and the pod is restarted. I've run it a couple of times: sometimes it gets far enough for the pod to start up and reach a ready state, sometimes not, but so far it has never gotten far enough to create the user. When it restarts it doesn't recover; it just fails to work.

I'm fairly sure that this issue is partly because the storage is a little on the slow side, but since there doesn't seem to be a way to set a startupProbe that is longer than the livenessProbe I don't have any way to work around it.

Expected behaviour

If the operator either defined a startupProbe that could handle longer startup times or allowed the user to define one, initialization would presumably finish.

Steps to reproduce the bug

Additional context

My MariaDB resource:


apiVersion: mariadb.mmontes.io/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
  namespace: signalstuff
spec:
  rootPasswordSecretKeyRef:
    name: mariadb
    key: root-password

  database: mariadb
  username: mariadb
  passwordSecretKeyRef:
    name: mariadb
    key: password

  image: mariadb:11.0.3

  port: 3306

  replicas: 3

  replication:
    enabled: true
    primary:
      podIndex: 0
      automaticFailover: true
    replica:
      waitPoint: AfterSync
      gtid: CurrentPos
      replPasswordSecretKeyRef:
        name: mariadb
        key: password
      connectionTimeout: 10s
      connectionRetries: 10
      syncTimeout: 10s
    syncBinlog: true

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: "kubernetes.io/hostname"
  
  podDisruptionBudget:
    maxUnavailable: 1

  updateStrategy:
    type: RollingUpdate

  primaryService:
    type: LoadBalancer

  myCnf: |
    [mariadb]
    bind-address=*
    default_storage_engine=InnoDB
    binlog_format=row
    innodb_autoinc_lock_mode=2
    max_allowed_packet=256M

  volumeClaimTemplate:
    resources:
      requests:
        storage: 20Gi
    accessModes:
      - ReadWriteOnce
    storageClassName: rook-ssd-encrypted-fordbs

  resources:
    requests:
      cpu: 500m
      memory: 8Gi
    limits:
      cpu: 1000m
      memory: 12Gi

Environment details:

  • Kubernetes version: v1.27.6
  • mariadb-operator version: v0.0.22
  • Install method: helm
  • Install flavour: minor customization

taxilian avatar Nov 07 '23 23:11 taxilian

Hey there @taxilian !

> I'm fairly sure that this issue is partly because the storage is a little on the slow side, but since there doesn't seem to be a way to set a startupProbe that is longer than the livenessProbe I don't have any way to work around it.

That's right, there is no way to provide a custom startupProbe, but you can customize the livenessProbe and increase its initialDelaySeconds. See this example: https://github.com/mariadb-operator/mariadb-operator/blob/7bf25741e00ee33e2319a78ef3101152e00906a1/examples/manifests/mariadb_v1alpha1_mariadb.yaml#L71
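
For reference, a minimal sketch of that workaround applied to the resource from the original post. It assumes `livenessProbe` sits directly under `spec`, as in the linked example; the timing values are illustrative and not taken from the thread. Depending on the operator version, the probe handler (for example an `exec` command) may also need to be spelled out alongside the timing fields, so check the linked manifest first.

```yaml
# Sketch only: field placement is assumed from the linked example manifest,
# and every number below is a placeholder to tune against the actual storage.
apiVersion: mariadb.mmontes.io/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
  namespace: signalstuff
spec:
  # ...rest of the spec from the original post, unchanged...
  livenessProbe:
    initialDelaySeconds: 120  # wait two minutes before the first liveness check
    periodSeconds: 10         # then check every 10 seconds
    timeoutSeconds: 5         # each check may take up to 5 seconds
    failureThreshold: 3       # restart after 3 consecutive failures
```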

Let me know if this helps!

mmontes11 avatar Nov 25 '23 21:11 mmontes11

it's better than nothing, but a startupProbe would be much more ideal long term. The trouble with initialDelaySeconds is that I'd only want it to be high when first provisioning a node. If I set it high enough to work with slower storage, every later start would still have that initial delay before the node is considered ready to go, even though after the initial bootstrap it only needs a few seconds.
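
To make the difference concrete, here is a sketch of how a startupProbe behaves in plain Kubernetes. It is a generic Pod, not a MariaDB resource, since the operator does not expose this field at the time of this thread. The Pod name, the `tcpSocket` check, and all numbers are assumptions chosen for illustration.

```yaml
# Plain Kubernetes Pod, NOT a MariaDB resource. Names and values are
# hypothetical and only illustrate the probe semantics.
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo      # hypothetical name
spec:
  containers:
    - name: mariadb
      image: mariadb:11.0.3
      env:
        - name: MARIADB_ROOT_PASSWORD
          value: example        # demo only; use a Secret in practice
      ports:
        - containerPort: 3306
      startupProbe:
        tcpSocket:
          port: 3306
        periodSeconds: 10
        failureThreshold: 30    # up to 30 * 10s = 300s for a slow first boot
      livenessProbe:
        tcpSocket:
          port: 3306
        periodSeconds: 10
        failureThreshold: 3     # not evaluated until the startup probe succeeds
```

The startup budget here is `failureThreshold * periodSeconds`, and it ends as soon as the first check succeeds. A container that is up after a few seconds is handed over to the liveness probe right away, whereas a large initialDelaySeconds is always waited out in full.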

I actually decided to use local storage instead of putting it on my k8s cluster, but just in general this is something I could see myself hitting in the future. That aside -- thank you for your work on this, I've been trying to get a reliable highly-available mariadb on kubernetes literally for years and all of the others I tried caused me issues

taxilian avatar Nov 27 '23 19:11 taxilian

> it's better than nothing, but a startupProbe would be much more ideal long term.

Thanks! We don't currently support adding a custom startupProbe to the MariaDB object, but it is something we could support, and it shouldn't be complicated. PRs welcome!

> That aside -- thank you for your work on this, I've been trying to get a reliable highly-available mariadb on kubernetes literally for years and all of the others I tried caused me issues

Thanks a lot for your kind words, glad to hear that we are helping the MariaDB and Kubernetes community!

mmontes11 avatar Dec 03 '23 17:12 mmontes11

This issue is stale because it has been open 30 days with no activity.

github-actions[bot] avatar Jan 03 '24 02:01 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jan 08 '24 02:01 github-actions[bot]

This issue is stale because it has been open 30 days with no activity.

github-actions[bot] avatar Apr 13 '24 01:04 github-actions[bot]

This issue is stale because it has been open 30 days with no activity.

github-actions[bot] avatar May 16 '24 02:05 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar May 22 '24 02:05 github-actions[bot]

This issue is stale because it has been open 30 days with no activity.

github-actions[bot] avatar Jun 22 '24 02:06 github-actions[bot]

I could take a stab at implementing this. Could you give me an idea of where to start looking? I'm not familiar with the project layout and I'm an amateur at Go.

taxilian avatar Jun 22 '24 21:06 taxilian

This issue is stale because it has been open 60 days with no activity.

github-actions[bot] avatar Aug 22 '24 02:08 github-actions[bot]