percona-server-mysql-operator

endless "failed to update status"

Open abh opened this issue 3 months ago • 7 comments

Report

The ps-controller logs variations of this error A LOT. It looks like the operator is not reloading the current version of the object before updating its status.

More about the problem

2025-09-26T00:43:05.954Z        ERROR   failed to update status {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "7db885ce-83b1-407d-b939-19162eece731", "error": "Operation cannot be fulfilled on perconaservermysqls.ps.percona.com \"ntpdb\": the object has been modified; please apply your changes to the latest version and try again"}
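
The error text above is the Kubernetes API server's optimistic-concurrency conflict: a write carried a stale resourceVersion. The following is a minimal, self-contained sketch of the usual remedy, which is to re-read the latest object before each write and retry only on conflicts. It is not the operator's code. `Store`, `Object`, and `UpdateStatusWithRetry` are hypothetical names invented for illustration, and the in-memory store only imitates the API server's version check. In a real controller the same pattern is commonly written with `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict imitates the API server's conflict response (hypothetical stand-in).
var ErrConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// Object is a hypothetical stand-in for a custom resource.
type Object struct {
	ResourceVersion int
	State           string
}

// Store is a toy in-memory "API server" that enforces optimistic concurrency.
type Store struct {
	mu  sync.Mutex
	obj Object
}

// Get returns a copy of the latest stored object.
func (s *Store) Get() Object {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.obj
}

// UpdateStatus rejects any write that is based on a stale ResourceVersion.
func (s *Store) UpdateStatus(o Object) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if o.ResourceVersion != s.obj.ResourceVersion {
		return ErrConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// UpdateStatusWithRetry re-reads the object before every attempt and
// retries only when the write fails with a conflict.
func UpdateStatusWithRetry(s *Store, attempts int, mutate func(*Object)) error {
	err := errors.New("no attempts made")
	for i := 0; i < attempts; i++ {
		latest := s.Get() // always start from the newest version
		mutate(&latest)
		if err = s.UpdateStatus(latest); !errors.Is(err, ErrConflict) {
			return err // success (nil) or a non-conflict error
		}
	}
	return err
}

func main() {
	s := &Store{obj: Object{ResourceVersion: 1, State: "initializing"}}

	stale := s.Get() // copy held by a slow reconcile loop

	// Another writer updates the object in the meantime.
	_ = UpdateStatusWithRetry(s, 3, func(o *Object) { o.State = "ready" })

	// Writing the stale copy now fails with the conflict error.
	stale.State = "error"
	fmt.Println("stale write:", s.UpdateStatus(stale))

	// Re-reading before the write succeeds.
	err := UpdateStatusWithRetry(s, 3, func(o *Object) { o.State = "ready" })
	fmt.Println("retried write:", err, "state:", s.Get().State)
}
```

A conflict of this kind is normally harmless when the controller requeues and retries. It becomes a problem when, as reported here, the status never gets written.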

Steps to reproduce

  • run a cluster?

Versions

  1. Kubernetes - v1.32.6
  2. Operator - v0.12.0
  3. Database - v8.0.43 (?)

Anything else?

No response

abh, Sep 26 '25 01:09

Here's the operator log. I had reset a cluster entirely because group replication had failed, and it looks like the operator never got to update the status with the cluster being initialized.

percona-operator-log.txt

This is the status from the CR -- last transition time here was ~9 hours ago (the cluster is ~17 hours old, so not sure what happened 9 hours ago).

Status:
  Conditions:
    Last Transition Time:  2025-09-25T15:58:03Z
    Message:
    Reason:                Initializing
    Status:                False
    Type:                  Initializing
    Last Transition Time:  2025-09-25T15:58:03Z
    Message:
    Reason:                Ready
    Status:                True
    Type:                  Ready
    Last Transition Time:  2025-09-25T16:08:18Z
    Message:               replication: reconcile group replication: reconcile bootstrap status: wait for cached cr to updated with condition: context deadline exceeded
    Reason:                ErrorReconcile
    Status:                True
    Type:                  Error
  Haproxy:
    Ready:  2
    Size:   2
    State:  ready
  Host:     ntpdb-haproxy.ntpdb
  Mysql:
    Ready:  3
    Size:   3
    State:  ready

abh, Sep 26 '25 01:09

Another quirk -- I can't find the mysql_innodb_cluster_xxxx users anywhere in the operator code, but I get a lot of messages like this in the logs:

2025-09-26T01:20:01.900862Z 36 [Note] [MY-010926] [Server] Access denied for user 'mysql_innodb_cluster_42132052'@'10.42.4.8' (using password: YES)

Again, this is a pretty "plain" cluster that was set up fresh with the 0.12.0 operator yesterday.

CR attached here:

ntpdb.yaml

abh, Sep 26 '25 01:09

@abh Thank you for testing. We will check it today.

hors, Sep 26 '25 07:09

Here's the operator log. I had reset a cluster entirely because group replication had failed, and it looks like the operator never got to update the status with the cluster being initialized.

How did you do it? How did you reset it? I need the STR (steps to reproduce).

hors, Sep 26 '25 07:09

I deleted the CRs, the PVCs, and the STS, shut down the operator, then rebuilt everything and restored each of the databases I needed from a mysqldump.

abh, Sep 27 '25 20:09

2025-09-26T01:20:01.900862Z 36 [Note] [MY-010926] [Server] Access denied for user 'mysql_innodb_cluster_42132052'@'10.42.4.8' (using password: YES)

Do you have user creation in your dump file? Please use these two options when you create a dump: mysqldump --skip-add-drop-user --skip-user-creation -u root -p your_database > new_dump.sql

or '--skip-system-users'.

hors, Sep 28 '25 08:09

@abh I am fairly sure that your status error will go away in v1.0.0 thanks to this commit: https://github.com/percona/percona-server-mysql-operator/commit/16ee670039835c54e12eedff81189ba373f7e3ef

egegunes, Oct 31 '25 15:10

@abh I’m happy to inform you that the MySQL Kubernetes Operator v1.0.0 was released today. This is the GA release, and you can start using it for production workloads. I also want to say a big thank you for using our operator and helping us improve it by providing feedback.

hors, Nov 17 '25 19:11