MySQL Cluster failover failure

Open liyongxian opened this issue 5 years ago • 2 comments

Kubernetes info:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Chart version: v0.3.0-rc.3

Problem description: The MySQL cluster has 3 nodes; mysql-cluster-0 is the master and the other two are slaves. When I delete the pod mysql-cluster-0 (the master), the master role fails over to mysql-cluster-2. Once mysql-cluster-0 is running again, it should rejoin as a slave, but its slave status shows an error. Details:

Node: mysql-cluster-0, MySQL command: show slave status\G

Slave_IO_State: 
                  Master_Host: mysql-cluster-db-mysql-2.mysql.mysql-operator
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql-cluster-db-mysql-0-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 102
                  Master_UUID: 20554341-9d3f-11e9-ae4b-ae3830e16ff6
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 190703 03:41:00
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 50d2e21a-9d44-11e9-a604-46a8049f1cfb:1-9
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 

Node: mysql-cluster-2, MySQL command: show slave status\G

Slave_IO_State: Connecting to master
                  Master_Host: //mysql-cluster-db-mysql-0.mysql.mysql-operator
                  Master_User: sys_replication
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysql-cluster-db-mysql-2-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2005
                Last_IO_Error: error connecting to master 'sys_replication@//mysql-cluster-db-mysql-0.mysql.mysql-operator:3306' - retry-time: 10  retries: 111
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: 
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 190703 03:58:40
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 20554341-9d3f-11e9-ae4b-ae3830e16ff6:1-1106,
5dacb80e-9d3d-11e9-a8c1-7aaa9a0ae85c:1-2901
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
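
The two Executed_Gtid_Set values above can be compared directly. The following is a minimal sketch, not part of the operator or of MySQL: the helper names (`parse_gtid_set`, `subtract`, `gtid_difference`) are made up for illustration, and the only inputs are the two sets pasted in this report.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (first, last) transaction numbers
GtidSet = Dict[str, List[Interval]]

# Executed_Gtid_Set values copied from the two outputs in this report.
NODE0 = "50d2e21a-9d44-11e9-a604-46a8049f1cfb:1-9"
NODE2 = ("20554341-9d3f-11e9-ae4b-ae3830e16ff6:1-1106,\n"
         "5dacb80e-9d3d-11e9-a8c1-7aaa9a0ae85c:1-2901")


def parse_gtid_set(text: str) -> GtidSet:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: GtidSet = {}
    for entry in text.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *ranges = entry.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for item in ranges:
            first, _, last = item.partition("-")
            intervals.append((int(first), int(last or first)))
    return result


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of the intervals in `a` not covered by `b`."""
    remaining = sorted(a)
    for b_first, b_last in sorted(b):
        kept: List[Interval] = []
        for first, last in remaining:
            if b_last < first or b_first > last:
                kept.append((first, last))  # no overlap, keep whole
                continue
            if first < b_first:
                kept.append((first, b_first - 1))  # part before the overlap
            if b_last < last:
                kept.append((b_last + 1, last))  # part after the overlap
        remaining = kept
    return remaining


def gtid_difference(a: GtidSet, b: GtidSet) -> GtidSet:
    """Transactions present in `a` but absent from `b`."""
    diff: GtidSet = {}
    for uuid, intervals in a.items():
        left = subtract(intervals, b.get(uuid, []))
        if left:
            diff[uuid] = left
    return diff


if __name__ == "__main__":
    node0, node2 = parse_gtid_set(NODE0), parse_gtid_set(NODE2)
    print("on node-2 but not node-0:", gtid_difference(node2, node0))
    print("on node-0 but not node-2:", gtid_difference(node0, node2))
```

With these inputs, node-0 has none of node-2's transactions, and it has nine transactions of its own that node-2 has never seen. That seems consistent with error 1236: node-2 would have to send its full history, part of which it reports as already purged.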

Thanks a lot.

liyongxian, Jul 03 '19 04:07

Hi @liyongxian, node-2 is fine: it is in detached mode, which is set by orchestrator, the tool we use for fast failovers.

node-0 should indeed connect successfully to node-2. Can you give me a little more context? Did you set any custom MySQL config?

Also, the resource description and the controller logs would be very useful for debugging this.
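
When collecting that context, the fields that matter in `show slave status\G` can be pulled out with a short script. This is only a sketch: the function names are hypothetical, the sample is an excerpt of the node-0 output from this issue, and the check is limited to whether the two replication threads report `Yes`.

```python
import re
from typing import Dict, List

# Excerpt of the node-0 output posted in this issue.
SAMPLE = """\
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log
               Last_SQL_Errno: 0
            Executed_Gtid_Set: 50d2e21a-9d44-11e9-a604-46a8049f1cfb:1-9
                Auto_Position: 1
"""

KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_vertical_status(text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines of \\G output into a dict of strings."""
    fields: Dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and KEY.fullmatch(key):
            fields[key] = value.strip()
            last_key = key
        elif last_key and line.strip():
            # Wrapped value (e.g. a multi-line Executed_Gtid_Set):
            # append it to the previous field.
            fields[last_key] += line.strip()
    return fields


def replication_problems(fields: Dict[str, str]) -> List[str]:
    """List the replication threads that are not reporting 'Yes'."""
    problems = []
    for thread, errno, error in (
        ("Slave_IO_Running", "Last_IO_Errno", "Last_IO_Error"),
        ("Slave_SQL_Running", "Last_SQL_Errno", "Last_SQL_Error"),
    ):
        state = fields.get(thread, "")
        if state != "Yes":
            problems.append(
                f"{thread}={state or 'unknown'} "
                f"(errno {fields.get(errno, '?')}): {fields.get(error, '')}"
            )
    return problems


if __name__ == "__main__":
    for problem in replication_problems(parse_vertical_status(SAMPLE)):
        print(problem)
```

A thread in the `Connecting` state is also reported as a problem here, so the node-2 output would be flagged even though detached mode is expected for it.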

Thank you!

AMecea, Jul 08 '19 13:07

@AMecea @liyongxian I have the same problem. Has it been resolved?

lizhongxuan, Jun 04 '21 11:06