resource-agents

Galera - in perpetual state of Slave

Status: Open. lejeczek opened this issue 3 years ago • 5 comments

Hi guys. I'm on CentOS Stream 8 with pcs-0.10.12-1.el8.x86_64, trying to set up a 'galera' resource. I have mariadb-server-galera-10.3.28-1.module_el8.3.0+757+d382997d.x86_64, which works perfectly both before and outside of Pacemaker (that is, it still works after the pcs resource has been used and then stopped), but the pcs 'galera' resource fails to bring up the Galera cluster. The 'galera' resource elects a Master, but the other nodes remain in the 'Slave' state forever, and MariaDB is never started on them. The cluster does not complain about anything specific; pcs reports that all is well.

Please feel free to ask for any more specific logs/info you might need. Thanks, L.

lejeczek, Dec 20 '21 11:12

Failing to start the slaves without logging any error in pcs status might mean that the slave nodes fail to see that a master was started, but it is hard to tell without any logs. Can you tell me the precise series of pcs commands you use to create and start the galera resource?

dciabrin, Jan 04 '22 08:01

Hi. This is the command I use:

$ pcs resource create mariadb ocf:heartbeat:galera \
    cluster_host_map="c8kubernode1:10.0.1.1;c8kubernode2:10.0.1.2;c8kubernode3:10.0.1.3" \
    wsrep_cluster_address="gcomm://10.0.1.1,10.0.1.2,10.0.1.3" \
    log=/var/log/mariadb/mariadb.log user=mysql group=mysql \
    check_user="pacemaker" check_passwd="pacemaker#SOMEPASS" \
    additional_parameters="--basedir=/usr" \
    op monitor OCF_CHECK_LEVEL="0" timeout="30s" interval="20s" \
    op monitor role="Master" OCF_CHECK_LEVEL="0" timeout="30s" interval="10s" \
    op monitor role="Slave" OCF_CHECK_LEVEL="0" timeout="30s" interval="30s" \
    promotable meta failure-timeout=30s
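Because `cluster_host_map` and `wsrep_cluster_address` are expected to describe the same set of nodes, it may help to generate both from one list so they cannot drift apart. The sketch below uses a hypothetical `build_galera_params` shell function (not part of pcs or the galera agent) that takes `hostname:ip` pairs and prints the two parameter strings in the form used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: build cluster_host_map and wsrep_cluster_address
# from a single list of "hostname:ip" arguments.
build_galera_params() {
    host_map=""
    addrs=""
    for pair in "$@"; do
        ip=${pair#*:}                              # everything after the first ':'
        host_map="${host_map:+$host_map;}$pair"    # ';'-separated host map
        addrs="${addrs:+$addrs,}$ip"               # ','-separated gcomm list
    done
    printf 'cluster_host_map="%s" wsrep_cluster_address="gcomm://%s"\n' \
        "$host_map" "$addrs"
}

build_galera_params c8kubernode1:10.0.1.1 c8kubernode2:10.0.1.2 c8kubernode3:10.0.1.3
# prints: cluster_host_map="c8kubernode1:10.0.1.1;c8kubernode2:10.0.1.2;c8kubernode3:10.0.1.3" wsrep_cluster_address="gcomm://10.0.1.1,10.0.1.2,10.0.1.3"
```

The printed string can be pasted into `pcs resource create` in place of the two hand-written parameters.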

One remaining problem appears to be SELinux. I have not yet troubleshot it with a custom policy module, so as a quick (and VERY bad, testing-only) workaround I disable SELinux.

So, with such a resource, I get:

$ pcs status --full
...

  • Clone Set: mariadb-clone [mariadb] (promotable):
    • mariadb (ocf::heartbeat:galera): Slave c8kubernode3
    • mariadb (ocf::heartbeat:galera): Master c8kubernode1
    • mariadb (ocf::heartbeat:galera): Slave c8kubernode2

This results in mysqld running on the Master, but on the Slave nodes there are no mysqld processes at all.
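As I understand the galera agent, an instance in the Slave state is only waiting: mysqld is started when the instance is promoted, which matches the missing mysqld processes on the Slave nodes. A minimal sketch for spotting this from `pcs status --full` output, assuming each instance line contains `(ocf::heartbeat:galera):` followed by the role, as in the output quoted above. Newer Pacemaker releases print Promoted/Unpromoted instead of Master/Slave, so both spellings are counted. The function name `galera_role_counts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs status --full` output on stdin and count
# galera instances per role.
galera_role_counts() {
    awk '
        /\(ocf::heartbeat:galera\):/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "Master" || $i == "Promoted") promoted++
                else if ($i == "Slave" || $i == "Unpromoted") unpromoted++
            }
        }
        END { printf "promoted=%d unpromoted=%d\n", promoted, unpromoted }
    '
}

# Usage on a cluster node:
#   pcs status --full | galera_role_counts
```

For the status output in this thread it reports `promoted=1 unpromoted=2`; on a healthy three-node Galera cluster every instance would be expected to end up promoted.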

Again, if I disable the resource, I can start the Galera cluster outside of pcs with systemd, with '/etc/my.cnf.d/galera.cnf' left exactly as it is.

Thanks, L.

lejeczek, Jan 06 '22 19:01

Here is one node's galera.cnf, just in case; pretty 'standard', I'd think:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=1
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="websites"
wsrep_cluster_address="gcomm://10.0.1.1,10.0.1.2,10.0.1.3"
wsrep_node_address=10.0.1.1
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd=
wsrep_sst_method=rsync
wsrep_sst_auth=root:

lejeczek, Jan 07 '22 18:01

OK, I just noticed that you are missing the promoted-max option in the resource creation command:

pcs resource create mariadb ocf:heartbeat:galera \
    cluster_host_map="c8kubernode1:10.0.1.1;c8kubernode2:10.0.1.2;c8kubernode3:10.0.1.3" \
    wsrep_cluster_address="gcomm://10.0.1.1,10.0.1.2,10.0.1.3" \
    log=/var/log/mariadb/mariadb.log user=mysql group=mysql \
    check_user="pacemaker" check_passwd="pacemaker#SOMEPASS" \
    additional_parameters="--basedir=/usr" \
    op monitor OCF_CHECK_LEVEL="0" timeout="30s" interval="20s" \
    op monitor role="Master" OCF_CHECK_LEVEL="0" timeout="30s" interval="10s" \
    op monitor role="Slave" OCF_CHECK_LEVEL="0" timeout="30s" interval="30s" \
    promotable promoted-max=3 meta failure-timeout=30s

I tested it locally, and all nodes got promoted as expected.
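The fix works because mysqld only runs on promoted galera instances, and a promotable clone allows a single promoted instance unless promoted-max says otherwise (Pacemaker's default is 1). A sketch that derives the value from the `cluster_host_map` string instead of hard-coding 3; `count_galera_nodes` is a hypothetical helper. The printed command assumes the existing clone is named `mariadb-clone`, as in the status output earlier in the thread, and that `pcs resource meta` is used to update the clone's meta attributes; it is printed rather than executed so it can be reviewed before touching a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: count the "host:ip" entries in a cluster_host_map
# value, i.e. the number of Galera nodes that should all be promoted.
count_galera_nodes() {
    printf '%s\n' "$1" | tr ';' '\n' | grep -c ':'
}

map="c8kubernode1:10.0.1.1;c8kubernode2:10.0.1.2;c8kubernode3:10.0.1.3"
n=$(count_galera_nodes "$map")

# Print (do not run) the command for an already-created promotable clone.
echo "pcs resource meta mariadb-clone promoted-max=$n"
# prints: pcs resource meta mariadb-clone promoted-max=3
```

Deriving the number from the same map keeps promoted-max in step if a node is later added to the Galera cluster.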

dciabrin, Jan 14 '22 13:01

Hi, and thanks. I'd urge you (the developers?) to include such examples in the man pages; there is nothing better than a good example. May I also ask about SELinux: what do you do about it? On CentOS I get a plethora of denials (silent ones too). Regards, L.
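For the SELinux question, the usual way to build a custom module is to collect the AVC denials and feed them to audit2allow. Silent denials are hidden by dontaudit rules, which `semodule -DB` disables until `semodule -B` rebuilds the policy with them again. A sketch of that workflow, to be run as root; the module name `galera_local` and the function names are placeholders, and the generated `.te` file should be reviewed before loading, since audit2allow simply allows whatever was denied. For testing, permissive mode (`setenforce 0`) is a gentler alternative to disabling SELinux, because denials are still logged.

```shell
#!/bin/sh
# Sketch of a custom SELinux module workflow (run as root).
# "galera_local" is a hypothetical module name.
MODULE=galera_local

# Rebuild the policy with dontaudit rules disabled so silent denials get logged.
show_silent_denials() {
    semodule -DB
}

# Generate $MODULE.te and $MODULE.pp in the current directory from recent denials.
build_module() {
    ausearch -m AVC,USER_AVC -ts recent | audit2allow -M "$MODULE"
}

# Load the module after reviewing $MODULE.te.
load_module() {
    semodule -i "$MODULE.pp"
}

# Rebuild the policy with dontaudit rules enabled again.
restore_dontaudit() {
    semodule -B
}
```

A typical session would be: `show_silent_denials`, reproduce the galera start with pcs, `build_module`, inspect the `.te` file, `load_module`, then `restore_dontaudit`.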

lejeczek, Jan 30 '22 09:01