How to troubleshoot Galera cluster joiner node service

In a Galera cluster, a joiner node sometimes fails to start. This article shows how to troubleshoot the joiner node service in those situations.
If you want to know how to set up a Galera cluster, please refer to the article “Database replication with mariadb on CentOS 7 linux”.

1- Time and Date

If the date and time differ between the joiner and donor nodes, we must correct them. The best practice is to synchronize all nodes with NTP servers.
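A quick way to see whether two nodes disagree on the time is to compare their epoch timestamps. The following is a minimal sketch: the helper name clock_skew and the host name donor-node are hypothetical, and the usage lines assume SSH access from the joiner to the donor.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# $1 = first timestamp, $2 = second timestamp
clock_skew() {
  diff_sec=$(( $1 - $2 ))
  # Strip a leading minus sign to get the absolute value.
  echo "${diff_sec#-}"
}

# Usage on the joiner (donor-node is a placeholder host name):
# local_ts=$(date +%s)
# remote_ts=$(ssh donor-node date +%s)
# echo "clock skew: $(clock_skew "$local_ts" "$remote_ts") s"
```

A skew of a second or two can come from the SSH round trip alone; a larger value suggests that NTP is not running on one of the nodes.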

2- Database size

When a joiner node tries to join the cluster, the state transfer may fail because of the amount of data to copy. Consider the following situation:
DB size: 3 GB
We start the first node by issuing:

galera_new_cluster

Then we start the second node by issuing:

systemctl start mariadb

Because the database is large, the transfer may take a while. To prevent the service start from failing before the transfer finishes, we have to increase the timeout of the mariadb service on the second node by editing its unit file:

vim /lib/systemd/system/mariadb.service

Then put the following line under the “[Service]” section:

TimeoutSec=600

Then reload the systemd daemon:

systemctl daemon-reload
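Unit files under /lib/systemd/system belong to the package and can be overwritten by an update, so the same setting can also be kept in a systemd drop-in file. The following is a sketch of that alternative; the helper name write_timeout_dropin and the file name timeout.conf are made up for illustration.

```shell
# Write a systemd drop-in that sets TimeoutSec for a unit.
# $1 = drop-in directory, $2 = timeout in seconds
write_timeout_dropin() {
  dropin_dir=$1
  timeout_sec=$2
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nTimeoutSec=%s\n' "$timeout_sec" > "$dropin_dir/timeout.conf"
}

# Usage on the second node, as root:
# write_timeout_dropin /etc/systemd/system/mariadb.service.d 600
# systemctl daemon-reload
# systemctl show mariadb -p TimeoutStartUSec   # confirm the new value
```

The drop-in is merged with the packaged unit file when systemd loads the unit, so the original mariadb.service stays untouched.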

Also consider updating the rsync package to the latest version:

yum update rsync
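How large the timeout needs to be depends on the data size and on how fast the transfer runs. As a rough sketch, the helper below (a hypothetical name, not part of MariaDB or Galera) divides the size by an assumed transfer rate and doubles the result as a safety margin. The 10 MB/s rate is only an illustrative assumption; measure your own network and disks.

```shell
# Estimate a service start timeout for a state transfer.
# $1 = data size in MB, $2 = assumed transfer rate in MB/s
# Prints ceil(size / rate) * 2, i.e. the transfer time with a 2x margin.
estimate_sst_timeout() {
  size_mb=$1
  rate_mbs=$2
  echo $(( (size_mb + rate_mbs - 1) / rate_mbs * 2 ))
}

# Usage: the 3 GB database from the example at an assumed 10 MB/s
# estimate_sst_timeout 3072 10
# The data size can be measured on the donor with: du -sm /var/lib/mysql
```

For 3072 MB at 10 MB/s this prints 616, which is in line with the TimeoutSec=600 used above.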

3- Donor node options

On the donor node, if sst-log-archive=1 is set in the /etc/my.cnf.d/server.cnf file, the donor tries to rename the SST log file every time a new node wants to join the cluster.
For example, if the log file is:

/var/lib/mysql/mariabackup.backup.log

it tries to rename it to something like:

/var/log/mysql/mariabackup.backup.log.2021.05.05-13.21.09.599455704

If there is an extra “/” in the destination path, such as:

/var/log/mysql//mariabackup.backup.log.2021.05.05-13.21.09.599455704

or in the source path, such as:

/var/lib/mysql//mariabackup.backup.log

MariaDB is unable to rename the file and shows an error like the following:

mv: cannot move ‘/var/lib/mysql//mariabackup.backup.log’ to ‘/var/log/mysql//mariabackup.backup.log.2021.05.05-13.21.09.599455704’: Permission denied

In that situation, the easiest fix is to set sst-log-archive=0, then shut down the whole cluster and start it again.
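Before restarting the cluster, it can help to confirm which value the donor actually has configured. The helper below is a sketch with a hypothetical name; it assumes the option is written as sst-log-archive=0 or sst-log-archive=1 on a line of its own in the option file.

```shell
# Print the value (0 or 1) of sst-log-archive found in an option file.
# Prints nothing if the option is not set in that file.
# $1 = path to the option file
sst_log_archive_value() {
  sed -n 's/^[[:space:]]*sst-log-archive[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1"
}

# Usage on the donor (pass the option file that holds the setting):
# sst_log_archive_value /path/to/option-file
```

An empty result means the option is not set in that file, so it may be defined in another file read by the server.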

4- Cluster state

When the donor node starts, its state in the database should be “Synced”, not “Initialized”.
We check the cluster state by issuing the following command:

mysql -u root -p

Then enter the password and run this statement:

SHOW STATUS LIKE 'wsrep_local_state_comment';
+---------------------------+-------------+
| Variable_name             | Value       |
+---------------------------+-------------+
| wsrep_local_state_comment | Initialized |
+---------------------------+-------------+

In this case we have to shut down the cluster and start it again by issuing:

galera_new_cluster

If we run the previous statement again, we get the desired output:

SHOW STATUS LIKE 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
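
The same check can be scripted, for example to confirm that the donor is ready before starting a joiner. The sketch below parses the tab-separated rows that the mysql client prints with the -N and -B options; the function name wsrep_is_synced is hypothetical.

```shell
# Read "name<TAB>value" rows on stdin (as printed by mysql -N -B) and
# succeed only if wsrep_local_state_comment has the value Synced.
wsrep_is_synced() {
  awk -F '\t' '
    $1 == "wsrep_local_state_comment" { state = $2 }
    END { if (state == "Synced") exit 0; exit 1 }
  '
}

# Usage on the donor node:
# mysql -u root -p -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" \
#   | wsrep_is_synced && echo "donor is Synced"
```

The function returns a non-zero exit status for any other state, including “Initialized”, so it can be used directly in an if statement or a retry loop.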