| IP | Role | Notes | mha4mysql-node | mha4mysql-manager |
|---|---|---|---|---|
| 192.168.98.11 | master | read-write | √ | |
| 192.168.98.10 | slave | read-only | √ | |
| 192.168.98.12 | slave | read-only | √ | |
| 192.168.98.13 | manager node | N/A | √ | √ |
After manually stopping mysqld on one slave, 192.168.98.10, try to start masterha_manager:
```
/usr/local/bin/masterha_manager --global_conf=/etc/masterha/conf/masterha_default.cnf --conf=/etc/masterha/conf/cls_all.cnf
```
Startup fails. At this point 12 still has no_master set, and the log contains the following:
```
Fri Feb 28 14:47:58 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 28 14:47:59 2020 - [info] GTID failover mode = 1
Fri Feb 28 14:47:59 2020 - [info] Dead Servers:
Fri Feb 28 14:47:59 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 14:47:59 2020 - [info] Alive Servers:
Fri Feb 28 14:47:59 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 14:47:59 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 14:47:59 2020 - [info] Alive Slaves:
Fri Feb 28 14:47:59 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 14:47:59 2020 - [info] GTID ON
Fri Feb 28 14:47:59 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 14:47:59 2020 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 28 14:47:59 2020 - [info] Current Alive Master: 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 14:47:59 2020 - [info] Checking slave configurations..
Fri Feb 28 14:47:59 2020 - [info] Checking replication filtering settings..
Fri Feb 28 14:47:59 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 28 14:47:59 2020 - [info] Replication filtering check ok.
Fri Feb 28 14:47:59 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln364] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf
Fri Feb 28 14:47:59 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_manager line 50.
Fri Feb 28 14:47:59 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 28 14:47:59 2020 - [info] Got exit code 1 (Not master dead).
```
The replication status should be checked first with masterha_check_repl:
```
#masterha_check_repl --conf=/etc/masterha/conf/cls_all.cnf --global_conf=/etc/masterha/conf/masterha_default.cnf
Fri Feb 28 15:27:24 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Feb 28 15:27:24 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 15:27:24 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 15:27:24 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 28 15:27:25 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:27:25 2020 - [info] Dead Servers:
Fri Feb 28 15:27:25 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:27:25 2020 - [info] Alive Servers:
Fri Feb 28 15:27:25 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:27:25 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:27:25 2020 - [info] Alive Slaves:
Fri Feb 28 15:27:25 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 15:27:25 2020 - [info] GTID ON
Fri Feb 28 15:27:25 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:27:25 2020 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 28 15:27:25 2020 - [info] Current Alive Master: 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:27:25 2020 - [info] Checking slave configurations..
Fri Feb 28 15:27:25 2020 - [info] Checking replication filtering settings..
Fri Feb 28 15:27:25 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 28 15:27:25 2020 - [info] Replication filtering check ok.
Fri Feb 28 15:27:25 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln364] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf
Fri Feb 28 15:27:25 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48.
Fri Feb 28 15:27:25 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 28 15:27:25 2020 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
```
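Because masterha_manager runs the same configuration checks and simply exits when they fail, one option is to chain the two steps and only start the manager after masterha_check_repl passes. This is a minimal sketch that assumes masterha_check_repl exits non-zero when it prints "MySQL Replication Health is NOT OK!"; the function name start_manager_if_healthy is hypothetical, and the paths are the ones used in this article.

```sh
#!/bin/sh
# Sketch: start masterha_manager only if masterha_check_repl succeeds.
# Paths follow this article; start_manager_if_healthy is a hypothetical helper.
GLOBAL_CONF=/etc/masterha/conf/masterha_default.cnf
APP_CONF=/etc/masterha/conf/cls_all.cnf
MANAGER_LOG=/masterha/cls_all/manager.log

start_manager_if_healthy() {
  # Assumption: a failed replication check yields a non-zero exit status.
  if ! masterha_check_repl --global_conf="$GLOBAL_CONF" --conf="$APP_CONF"; then
    echo "replication check failed; not starting masterha_manager" >&2
    return 1
  fi
  # Extra options such as --ignore_fail_on_start are passed through unchanged.
  masterha_manager --global_conf="$GLOBAL_CONF" --conf="$APP_CONF" "$@" \
    >>"$MANAGER_LOG" 2>&1 &
  echo "masterha_manager started in background (pid $!)"
}
```

For example, `start_manager_if_healthy --ignore_fail_on_start` reproduces the successful startup shown further down, but only when the replication check is clean.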
The masterha_manager documentation at https://github.com/yoshinorim/mha4mysql-manager/wiki/masterha_manager describes the option as follows:

--ignore_fail_on_start

By default, master monitoring (not failover) process stops if one or more slaves are down, regardless of “ignore_fail” parameter setting. By setting --ignore_fail_on_start, master monitoring does not stop if ignore_fail marked slaves are down.
In other words, if ignore_fail=1 is set for 10 in the configuration file, adding --ignore_fail_on_start lets masterha_manager start. If ignore_fail=1 is not set in the configuration file, masterha_manager cannot start even with --ignore_fail_on_start.
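The rule fits in a two-by-two truth table. The function below (dead_slave_startup, a hypothetical name) only encodes what the documentation quote and the tests in this article show for a slave that is already down when monitoring starts; it is not taken from MHA's source.

```sh
#!/bin/sh
# Hypothetical model of the documented rule for a slave that is down at startup.
#   $1: ignore_fail value of that slave in cls_all.cnf (0 or 1)
#   $2: 1 if masterha_manager is started with --ignore_fail_on_start, else 0
dead_slave_startup() {
  if [ "$1" = 1 ] && [ "$2" = 1 ]; then
    echo "monitoring starts"
  else
    echo "monitoring stops"
  fi
}

# Print all four combinations.
for f in 0 1; do
  for s in 0 1; do
    printf 'ignore_fail=%s, --ignore_fail_on_start given=%s: %s\n' \
      "$f" "$s" "$(dead_slave_startup "$f" "$s")"
  done
done
```

Only the combination of both settings lets monitoring start; this model ignores the separate requirement, shown below, that some live slave must still be eligible as a new master.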
With ignore_fail=1 added:
```
#cat /etc/masterha/conf/cls_all.cnf
[server default]
#workdir on the management server
manager_workdir=/masterha/cls_all/
manager_log=/masterha/cls_all/manager.log
#workdir on the node for mysql server
master_binlog_dir=/data/mysql_3306/data/
#script called to move the VIP on automatic failover
master_ip_failover_script=/etc/masterha/scripts/master_ip_failover_vip --vip=192.168.98.100
#script called to move the VIP on manual switchover
master_ip_online_change_script=/etc/masterha/scripts/master_ip_online_change_vip --vip=192.168.98.100
#check master availability
secondary_check_script=masterha_secondary_check -s 192.168.98.11 -s 192.168.98.12

[server1]
hostname=192.168.98.10
candidate_master=1
ignore_fail=1

[server2]
hostname=192.168.98.11
candidate_master=1

[server3]
hostname=192.168.98.12
# no_master=1
```
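Since the tests here keep toggling ignore_fail and no_master, it helps to see the effective per-server flags at a glance. A minimal awk sketch, assuming plain key=value lines and section names of the form [serverN]; commented-out keys such as `# no_master=1` are treated as unset. server_flags is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print hostname and failover-related flags for every
# [serverN] section of an MHA application config such as cls_all.cnf.
# Assumes plain "key=value" lines; commented-out keys count as unset (0).
server_flags() {
  awk -F= '
    function flush() {
      if (host != "")
        printf "%s candidate_master=%s no_master=%s ignore_fail=%s\n", host, cm, nm, ign
    }
    /^\[server[0-9]+\]/ { flush(); host = ""; cm = nm = ign = 0; in_server = 1; next }
    /^\[/               { flush(); host = ""; in_server = 0; next }
    !in_server || /^[ \t]*#/ { next }
    { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
    $1 == "hostname"         { host = $2 }
    $1 == "candidate_master" { cm = $2 }
    $1 == "no_master"        { nm = $2 }
    $1 == "ignore_fail"      { ign = $2 }
    END { flush() }
  ' "$1"
}
```

Usage: `server_flags /etc/masterha/conf/cls_all.cnf`. For the configuration above it should report ignore_fail=1 for 192.168.98.10 and no_master=0 for 192.168.98.12.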
Startup succeeds:
```
/usr/local/bin/masterha_manager --global_conf=/etc/masterha/conf/masterha_default.cnf --conf=/etc/masterha/conf/cls_all.cnf --ignore_fail_on_start
Fri Feb 28 15:59:37 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 28 15:59:38 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:59:38 2020 - [info] Dead Servers:
Fri Feb 28 15:59:38 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:59:38 2020 - [info] Alive Servers:
Fri Feb 28 15:59:38 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:59:38 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:59:38 2020 - [info] Alive Slaves:
Fri Feb 28 15:59:38 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 15:59:38 2020 - [info] GTID ON
Fri Feb 28 15:59:38 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:59:38 2020 - [info] Current Alive Master: 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:59:38 2020 - [info] Checking slave configurations..
Fri Feb 28 15:59:38 2020 - [info] Checking replication filtering settings..
Fri Feb 28 15:59:38 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 28 15:59:38 2020 - [info] Replication filtering check ok.
Fri Feb 28 15:59:38 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 28 15:59:38 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 28 15:59:39 2020 - [info] HealthCheck: SSH to 192.168.98.11 is reachable.
Fri Feb 28 15:59:39 2020 - [info] 192.168.98.11(192.168.98.11:3306) (current master)
 +--192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:59:39 2020 - [info] Checking master_ip_failover_script status:
Fri Feb 28 15:59:39 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=192.168.98.100 --command=status --ssh_user=root --orig_master_host=192.168.98.11 --orig_master_ip=192.168.98.11 --orig_master_port=3306
Fri Feb 28 15:59:39 2020 - [info] OK.
Fri Feb 28 15:59:39 2020 - [warning] shutdown_script is not defined.
Fri Feb 28 15:59:39 2020 - [info] Set master ping interval 3 seconds.
Fri Feb 28 15:59:39 2020 - [info] Set secondary check script: masterha_secondary_check -s 192.168.98.11 -s 192.168.98.12
Fri Feb 28 15:59:39 2020 - [info] Starting ping health check on 192.168.98.11(192.168.98.11:3306)..
Fri Feb 28 15:59:39 2020 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
```
Without ignore_fail=1 (commented out):
```
#cat /etc/masterha/conf/cls_all.cnf
...
[server1]
hostname=192.168.98.10
candidate_master=1
# ignore_fail=1

[server2]
hostname=192.168.98.11
candidate_master=1

[server3]
hostname=192.168.98.12
# no_master=1
```
Startup fails:
```
/usr/local/bin/masterha_manager --global_conf=/etc/masterha/conf/masterha_default.cnf --conf=/etc/masterha/conf/cls_all.cnf --ignore_fail_on_start
Fri Feb 28 15:58:57 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 28 15:58:58 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:58:58 2020 - [info] Dead Servers:
Fri Feb 28 15:58:58 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:58:58 2020 - [info] Alive Servers:
Fri Feb 28 15:58:58 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:58:58 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:58:58 2020 - [info] Alive Slaves:
Fri Feb 28 15:58:58 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 15:58:58 2020 - [info] GTID ON
Fri Feb 28 15:58:58 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:58:58 2020 - [info] Current Alive Master: 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:58:58 2020 - [info] Checking slave configurations..
Fri Feb 28 15:58:58 2020 - [info] Checking replication filtering settings..
Fri Feb 28 15:58:58 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 28 15:58:58 2020 - [info] Replication filtering check ok.
Fri Feb 28 15:58:58 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 28 15:58:58 2020 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln492] Server 192.168.98.10(192.168.98.10:3306) is dead, but must be alive! Check server settings.
Fri Feb 28 15:58:58 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 402.
Fri Feb 28 15:58:58 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 28 15:58:58 2020 - [info] Got exit code 1 (Not master dead).
```
Also, even with ignore_fail=1 set, masterha_manager cannot start if the only remaining slave, 12, has no_master=1:
```
#cat /etc/masterha/conf/cls_all.cnf
...
[server1]
hostname=192.168.98.10
candidate_master=1
ignore_fail=1

[server2]
hostname=192.168.98.11
candidate_master=1

[server3]
hostname=192.168.98.12
no_master=1
```
The error is "None of slaves can be master":
```
/usr/local/bin/masterha_manager --global_conf=/etc/masterha/conf/masterha_default.cnf --conf=/etc/masterha/conf/cls_all.cnf --ignore_fail_on_start
Fri Feb 28 15:55:14 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 28 15:55:16 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:55:16 2020 - [info] Dead Servers:
Fri Feb 28 15:55:16 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:55:16 2020 - [info] Alive Servers:
Fri Feb 28 15:55:16 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:55:16 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:55:16 2020 - [info] Alive Slaves:
Fri Feb 28 15:55:16 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 15:55:16 2020 - [info] GTID ON
Fri Feb 28 15:55:16 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:55:16 2020 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 28 15:55:16 2020 - [info] Current Alive Master: 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:55:16 2020 - [info] Checking slave configurations..
Fri Feb 28 15:55:16 2020 - [info] Checking replication filtering settings..
Fri Feb 28 15:55:16 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 28 15:55:16 2020 - [info] Replication filtering check ok.
Fri Feb 28 15:55:16 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln364] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf
Fri Feb 28 15:55:16 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_manager line 50.
Fri Feb 28 15:55:16 2020 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 28 15:55:16 2020 - [info] Got exit code 1 (Not master dead).
```
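Taken together, the startup attempts above suggest two checks, applied in the order their errors appear: at least one live slave must be eligible as a new master (no no_master), and every dead slave must have ignore_fail=1 with --ignore_fail_on_start given. The awk model below reproduces those observed outcomes; it illustrates the behavior seen in these logs and is not MHA's implementation. check_startup and its input format are hypothetical.

```sh
#!/bin/sh
# Hypothetical model of the startup checks observed in the logs above;
# NOT MHA's implementation. One line per slave on stdin:
#   <host> <alive|dead> <no_master 0|1> <ignore_fail 0|1>
# $1 is 1 when masterha_manager is started with --ignore_fail_on_start.
check_startup() {
  awk -v ifos="$1" '
    # An alive slave without no_master can be promoted.
    $2 == "alive" && $3 != 1 { candidates++ }
    # A dead slave is tolerated only with ignore_fail=1 plus --ignore_fail_on_start.
    $2 == "dead" && !($4 == 1 && ifos == 1) { blocking = blocking " " $1 }
    END {
      # Checked first, matching the order of the errors in the logs.
      if (candidates == 0) { print "None of slaves can be master"; exit 1 }
      if (blocking != "") { print "dead, but must be alive:" blocking; exit 1 }
      print "OK"
    }'
}
```

For instance, `printf '%s\n' "192.168.98.10 dead 0 1" "192.168.98.12 alive 1 0" | check_startup 1` corresponds to the last attempt and reports "None of slaves can be master".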
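If a slave goes down while masterha_manager is running, masterha_manager does not seem to notice: the process does not exit, the log shows no error, and masterha_check_status still reports a normal state: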
```
#masterha_check_status --conf=/etc/masterha/conf/cls_all.cnf --global_conf=/etc/masterha/conf/masterha_default.cnf
cls_all (pid:88464) is running(0:PING_OK), master:192.168.98.11
```
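Since the manager gives no sign that a slave has died, slave liveness has to be watched separately, for example from cron. A sketch using `mysqladmin ping`; the host list is this article's, check_slaves is a hypothetical name, and credentials are assumed to come from an option file such as ~/.my.cnf. According to the MySQL manual, `mysqladmin ping` returns 0 whenever the server is running, even when access is denied, so this only checks that mysqld answers.

```sh
#!/bin/sh
# Sketch: report each slave as alive or DEAD; return status 1 if any is down.
# Host list from this article; credentials assumed to be in ~/.my.cnf.
SLAVES="192.168.98.10 192.168.98.12"
PORT=3306

check_slaves() {
  rc=0
  for host in $SLAVES; do
    # mysqladmin ping exits 0 whenever mysqld answers (even on "Access denied").
    if mysqladmin --connect-timeout=2 -h "$host" -P "$PORT" ping >/dev/null 2>&1; then
      echo "$host:$PORT alive"
    else
      echo "$host:$PORT DEAD"
      rc=1
    fi
  done
  return "$rc"
}
```

The SLAVES list has to be updated by hand after a failover, because the roles change.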
A manual switchover, however, fails:
```
#/usr/local/bin/masterha_master_switch --global_conf=/etc/masterha/conf/masterha_default.cnf --conf=/etc/masterha/conf/cls_all.cnf --master_state=alive --new_master_host=192.168.98.12 --new_master_port=3306 --orig_master_is_new_slave --interactive=0
Fri Feb 28 15:33:34 2020 - [info] MHA::MasterRotate version 0.58.
Fri Feb 28 15:33:34 2020 - [info] Starting online master switch..
Fri Feb 28 15:33:34 2020 - [info]
Fri Feb 28 15:33:34 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Feb 28 15:33:34 2020 - [info]
Fri Feb 28 15:33:34 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Feb 28 15:33:34 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 15:33:34 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 15:33:35 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:33:35 2020 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln94] Switching master should not be started if one or more servers is down.
Fri Feb 28 15:33:35 2020 - [info] Dead Servers:
Fri Feb 28 15:33:35 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:33:35 2020 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
```
The "Dead Servers:" section of the output lists the servers that are down.
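To pull that list out of saved output, for example the manager log at /masterha/cls_all/manager.log from the configuration above, a small awk filter is enough. This sketch assumes the `ip(ip:port)` entry format shown in these logs and prints each dead server once; dead_servers is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: print the host(ip:port) entries listed under "Dead Servers:" in MHA output
# (masterha_check_repl, masterha_manager or masterha_master_switch), each only once.
dead_servers() {
  awk '
    /Dead Servers:/  { listing = 1; next }
    /Alive Servers:/ { listing = 0 }
    listing && match($0, /[0-9.]+\([0-9.]+:[0-9]+\)/) {
      entry = substr($0, RSTART, RLENGTH)
      if (!seen[entry]++) print entry
    }'
}
```

Usage: `dead_servers < /masterha/cls_all/manager.log`.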
If master 11 fails before 10 has been repaired, and 12 has no_master set, automatic failover fails because no server is eligible to become the new master.
```
#cat /etc/masterha/conf/cls_all.cnf
...
[server1]
hostname=192.168.98.10
candidate_master=1
ignore_fail=1

[server2]
hostname=192.168.98.11
candidate_master=1

[server3]
hostname=192.168.98.12
no_master=1
```
Shut down 11:
```
Fri Feb 28 15:35:38 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=192.168.98.11;port=3306;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '192.168.98.11' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 15:35:38 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.98.11 -s 192.168.98.12 --user=root --master_host=192.168.98.11 --master_ip=192.168.98.11 --master_port=3306 --master_user=mha --master_password=mha --ping_type=CONNECT
Fri Feb 28 15:35:38 2020 - [info] Executing SSH check script: exit 0
Fri Feb 28 15:35:39 2020 - [info] HealthCheck: SSH to 192.168.98.11 is reachable.
Monitoring server 192.168.98.11 is reachable, Master is not reachable from 192.168.98.11. OK.
Monitoring server 192.168.98.12 is reachable, Master is not reachable from 192.168.98.12. OK.
Fri Feb 28 15:35:40 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Feb 28 15:35:41 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 15:35:41 2020 - [warning] Connection failed 2 time(s)..
Fri Feb 28 15:35:44 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 15:35:44 2020 - [warning] Connection failed 3 time(s)..
Fri Feb 28 15:35:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 15:35:47 2020 - [warning] Connection failed 4 time(s)..
Fri Feb 28 15:35:47 2020 - [warning] Master is not reachable from health checker!
Fri Feb 28 15:35:47 2020 - [warning] Master 192.168.98.11(192.168.98.11:3306) is not reachable!
Fri Feb 28 15:35:47 2020 - [warning] SSH is reachable.
Fri Feb 28 15:35:47 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_all.cnf again, and trying to connect to all servers to check server status..
Fri Feb 28 15:35:47 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Feb 28 15:35:47 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 15:35:47 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 15:35:48 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:35:48 2020 - [info] Dead Servers:
Fri Feb 28 15:35:48 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:35:48 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:35:48 2020 - [info] Alive Servers:
Fri Feb 28 15:35:48 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:35:48 2020 - [info] Alive Slaves:
Fri Feb 28 15:35:48 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 15:35:48 2020 - [info] GTID ON
Fri Feb 28 15:35:48 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:35:48 2020 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 28 15:35:48 2020 - [info] Checking slave configurations..
Fri Feb 28 15:35:48 2020 - [info] Checking replication filtering settings..
Fri Feb 28 15:35:48 2020 - [info] Replication filtering check ok.
Fri Feb 28 15:35:48 2020 - [info] Master is down!
Fri Feb 28 15:35:48 2020 - [info] Terminating monitoring script.
Fri Feb 28 15:35:48 2020 - [info] Got exit code 20 (Master dead).
Fri Feb 28 15:35:48 2020 - [info] MHA::MasterFailover version 0.58.
Fri Feb 28 15:35:48 2020 - [info] Starting master failover.
Fri Feb 28 15:35:48 2020 - [info]
Fri Feb 28 15:35:48 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Feb 28 15:35:48 2020 - [info]
Fri Feb 28 15:35:49 2020 - [info] GTID failover mode = 1
Fri Feb 28 15:35:49 2020 - [info] Dead Servers:
Fri Feb 28 15:35:49 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 15:35:49 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:35:49 2020 - [info] Checking master reachability via MySQL(double check)...
Fri Feb 28 15:35:49 2020 - [info] ok.
Fri Feb 28 15:35:49 2020 - [info] Alive Servers:
Fri Feb 28 15:35:49 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 15:35:49 2020 - [info] Alive Slaves:
Fri Feb 28 15:35:49 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 15:35:49 2020 - [info] GTID ON
Fri Feb 28 15:35:49 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 15:35:49 2020 - [info] Not candidate for the new Master (no_master is set)
Fri Feb 28 15:35:49 2020 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln492] Server 192.168.98.10(192.168.98.10:3306) is dead, but must be alive! Check server settings.
Fri Feb 28 15:35:49 2020 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/share/perl5/MHA/MasterFailover.pm line 269.
```
The key messages are:

```
Not candidate for the new Master (no_master is set)
Server 192.168.98.10(192.168.98.10:3306) is dead, but must be alive! Check server settings.
```
The VIP is still on the original master, 11:
```
root@localhost 14:40:38 [(none)]> \! ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:98:28:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.98.11/24 brd 192.168.98.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.98.100/24 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::cd5b:e71c:7a67:b391/64 scope link
       valid_lft forever preferred_lft forever
root@localhost 15:35:04 [(none)]> shutdown;
Query OK, 0 rows affected (0.00 sec)

root@localhost 15:35:37 [(none)]> 2020-02-28T07:35:50.083534Z mysqld_safe mysqld from pid file /data/mysql_3306/run/mysql.pid ended

root@localhost 15:36:40 [(none)]> \! ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:98:28:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.98.11/24 brd 192.168.98.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.98.100/24 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::cd5b:e71c:7a67:b391/64 scope link
       valid_lft forever preferred_lft forever
```
12 is still a slave and does not have the VIP:
```
root@localhost 15:35:32 [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 192.168.98.11
                  Master_User: repler
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 2496
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 1354
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2496
              Relay_Log_Space: 1561
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'repler@192.168.98.11:3306' - retry-time: 60 retries: 1
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 98113306
                  Master_UUID: 68703597-592c-11ea-88b3-000c2998280b
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 200228 15:35:45
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 68703597-592c-11ea-88b3-000c2998280b:1-4
            Executed_Gtid_Set: 3a60f8c7-592c-11ea-8cb1-000c2973aaf0:1-6,68703597-592c-11ea-88b3-000c2998280b:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

root@localhost 15:36:32 [(none)]> \! ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:96:c2:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.98.12/24 brd 192.168.98.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::ef03:3251:b4ed:204c/64 scope link
       valid_lft forever preferred_lft forever
root@localhost 15:36:37 [(none)]>
```
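The `ip a` check done by hand above can be scripted. A sketch assuming the VIP 192.168.98.100 and interface ens33 from this setup; holds_vip and report_vip are hypothetical names. holds_vip reads `ip addr` output on stdin, so it can be tried without the real interface, and it matches the address literally (grep -F) so the dots in the IP are not treated as wildcards.

```sh
#!/bin/sh
# Sketch: check whether a host holds the VIP (the manual "ip a" check above).
# VIP and interface name are taken from this article's setup.
VIP=192.168.98.100
IFACE=ens33

# Succeeds if the "ip addr" output on stdin has the VIP configured.
holds_vip() {
  grep -qF "inet $VIP/"
}

# Run on a database host: prints whether the local interface carries the VIP.
report_vip() {
  if ip -4 addr show dev "$IFACE" | holds_vip; then
    echo "VIP $VIP is on this host"
  else
    echo "VIP $VIP is not on this host"
  fi
}
```

Run `report_vip` on each of 11 and 12 after a failover attempt to see where the VIP ended up.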
If there is a candidate master, that is, if 12 does not have no_master=1 set, automatic failover works:
```
Fri Feb 28 16:16:27 2020 - [warning] Got error on MySQL connect ping: DBI connect(';host=192.168.98.11;port=3306;mysql_connect_timeout=1','mha',...) failed: Can't connect to MySQL server on '192.168.98.11' (111) at /usr/local/share/perl5/MHA/HealthCheck.pm line 98.
2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 16:16:27 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.98.11 -s 192.168.98.12 --user=root --master_host=192.168.98.11 --master_ip=192.168.98.11 --master_port=3306 --master_user=mha --master_password=mha --ping_type=CONNECT
Fri Feb 28 16:16:27 2020 - [info] Executing SSH check script: exit 0
Fri Feb 28 16:16:28 2020 - [info] HealthCheck: SSH to 192.168.98.11 is reachable.
Monitoring server 192.168.98.11 is reachable, Master is not reachable from 192.168.98.11. OK.
Monitoring server 192.168.98.12 is reachable, Master is not reachable from 192.168.98.12. OK.
Fri Feb 28 16:16:28 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Feb 28 16:16:30 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 16:16:30 2020 - [warning] Connection failed 2 time(s)..
Fri Feb 28 16:16:33 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 16:16:33 2020 - [warning] Connection failed 3 time(s)..
Fri Feb 28 16:16:36 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.98.11' (111))
Fri Feb 28 16:16:36 2020 - [warning] Connection failed 4 time(s)..
Fri Feb 28 16:16:36 2020 - [warning] Master is not reachable from health checker!
Fri Feb 28 16:16:36 2020 - [warning] Master 192.168.98.11(192.168.98.11:3306) is not reachable!
Fri Feb 28 16:16:36 2020 - [warning] SSH is reachable.
Fri Feb 28 16:16:36 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha/conf/masterha_default.cnf and /etc/masterha/conf/cls_all.cnf again, and trying to connect to all servers to check server status..
Fri Feb 28 16:16:36 2020 - [info] Reading default configuration from /etc/masterha/conf/masterha_default.cnf..
Fri Feb 28 16:16:36 2020 - [info] Reading application default configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 16:16:36 2020 - [info] Reading server configuration from /etc/masterha/conf/cls_all.cnf..
Fri Feb 28 16:16:37 2020 - [info] GTID failover mode = 1
Fri Feb 28 16:16:37 2020 - [info] Dead Servers:
Fri Feb 28 16:16:37 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 16:16:37 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 16:16:37 2020 - [info] Alive Servers:
Fri Feb 28 16:16:37 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 16:16:37 2020 - [info] Alive Slaves:
Fri Feb 28 16:16:37 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 16:16:37 2020 - [info] GTID ON
Fri Feb 28 16:16:37 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 16:16:37 2020 - [info] Checking slave configurations..
Fri Feb 28 16:16:37 2020 - [info] Checking replication filtering settings..
Fri Feb 28 16:16:37 2020 - [info] Replication filtering check ok.
Fri Feb 28 16:16:37 2020 - [info] Master is down!
Fri Feb 28 16:16:37 2020 - [info] Terminating monitoring script.
Fri Feb 28 16:16:37 2020 - [info] Got exit code 20 (Master dead).
Fri Feb 28 16:16:37 2020 - [info] MHA::MasterFailover version 0.58.
Fri Feb 28 16:16:37 2020 - [info] Starting master failover.
Fri Feb 28 16:16:37 2020 - [info]
Fri Feb 28 16:16:37 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Feb 28 16:16:37 2020 - [info]
Fri Feb 28 16:16:38 2020 - [info] GTID failover mode = 1
Fri Feb 28 16:16:38 2020 - [info] Dead Servers:
Fri Feb 28 16:16:38 2020 - [info] 192.168.98.10(192.168.98.10:3306)
Fri Feb 28 16:16:38 2020 - [info] 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 16:16:38 2020 - [info] Checking master reachability via MySQL(double check)...
Fri Feb 28 16:16:38 2020 - [info] ok.
Fri Feb 28 16:16:38 2020 - [info] Alive Servers:
Fri Feb 28 16:16:38 2020 - [info] 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 16:16:38 2020 - [info] Alive Slaves:
Fri Feb 28 16:16:38 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 16:16:38 2020 - [info] GTID ON
Fri Feb 28 16:16:38 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 16:16:38 2020 - [info] Starting GTID based failover.
Fri Feb 28 16:16:38 2020 - [info]
Fri Feb 28 16:16:38 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Feb 28 16:16:38 2020 - [info]
Fri Feb 28 16:16:38 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Feb 28 16:16:38 2020 - [info]
Fri Feb 28 16:16:38 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Feb 28 16:16:38 2020 - [info] Executing master IP deactivation script:
Fri Feb 28 16:16:38 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=192.168.98.100 --orig_master_host=192.168.98.11 --orig_master_ip=192.168.98.11 --orig_master_port=3306 --command=stopssh --ssh_user=root
Disabling the VIP on old master: 192.168.98.11
Fri Feb 28 16:16:39 2020 - [info] done.
Fri Feb 28 16:16:39 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Feb 28 16:16:39 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:234
Fri Feb 28 16:16:39 2020 - [info] Retrieved Gtid Set: 68703597-592c-11ea-88b3-000c2998280b:1-4
Fri Feb 28 16:16:39 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Feb 28 16:16:39 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 16:16:39 2020 - [info] GTID ON
Fri Feb 28 16:16:39 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 16:16:39 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:234
Fri Feb 28 16:16:39 2020 - [info] Retrieved Gtid Set: 68703597-592c-11ea-88b3-000c2998280b:1-4
Fri Feb 28 16:16:39 2020 - [info] Oldest slaves:
Fri Feb 28 16:16:39 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 16:16:39 2020 - [info] GTID ON
Fri Feb 28 16:16:39 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] Searching new master from slaves..
Fri Feb 28 16:16:39 2020 - [info] Candidate masters from the configuration file:
Fri Feb 28 16:16:39 2020 - [info] Non-candidate masters:
Fri Feb 28 16:16:39 2020 - [info] New master is 192.168.98.12(192.168.98.12:3306)
Fri Feb 28 16:16:39 2020 - [info] Starting master failover..
Fri Feb 28 16:16:39 2020 - [info]
From:
192.168.98.11(192.168.98.11:3306) (current master)
 +--192.168.98.12(192.168.98.12:3306)

To:
192.168.98.12(192.168.98.12:3306) (new master)
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] Waiting all logs to be applied..
Fri Feb 28 16:16:39 2020 - [info] done.
Fri Feb 28 16:16:39 2020 - [info] Getting new master's binlog name and position..
Fri Feb 28 16:16:39 2020 - [info] mysql-bin.000001:2496
Fri Feb 28 16:16:39 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.98.12', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Feb 28 16:16:39 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 2496, 3a60f8c7-592c-11ea-8cb1-000c2973aaf0:1-6,68703597-592c-11ea-88b3-000c2998280b:1-4
Fri Feb 28 16:16:39 2020 - [info] Executing master IP activate script:
Fri Feb 28 16:16:39 2020 - [info] /etc/masterha/scripts/master_ip_failover_vip --vip=192.168.98.100 --command=start --ssh_user=root --orig_master_host=192.168.98.11 --orig_master_ip=192.168.98.11 --orig_master_port=3306 --new_master_host=192.168.98.12 --new_master_ip=192.168.98.12 --new_master_port=3306 --new_master_user='mha' --new_master_password=xxx
Enabling the VIP - 192.168.98.100 on the new master - 192.168.98.12
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Feb 28 16:16:39 2020 - [info] OK.
Fri Feb 28 16:16:39 2020 - [info] ** Finished master recovery successfully.
Fri Feb 28 16:16:39 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] All new slave servers recovered successfully.
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] * Phase 5: New master cleanup phase..
Fri Feb 28 16:16:39 2020 - [info]
Fri Feb 28 16:16:39 2020 - [info] Resetting slave info on the new master..
Fri Feb 28 16:16:39 2020 - [info] 192.168.98.12: Resetting slave info succeeded.
Fri Feb 28 16:16:39 2020 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln2045] Master failover to 192.168.98.12(192.168.98.12:3306) done, but recovery on slave partially failed.
Fri Feb 28 16:16:39 2020 - [info]

----- Failover Report -----

cls_all: MySQL Master failover 192.168.98.11(192.168.98.11:3306) to 192.168.98.12(192.168.98.12:3306)

Master 192.168.98.11(192.168.98.11:3306) is down!

Check MHA Manager logs at localhost.localdomain:/masterha/cls_all/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.98.11(192.168.98.11:3306)
Selected 192.168.98.12(192.168.98.12:3306) as a new master.
192.168.98.12(192.168.98.12:3306): OK: Applying all logs succeeded.
192.168.98.12(192.168.98.12:3306): OK: Activated master IP address.
192.168.98.12(192.168.98.12:3306): Resetting slave info succeeded.
192.168.98.10(192.168.98.10:3306): ERROR: Could not be reachable so couldn't recover.
Master failover to 192.168.98.12(192.168.98.12:3306) done, but recovery on slave partially failed.
Fri Feb 28 16:16:39 2020 - [info] Sending mail..
sh: /etc/masterha/scripts/send_report: No such file or directory
Fri Feb 28 16:16:39 2020 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln2089] Failed to send mail with return code 127:0
```
The only problem is that 10 is unreachable, so recovery on the slaves partially failed:

```
192.168.98.10(192.168.98.10:3306): ERROR: Could not be reachable so couldn't recover.
Master failover to 192.168.98.12(192.168.98.12:3306) done, but recovery on slave partially failed.
```
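Once 10 is repaired, it still has to be attached to the new master by hand. A sketch that runs the statement MHA printed in Phase 3.3 ("All other slaves should start replication from here") on the repaired host, preceded by STOP SLAVE because 10 was originally replicating from 11. The log masks the real password as 'xxx', so REPL_PASSWORD is a hypothetical variable; repoint_slave is a hypothetical helper, and the mysql client is assumed to find admin credentials for 10 in ~/.my.cnf. Check SHOW SLAVE STATUS afterwards.

```sh
#!/bin/sh
# Sketch: re-attach a repaired slave to the new master with the statement MHA logged.
# REPL_PASSWORD is a hypothetical variable (the log masks the password as 'xxx');
# mysql is assumed to read admin credentials for the target host from ~/.my.cnf.
NEW_MASTER_HOST=192.168.98.12
NEW_MASTER_PORT=3306
REPL_USER=repler

repoint_slave() {
  slave_host=$1
  if [ -z "${REPL_PASSWORD:-}" ]; then
    echo "REPL_PASSWORD is not set" >&2
    return 1
  fi
  # STOP SLAVE first: the old replication settings still point at the dead master.
  mysql -h "$slave_host" -P 3306 -e "
    STOP SLAVE;
    CHANGE MASTER TO MASTER_HOST='$NEW_MASTER_HOST', MASTER_PORT=$NEW_MASTER_PORT,
      MASTER_AUTO_POSITION=1, MASTER_USER='$REPL_USER', MASTER_PASSWORD='$REPL_PASSWORD';
    START SLAVE;"
}

# Usage (after 192.168.98.10 is back up):
#   REPL_PASSWORD='...' repoint_slave 192.168.98.10
```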
Still, the failover succeeded: the VIP has moved to 12, 12 no longer has any slave status, and read_only is OFF:
```
root@localhost 16:16:16 [(none)]> \! ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:96:c2:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.98.12/24 brd 192.168.98.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.98.100/24 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::ef03:3251:b4ed:204c/64 scope link
       valid_lft forever preferred_lft forever
root@localhost 16:27:37 [(none)]> show slave status\G
Empty set (0.00 sec)

root@localhost 16:27:43 [(none)]> show global variables like '%read_only%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_read_only      | OFF   |
| read_only             | OFF   |
| super_read_only       | OFF   |
| transaction_read_only | OFF   |
| tx_read_only          | OFF   |
+-----------------------+-------+
5 rows in set (0.00 sec)
```
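The same post-failover checks (no slave status, read_only and super_read_only OFF) can be scripted against any host. A sketch with a hypothetical helper, is_writable_master; it assumes credentials in ~/.my.cnf and uses the mysql client's -N (no column names) and -B (tab-separated batch output) options.

```sh
#!/bin/sh
# Sketch: succeed only if the host has no replication configured and
# read_only / super_read_only are both OFF. Credentials assumed in ~/.my.cnf.
is_writable_master() {
  host=$1
  # -N: no column names, -B: tab-separated batch output.
  slave_status=$(mysql -h "$host" -N -B -e "SHOW SLAVE STATUS") || return 1
  ro=$(mysql -h "$host" -N -B -e "SELECT @@global.read_only, @@global.super_read_only") || return 1
  tab=$(printf '\t')
  [ -z "$slave_status" ] && [ "$ro" = "0${tab}0" ]
}

# Usage: is_writable_master 192.168.98.12 && echo "12 is a writable master"
```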
Reposted from: http://ckvub.baihongyu.com/