zhmg23

We are so different

How to recover from master_link_status:down in a Redis one-master, multiple-slave setup

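We have run Redis with one master and multiple slaves for a long time without any problems. A few days ago, though, we suddenly got an alert: one of the slaves had failed to sync from the master. Its INFO output showed master_link_status:down.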
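To check the replication link from a script instead of reading INFO by eye, a minimal sketch like the one below may help. It assumes the output of `redis-cli info replication` is piped in; `replica_link_status` and `replica_link_ok` are hypothetical helper names, not part of Redis.

```shell
#!/bin/sh
# Print the value of master_link_status ("up" or "down") from
# `INFO replication` output read on stdin. Redis ends INFO lines with
# CRLF, so the carriage returns are stripped before matching.
replica_link_status() {
    tr -d '\r' | awk -F: '$1 == "master_link_status" { print $2; exit }'
}

# Succeed only if the link is up. Fails on "down", and also when the field
# is missing, which is the case when the instance is a master.
replica_link_ok() {
    [ "$(replica_link_status)" = "up" ]
}
```

For the failing slave in the logs, `redis-cli -h 192.168.7.21 -p 6379 info replication | replica_link_status` should have printed `down`. Because `replica_link_ok` only sets an exit status, it can drive an alert or a retry loop directly.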

When something goes wrong, the first thing to check is the logs. Below are the slave's log and the master's log:

Slave log:

[11089] 22 Jun 10:26:44.657 * Master replied to PING, replication can continue...

[11089] 22 Jun 10:26:44.660 * Partial resynchronization not possible (no cached master)

[11089] 22 Jun 10:26:44.662 * Full resync from master: df72267e25d4f6e685e7c674381685aa93323feb:55527182232

[11089] 22 Jun 10:26:44.664 # MASTER aborted replication with an error: ERR Unable to perform background save


Master log:

[1515] 22 Jun 10:33:04.644 # Can't rewrite append only file in background: fork: Cannot allocate memory

[1515] 22 Jun 10:33:04.656 * Slave 192.168.7.21:6379 asks for synchronization

[1515] 22 Jun 10:33:04.656 * Full resync requested by slave 192.168.7.21:6379

[1515] 22 Jun 10:33:04.656 * Starting BGSAVE for SYNC with target: disk

[1515] 22 Jun 10:33:04.657 # Can't save in background: fork: Cannot allocate memory

[1515] 22 Jun 10:33:04.657 * Replication failed, can't BGSAVE

[1515] 22 Jun 10:33:04.657 * Replication failed, can't BGSAVE

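Judging from the logs, replication failed because the master could not run BGSAVE, and the master log shows why: fork() failed with "Cannot allocate memory" (the AOF rewrite in the first master log line failed the same way). To serve a full resync, the master forks a child process that writes the RDB snapshot. In the kernel's commit accounting the child is charged for all of the parent's writable memory, even though copy-on-write means little of it is actually copied, so with a large dataset the default setting can refuse the fork. After some searching, I found that the fix is to tune the vm.overcommit_memory kernel parameter, which takes one of 3 values: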

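0: Heuristic overcommit (the default). The kernel checks whether enough memory is available for the process; if so, the allocation succeeds, otherwise it fails and an error is returned to the process.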

1: Always overcommit. The kernel grants every allocation request without checking how much memory is currently available.

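2: Never overcommit. Total committed memory may not exceed swap plus a configurable share of physical RAM (vm.overcommit_ratio, 50% by default); requests beyond that limit fail.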
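Before changing the parameter, it is worth checking which mode the host is currently in. This is a small sketch assuming a Linux host, where the value is exposed at /proc/sys/vm/overcommit_memory; `overcommit_mode` is a hypothetical helper name, and its optional file argument exists only so it can be tried on a scratch file.

```shell
#!/bin/sh
# Print the active vm.overcommit_memory mode by name.
# The file argument defaults to the real procfs entry; passing another
# path is only meant for trying the function on a scratch file.
overcommit_mode() {
    _oc_file=${1:-/proc/sys/vm/overcommit_memory}
    case "$(cat "$_oc_file")" in
        0) echo "heuristic" ;;  # default: kernel refuses obvious overcommits
        1) echo "always" ;;     # never refuse; the value Redis recommends
        2) echo "never" ;;      # strict: commit limited to swap + overcommit_ratio% of RAM
        *) echo "unknown"; return 1 ;;
    esac
}
```

Called with no argument, `overcommit_mode` reads the live value; after any of the three methods below has been applied it should print `always`.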


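Set it to 1, which is also the value Redis itself recommends. There are three ways to change this kernel parameter, all of which require root: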

(1) Edit /etc/sysctl.conf, set vm.overcommit_memory=1, then run sysctl -p to load the file

(2)sysctl vm.overcommit_memory=1

(3)echo 1 > /proc/sys/vm/overcommit_memory
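Method (1) can be scripted so that running it twice does not leave duplicate entries in the file. The sketch below assumes a sysctl.conf-style file of `key = value` lines; `set_sysctl_conf` is a hypothetical helper, not a system command, and it deliberately leaves commented-out lines alone.

```shell
#!/bin/sh
# Set "key = value" in a sysctl.conf-style file without creating duplicates:
# rewrite an existing uncommented line for the key, or append a new one.
# POSIX sh has no `local`, so helper variables are underscore-prefixed.
set_sysctl_conf() {
    _key=$1 _val=$2 _conf=${3:-/etc/sysctl.conf}
    # Escape dots so "vm.overcommit_memory" only matches itself in the regex.
    _re=$(printf '%s' "$_key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${_re}[[:space:]]*=" "$_conf" 2>/dev/null; then
        # Rewrite via a temp file (portable; sed -i differs between GNU and BSD),
        # then copy back with cat so the file keeps its owner and permissions.
        _tmp=$(mktemp) || return 1
        sed -E "s|^[[:space:]]*${_re}[[:space:]]*=.*|${_key} = ${_val}|" "$_conf" > "$_tmp" &&
            cat "$_tmp" > "$_conf"
        _rc=$?
        rm -f "$_tmp"
        return "$_rc"
    fi
    # Make sure the file ends with a newline before appending.
    if [ -s "$_conf" ] && [ -n "$(tail -c 1 "$_conf")" ]; then
        echo >> "$_conf"
    fi
    printf '%s = %s\n' "$_key" "$_val" >> "$_conf"
}
```

As root, `set_sysctl_conf vm.overcommit_memory 1 && sysctl -p` then covers method (1) in one step.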

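Once the parameter is changed, switch back to a regular user and start Redis. Keep in mind that only method (1) survives a reboot; (2) and (3) change the running kernel only.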

Note: to fully fix this, the master Redis also had to be restarted.


While fixing this, I also noticed two more warnings in the Redis startup log:

1) # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

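Fix (as root): echo 2048 > /proc/sys/net/core/somaxconn. Redis requests a listen backlog of 511 (its tcp-backlog setting), but the kernel silently caps it at net.core.somaxconn. The echo takes effect immediately but is lost on reboot; to make it permanent, add net.core.somaxconn = 2048 to /etc/sysctl.conf.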
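The warning boils down to comparing two numbers: the backlog Redis asks for (511 in this log) and the kernel cap in net.core.somaxconn. A hypothetical `check_backlog` helper, sketched here assuming Linux procfs, makes that comparison explicit; the file argument is only there for testing.

```shell
#!/bin/sh
# Compare the backlog Redis requests (default 511) with the kernel cap.
# Returns 0 if somaxconn is high enough, 1 if it is too low, 2 on read error.
check_backlog() {
    _want=${1:-511}
    _file=${2:-/proc/sys/net/core/somaxconn}
    _have=$(cat "$_file") || return 2
    if [ "$_have" -lt "$_want" ]; then
        echo "somaxconn=$_have < tcp-backlog=$_want: raise net.core.somaxconn"
        return 1
    fi
    echo "somaxconn=$_have >= tcp-backlog=$_want: ok"
}
```

Before the fix, running it with no arguments would have reported somaxconn=128 against tcp-backlog=511, the same mismatch the startup warning describes.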


2) [3898] 23 Jun 11:45:10.197 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel.

This will create latency and memory usage issues with Redis. 

To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

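Explanation: Transparent Huge Pages (THP) is an abstraction layer that automates the management of huge pages. It still seems to cause a fair number of problems, and many database products require, or at least recommend, disabling it. For Redis, as the warning says, it causes latency and memory usage issues.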

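Fix (as root; per the warning, also add the same line to /etc/rc.local so it survives a reboot, and restart Redis afterwards):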

echo never > /sys/kernel/mm/transparent_hugepage/enabled
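The THP settings file lists every mode and puts the active one in brackets, for example `always madvise [never]`, so a script has to parse out the bracketed word rather than compare the whole line. The sketch below assumes the path given in the warning; `thp_mode` and `disable_thp` are hypothetical helper names, and the optional file argument is only for testing.

```shell
#!/bin/sh
# Print the active THP mode, i.e. the word inside brackets,
# e.g. "always madvise [never]" -> "never".
thp_mode() {
    _f=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$_f"
}

# Write "never" unless THP is already disabled (writing the real file needs root).
disable_thp() {
    _f=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    [ "$(thp_mode "$_f")" = "never" ] && return 0
    echo never > "$_f"
}
```

`disable_thp` is a no-op when THP is already off, so it is safe to call from /etc/rc.local on every boot; Redis still has to be restarted after the change.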
