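pika crashes when rewriting the config without sufficient permissions
Problem description
When connecting to pika with redis-cli and manually running config rewrite (which is also invoked during a redis sentinel failover),
pika crashes, as shown below: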
127.0.0.1:9221> config rewrite
Could not connect to Redis at 127.0.0.1:9221: Connection refused
not connected>
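The logs contain the following entries. It looks like the crash is caused by insufficient permissions on the /opt/pika/conf
directory: the file cannot be written, and the process then crashes: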
[INFO] (src/base_conf.cc:263) ret IO error: /opt/pika/conf/pika.conf.tmp: Permission denied
kernel: CliProcessorPoo[31014]: segfault at 0 ip 00000000006e68b0 sp 00007f08ca005250 error 4 in pika_debug[400000+ab3000]
The corresponding strace output shows this clearly as well:
[pid 72116] 18:34:07.789705 open("/opt/pika/conf/pika.conf.tmp", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0644 <unfinished ...>
[pid 72115] 18:34:07.789758 futex(0x245a23c, FUTEX_WAIT_PRIVATE, 145, NULL <unfinished ...>
[pid 72113] 18:34:07.789785 futex(0x245a210, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 72116] 18:34:07.789819 <... open resumed>) = -1 EACCES (Permission denied) --> insufficient permissions
[pid 72113] 18:34:07.789861 <... futex resumed>) = 1
[pid 72112] 18:34:07.789884 <... futex resumed>) = 0
[pid 72116] 18:34:07.789934 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
The corresponding core dump shows that the crash occurred in
slash::BaseConf::WriteBack:
(gdb) bt
#0 0x00000000006e68b0 in slash::BaseConf::WriteBack (this=0x3316000) at src/base_conf.cc:270
#1 0x0000000000539889 in PikaConf::ConfigRewrite (this=0x3316000) at /home/arstercz/pika-v3.4.0/src/pika_conf.cc:587
#2 0x0000000000623a4e in ConfigCmd::ConfigRewrite (this=0x4725400, ret="") at /home/arstercz/pika-v3.4.0/src/pika_admin.cc:1892
#3 0x000000000061b7d8 in ConfigCmd::Do (this=0x4725400, partition=std::shared_ptr (empty) 0x0) at /home/arstercz/pika-v3.4.0/src/pika_admin.cc:1161
#4 0x000000000044d253 in Cmd::ProcessDoNotSpecifyPartitionCmd (this=0x4725400) at /home/arstercz/pika-v3.4.0/src/pika_command.cc:772
#5 0x000000000044a932 in Cmd::Execute (this=0x4725400) at /home/arstercz/pika-v3.4.0/src/pika_command.cc:542
Environment
Linux: CentOS 6 or 7, x86_64
Pika: 3.4.0
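How to reproduce
1. Start pika as a regular (non-root) user;
2. Make sure that user has no write permission on the `pika/conf` directory;
3. Run config rewrite through redis-cli.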
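To confirm the precondition in step 2 before reproducing, the directory permission can be checked from the same user account. The following is only a minimal sketch: `CanWriteDir` is a hypothetical helper, not part of pika or slash, and it relies on the POSIX `access()` call, which checks against the real (not effective) user ID.

```cpp
#include <string>
#include <unistd.h>  // POSIX access(), W_OK, X_OK

// Hypothetical helper (not pika/slash code): returns true if the calling
// user is allowed to create files inside `dir`.
bool CanWriteDir(const std::string& dir) {
  // W_OK: write permission on the directory.
  // X_OK: search permission, which is also required to create entries in it.
  return access(dir.c_str(), W_OK | X_OK) == 0;
}
```

With the setup described in this report, `CanWriteDir("/opt/pika/conf")` would be expected to return false for the user running pika.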
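How to handle it
Granting sufficient permissions on the directory avoids this error. In addition, could pika be changed to report an error when the write fails, instead of crashing the process?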
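As an illustration of the requested behavior, the sketch below writes a config through a temporary file and returns an error when that file cannot be opened, instead of continuing with an unusable file handle. It is not the actual slash::BaseConf::WriteBack code: the function name `WriteBackSafely`, its signature, and the error messages are all assumptions made for this example.

```cpp
#include <cstdio>   // std::rename, std::remove
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not the real slash API): writes `lines` to `path` via
// `path + ".tmp"` and reports failures through the return value and `err`.
bool WriteBackSafely(const std::string& path,
                     const std::vector<std::string>& lines,
                     std::string* err) {
  const std::string tmp_path = path + ".tmp";

  std::ofstream out(tmp_path, std::ios::out | std::ios::trunc);
  if (!out.is_open()) {
    // This is the case from the report: the directory is not writable.
    *err = "cannot open " + tmp_path + " for writing";
    return false;
  }

  for (const auto& line : lines) {
    out << line << '\n';
  }
  out.close();
  if (out.fail()) {
    *err = "failed to write " + tmp_path;
    std::remove(tmp_path.c_str());
    return false;
  }

  // Replace the old config only after the temporary file is complete.
  if (std::rename(tmp_path.c_str(), path.c_str()) != 0) {
    *err = "cannot rename " + tmp_path + " to " + path;
    std::remove(tmp_path.c_str());
    return false;
  }
  return true;
}
```

A caller such as the config rewrite command could then turn a false return value into an error reply to the client, leaving the server process running.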
Let's fix this once slash has been merged into this repository.