
pika-3.3.4 crash

Open kidd1985 opened this issue 5 years ago • 2 comments

Version: pika-3.3.4, running in sentinel mode with 4 shards; 3 of the 4 shards went down.

pika.INFO and pika.WARNING contain no log entries.

The only log output is from dmesg:

INFO: task rocksdb:bg0:253597 blocked for more than 120 seconds.
      Not tainted 2.6.32-573.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rocksdb:bg0   D 0000000000000008     0 253597      1 0x00000080
 ffff8810535f7dc8 0000000000000082 0000000000000000 0000000000000001
 000000000000000e ffff8810535f7de8 00188df4ea62ef0f ffffffff81127050
 ffff883f7b8a6dd8 000000029bbd7eec ffff882009a9b068 ffff8810535f7fd8
Call Trace:
 [] ? find_get_pages_tag+0x40/0x130
 [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [] ? autoremove_wake_function+0x0/0x40
 [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
 [] ext4_sync_file+0x121/0x1d0 [ext4]
 [] vfs_fsync_range+0xa1/0x100
 [] vfs_fsync+0x1d/0x20
 [] do_fsync+0x3e/0x60
 [] sys_fdatasync+0x13/0x20
 [] system_call_fastpath+0x16/0x1b
INFO: task rocksdb:bg0:253599 blocked for more than 120 seconds.
      Not tainted 2.6.32-573.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rocksdb:bg0   D 0000000000000008     0 253599      1 0x00000080
 ffff8810535ffdc8 0000000000000082 0000000000000000 0000000000000001
 000000000000000e ffff8810535ffde8 00188df4dfb9bb98 ffffffff81127050
 ffff88400eb72608 000000029bbd7e35 ffff8805a20f5068 ffff8810535fffd8
Call Trace:
 [] ? find_get_pages_tag+0x40/0x130
 [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [] ? autoremove_wake_function+0x0/0x40
 [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
 [] ext4_sync_file+0x121/0x1d0 [ext4]
 [] vfs_fsync_range+0xa1/0x100
 [] vfs_fsync+0x1d/0x20
 [] do_fsync+0x3e/0x60
 [] sys_fdatasync+0x13/0x20
 [] system_call_fastpath+0x16/0x1b
INFO: task redis-server:182327 blocked for more than 120 seconds.
      Not tainted 2.6.32-573.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
redis-server  D 0000000000000010     0 182327   4997 0x00000080
 ffff8807aeb07dc8 0000000000000086 0000000000000000 0000000000000001
 000000000000000e ffff8807aeb07de8 00188df491943bbb ffffffff81127050
 ffff882b51043608 000000029bbd78f5 ffff881fe6b99ad8 ffff8807aeb07fd8
Call Trace:
 [] ? find_get_pages_tag+0x40/0x130
 [] ? prepare_to_wait+0x4e/0x80
 [] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [] ? autoremove_wake_function+0x0/0x40
 [] jbd2_complete_transaction+0x68/0xb0 [jbd2]
 [] ext4_sync_file+0x121/0x1d0 [ext4]
 [] vfs_fsync_range+0xa1/0x100
 [] vfs_fsync+0x1d/0x20
 [] do_fsync+0x3e/0x60
 [] sys_fdatasync+0x13/0x20
 [] system_call_fastpath+0x16/0x1b
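For longer dmesg dumps it can help to list which tasks the kernel reported as hung. The following is only a minimal sketch: the function name `hung_tasks` and the regular expression are illustrative assumptions, written against the header line format seen in the output above, and the sample text is copied from that output.

```python
import re

# Header line printed by the kernel hung-task detector, as seen in the
# dmesg output above, e.g.
#   INFO: task rocksdb:bg0:253597 blocked for more than 120 seconds.
# The task name may itself contain ':' (rocksdb:bg0), so the PID is taken
# as the last ':<digits>' group before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def hung_tasks(dmesg_text):
    """Return a list of (task_name, pid, seconds) for each hung-task report."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]


# Header lines copied from the dmesg output in this issue.
SAMPLE = (
    "INFO: task rocksdb:bg0:253597 blocked for more than 120 seconds.\n"
    "INFO: task rocksdb:bg0:253599 blocked for more than 120 seconds.\n"
    "INFO: task redis-server:182327 blocked for more than 120 seconds.\n"
)

if __name__ == "__main__":
    for name, pid, seconds in hung_tasks(SAMPLE):
        print(f"{name} (pid {pid}) blocked for more than {seconds}s")
```

In this report the list contains two rocksdb background threads and one redis-server process, and all three call traces end in the same sys_fdatasync path.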


Other information: Zabbix monitoring shows CPU usage peaking at 40%. Disk usage is about 1 TB, with 50% of the disk free. compact-cron : 02-04/30


Machine configuration:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Stepping: 4
CPU MHz: 2094.604
BogoMIPS: 4187.98
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 11264K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31


pika configuration parameters:

thread-num : 6
thread-pool-size : 12
sync-thread-num : 10
write-buffer-size : 268435456
timeout : 60
userblacklist :
instance-mode : classic
databases : 1
default-slot-num : 1024
replication-num : 0
consensus-level : 0
dump-prefix :
dump-expire : 0
maxclients : 20000
target-file-size-base : 20971520
expire-logs-days : 7
expire-logs-nums : 100
root-connection-num : 2
slowlog-write-errorlog : no
slowlog-log-slower-than : 10000
slowlog-max-len : 128
# db sync speed(MB) max is set to 1024MB, min is set to 0, and if below 0 or above 1024, the value will be adjust to 1024
db-sync-speed : 50
slave-priority : 100
network-interface : eth2
compact-cron : 02-04/30
max-conn-rbuf-size : 268435456
write-binlog : yes
binlog-file-size : 104857600
max-cache-statistic-keys : 0
small-compaction-threshold : 5000
max-write-buffer-size : 10737418240
max-client-response-size : 1073741824
compression : snappy
max-background-flushes : 1
max-background-compactions : 2
max-cache-files : 5000
max-bytes-for-level-multiplier : 10
block-cache : 268435456


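One more question: how can I configure pika to produce more logs, and how do I enable debug-level logging?

This is my first issue. If any information is missing, feel free to @ me.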

Nov 04 '20 02:11 kidd1985

Is there any pika coredump information? Sentinel uses the pub-sub commands internally, and pub-sub has a known concurrency-safety issue (#965) that can cause a coredump.

Nov 04 '20 02:11 LIBA-S

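> Is there any pika coredump information? Sentinel uses the pub-sub commands internally, and pub-sub has a known concurrency-safety issue (#965) that can cause a coredump.

Hello, is the behavior described in https://github.com/Qihoo360/pika/pull/797#issuecomment-571872846 related to this concurrency-safety issue?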

Feb 23 '21 07:02 yz1509

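> Is there any pika coredump information? Sentinel uses the pub-sub commands internally, and pub-sub has a known concurrency-safety issue (#965) that can cause a coredump.

> Hello, is the behavior described in #797 (comment) related to this concurrency-safety issue?

No, it is unrelated. In this case rocksdb is stuck. A failing disk cannot be ruled out.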

Feb 21 '23 11:02 wanghenshui
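Since every blocked task in the dmesg output is waiting inside sys_fdatasync, one rough way to check whether the disk itself is slow is to time a few small synchronous writes on the filesystem that holds the pika data. The sketch below is an assumption-laden illustration, not a pika tool: the function name `fdatasync_latencies` and its parameters are made up for this example, and the directory to probe must be supplied by the operator. It falls back to `os.fsync` on platforms where `os.fdatasync` is unavailable.

```python
import os
import sys
import tempfile
import time

# os.fdatasync exists on Linux; fall back to os.fsync elsewhere.
_sync = getattr(os, "fdatasync", os.fsync)


def fdatasync_latencies(directory, rounds=5, block_size=4096):
    """Write a small block and time the sync call `rounds` times.

    Returns a list of latencies in seconds, one per round.
    """
    latencies = []
    payload = b"\0" * block_size
    # The temporary file lives inside `directory`, so each sync goes to
    # the filesystem backing that directory. The file is removed on exit.
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        fd = tmp.fileno()
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.monotonic()
            _sync(fd)
            latencies.append(time.monotonic() - start)
    return latencies


if __name__ == "__main__":
    # Pass the directory to probe as the first argument; defaults to the
    # current directory. Probing a production data disk adds I/O load.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for i, seconds in enumerate(fdatasync_latencies(target), start=1):
        print(f"round {i}: {seconds * 1000:.2f} ms")
```

Latencies that are consistently in the range of seconds rather than milliseconds would point toward the storage layer, though a healthy result from this probe does not rule out intermittent disk faults; SMART data and kernel I/O error messages are worth checking as well.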