braft server crashes with a segmentation fault under heavy load
1. Use the counter example.
2. On the run_server side, set sync ('false') to false and set the thread count to 128.
   DEFINE_string sync 'false' 'fsync each time'
   Server run parameters: -bthread_concurrency=128 -crash_on_fatal_log=true -raft_max_segment_size=8388608 -raft_sync=false -port=8100 -conf=127.0.0.1:8100:0,
3. Change the client thread count to 50 and run at full load with no sleep.
   Client run parameters: ./counter_client --add_percentage=100 --bthread_concurrency=100 --conf=127.0.0.1:8100:0, --crash_on_fatal_log=true --log_each_request=false --thread_num=50 --use_bthread=true --timeout_ms=1000
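Once running, QPS reaches about 120,000, but the server dies after a few seconds. (Under overload it should slow down or stop responding, not crash.)
Client output: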
I0705 12:00:32.577170 22770 /home/ys/braft-1.1.2/example/counter/client.cpp:178] Sending Request to Counter (127.0.0.1:8100:0,) at qps=119160 latency=417
I0705 12:00:33.577314 22770 /home/ys/braft-1.1.2/example/counter/client.cpp:178] Sending Request to Counter (127.0.0.1:8100:0,) at qps=116432 latency=427
I0705 12:00:34.577444 22770 /home/ys/braft-1.1.2/example/counter/client.cpp:178] Sending Request to Counter (127.0.0.1:8100:0,) at qps=119674 latency=415
I0705 12:00:35.577571 22770 /home/ys/braft-1.1.2/example/counter/client.cpp:178] Sending Request to Counter (127.0.0.1:8100:0,) at qps=118686 latency=419
I0705 12:00:36.577692 22770 /home/ys/braft-1.1.2/example/counter/client.cpp:178] Sending Request to Counter (127.0.0.1:8100:0,) at qps=120076 latency=414
I0705 12:00:37.577862 22770 /home/ys/braft-1.1.2/example/counter/client.cpp:178] Sending Request to Counter (127.0.0.1:8100:0,) at qps=62620 latency=417
W0705 12:00:38.080393 22871 /home/ys/braft-1.1.2/example/counter/client.cpp:86] Fail to send request to 127.0.0.1:8100:0 : [E1008]Reached timeout=1000ms @127.0.0.1:8100
W0705 12:00:38.080418 22826 /home/ys/braft-1.1.2/example/counter/client.cpp:86] Fail to send request to 127.0.0.1:8100:0 : [E1008]Reached timeout=1000ms @127.0.0.1:8100
W0705 12:00:38.080410 22839 /home/ys/braft-1.1.2/example/counter/client.cpp:86] Fail to send request to 127.0.0.1:8100:0 : [E1008]Reached timeout=1000ms @127.0.0.1:8100
W0705 12:00:38.080488 22807 /home/ys/braft-1.1.2/example/counter/client.cpp:86] Fail to send request to 127.0.0.1:8100:0 : [E1008]Reached timeout=1000ms @127.0.0.1:8100
W0705 12:00:38.080434 22784 /home/ys/braft-1.1.2/example/counter/client.cpp:86] Fail to send request to 127.0.0.1:8100:0 : [E1008]Reached timeout=1000ms @127.0.0.1:8100
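To pin down when the server stopped keeping up, the client log above can be scanned for the first sharp QPS drop. The following is a minimal sketch, not part of braft: the function names `parse_status` and `first_drop` and the 0.7 threshold are illustrative assumptions, and the regular expression only assumes the status-line layout shown in the log above.

```python
import re
import sys

# Matches the client's periodic status entries, e.g.
# "I0705 12:00:37.577862 22770 /path/client.cpp:178] Sending Request to
#  Counter (127.0.0.1:8100:0,) at qps=62620 latency=417".
# It scans the whole text, so it works whether or not each entry is on its own line.
STATUS_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.\d+ \d+ \S+\] "
    r"Sending Request to Counter \S+ at qps=(\d+) latency=(\d+)"
)


def parse_status(text):
    """Return a list of (time, qps, latency) tuples found in the log text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in STATUS_RE.finditer(text)
    ]


def first_drop(samples, ratio=0.7):
    """Return the first sample whose qps is below ratio * the previous qps.

    The ratio is an arbitrary illustrative threshold. Returns None if
    no such drop is found.
    """
    for prev, cur in zip(samples, samples[1:]):
        if cur[1] < ratio * prev[1]:
            return cur
    return None


if __name__ == "__main__":
    # Usage (hypothetical file name): python find_drop.py < client.log
    print(first_drop(parse_status(sys.stdin.read())))
```

On the log above, this points at the 12:00:37 sample (qps=62620), about half a second before the first E1008 timeout, which suggests the server died during that one-second interval.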
The first part of the output is normal. By the second part, the server, which was running under gdb, had already hit a segmentation fault.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe5110700 (LWP 21973)]
0x000000000072d8aa in brpc::policy::ProcessRpcRequest (msg_base=0x5295e80) at src/brpc/policy/baidu_rpc_protocol.cpp:485
485 src/brpc/policy/baidu_rpc_protocol.cpp: No such file or directory.
Missing separate debuginfos, use: debuginfo-install gflags-2.1.1-6.el7.x86_64 glibc-2.17-326.el7_9.x86_64 gperftools-libs-2.6.1-1.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 leveldb-1.12.0-11.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libselinux-2.5-15.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64 openssl-libs-1.0.2k-25.el7_9.x86_64 pcre-8.32-17.el7.x86_64 protobuf-2.5.0-8.el7.x86_64 snappy-1.1.0-3.el7.x86_64 zlib-1.2.7-20.el7_9.x86_64
Stack trace:
(gdb) bt
#0 0x000000000072d8aa in brpc::policy::ProcessRpcRequest (msg_base=0x5295e80)
at src/brpc/policy/baidu_rpc_protocol.cpp:485
#1 0x00000000006a181a in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x5295e80)
at src/brpc/input_messenger.cpp:147
#2 0x00000000006a25c3 in operator() (this=
The problem does not occur after switching to a different machine, so it was a problem with the original machine.