bifromq
After prolonged stress testing, CPU usage of threads wal-raft-executor-112680774442680320_0 and basekv-range-mutator stays high and never comes back down
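The cluster has 3 nodes (32 cores, 64 GB each), referred to here as 20, 54, and 124. The stress test runs 350,000 clients, each publishing a 40 KB message body every 10 s, with a pause of about 3 minutes every 10 to 12 hours. After roughly 2 days, the wal-raft-executor-112680774442680320_0 thread on node 20 and the wal-raft-executor-112680774434029568_0 thread on node 54 show high CPU usage that never drops. The basekv-range-mutator thread's CPU usage is also high and does not come down.
The cluster keeps working normally throughout: warn.log and error.log contain no errors, and the GC log looks normal. Searching the balancer logs for these threads does return entries.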
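For anyone trying to confirm the same symptom, here is a minimal sketch of listing per-thread CPU time with the standard `java.lang.management.ThreadMXBean` API. The class name `HotThreads` is hypothetical and is not part of BifroMQ. The code only sees threads of the JVM it runs in, so it would have to be loaded into the broker process or adapted to a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BifroMQ.
final class HotThreads {
    /**
     * Returns (thread name, accumulated CPU nanoseconds) pairs for live threads
     * whose name starts with one of the given prefixes, busiest first.
     */
    static List<Map.Entry<String, Long>> sample(String... prefixes) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return result; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has exited
            if (info == null || cpuNanos < 0) {
                continue;
            }
            String name = info.getThreadName();
            for (String prefix : prefixes) {
                if (name.startsWith(prefix)) {
                    result.add(Map.entry(name, cpuNanos));
                    break;
                }
            }
        }
        // Sort by CPU time, descending.
        result.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        // Thread name prefixes taken from this report.
        for (Map.Entry<String, Long> e : sample("wal-raft-executor", "basekv-range-mutator")) {
            System.out.printf("%-60s %d ms%n", e.getKey(), e.getValue() / 1_000_000);
        }
    }
}
```

The values are CPU time accumulated since each thread started, so a usage rate needs two samples taken some seconds apart and the difference between them.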
Node 20 CPU screenshot
Node 20, retain.store-fd6e1d50-7308-4146-84fd-5fa62de36212.log
2024-06-30 20:23:07.191 INFO [bg-task-executor-7] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2784, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true
2024-06-30 22:08:53.690 INFO [bg-task-executor-2] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2788, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}
2024-07-01 09:55:06.882 INFO [bg-task-executor-3] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2844, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}] result: true
2024-07-02 17:55:13.775 INFO [bg-task-executor] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2856, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true
Node 54 CPU screenshot
Node 54, inbox.store-0a40673e-7e57-47d6-8fa9-e69a2305152e.log
2024-07-03 19:27:35.164 INFO [bg-task-executor] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=0a40673e-7e57-47d6-8fa9-e69a2305152e, kvRangeId=112680774434029568_0, expectedVer=3640, voters=[62837868-8a27-4d5c-9bc3-1a155fc63a66, e8a84d42-8292-489e-a241-9ce716d14e07, 0a40673e-7e57-47d6-8fa9-e69a2305152e], learner=[]}
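To make the membership changes in these entries easier to compare, the following sketch extracts `kvRangeId`, `expectedVer`, and the voter list from a `ChangeConfigCommand` log line. The class `BalancerLogScan` and its regular expression are assumptions based only on the log format shown above; they are not BifroMQ code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BifroMQ.
final class BalancerLogScan {
    // Matches the ChangeConfigCommand fields as they are printed in the logs above.
    private static final Pattern CMD = Pattern.compile(
        "kvRangeId=([^,]+), expectedVer=(\\d+), voters=\\[([^\\]]*)\\]");

    /** Range id, expected version, and voter list taken from one log line. */
    static final class ConfigChange {
        final String kvRangeId;
        final long expectedVer;
        final List<String> voters;

        ConfigChange(String kvRangeId, long expectedVer, List<String> voters) {
            this.kvRangeId = kvRangeId;
            this.expectedVer = expectedVer;
            this.voters = voters;
        }
    }

    /** Returns the parsed command, or null if the line holds no ChangeConfigCommand. */
    static ConfigChange parse(String line) {
        Matcher m = CMD.matcher(line);
        if (!m.find()) {
            return null;
        }
        String raw = m.group(3).trim();
        List<String> voters = raw.isEmpty()
            ? new ArrayList<>()
            : Arrays.asList(raw.split(",\\s*"));
        return new ConfigChange(m.group(1), Long.parseLong(m.group(2)), voters);
    }

    public static void main(String[] args) {
        // Command portions of the first two node 20 entries above.
        String[] lines = {
            "ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2784, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}",
            "ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2788, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}"
        };
        for (String line : lines) {
            ConfigChange c = parse(line);
            if (c != null) {
                System.out.println(c.kvRangeId + " ver=" + c.expectedVer + " voters=" + c.voters.size());
            }
        }
    }
}
```

Applied to the four node 20 entries, this shows the voter count for range 112680774442680320_0 going 3, 2, 2, 3 at expectedVer 2784, 2788, 2844, and 2856.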
BifroMQ
- Version: [3.1.1]
- Deployment: [Cluster]
To Reproduce
Stress-test clients: 350,000 clients, each sending a QoS 0 message with a 40 KB body every 8.5 s, pausing for at least 3 minutes every 10 to 12 hours.
*** PUB Client ***:
- MQTT Connection:
- ClientIdentifier:
- etc...
- MQTT Pub:
- Topic:
- QoS: [0]
- Retain: [false]
Expected behavior
Logs
Configurations
OS(please complete the following information):
JVM:
- Version: [17]
Performance Related
- HOST:
- Cluster node count: [3]
- CPU: [32 cores]
- Memory: [64 GB]
- Network:
- Bandwidth: [5Gbps]
- Latency: []
- Load:
- PUB count: [350000]
- SUB count: [0]
- PUB QPS per connection: [0.12 msg/s]
- Payload size: [40KB]
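As a rough sanity check, the aggregate load implied by the figures above can be computed as below. This is a sketch under stated assumptions: 1 KB is taken as 1024 bytes, only payload bytes are counted (no MQTT or TCP overhead), and the class `LoadEstimate` is hypothetical.

```java
// Hypothetical helper for back-of-the-envelope load figures.
final class LoadEstimate {
    /** Aggregate publish rate in messages per second. */
    static double messagesPerSecond(long connections, double ratePerConnection) {
        return connections * ratePerConnection;
    }

    /** Aggregate payload throughput in gigabits per second (payload bytes only). */
    static double payloadGbps(double messagesPerSecond, long payloadBytes) {
        return messagesPerSecond * payloadBytes * 8 / 1e9;
    }

    public static void main(String[] args) {
        long connections = 350_000;       // PUB count from the list above
        double ratePerConnection = 0.12;  // PUB QPS per connection from the list above
        long payloadBytes = 40L * 1024;   // assumption: 40 KB with 1 KB = 1024 bytes

        double rate = messagesPerSecond(connections, ratePerConnection);
        System.out.printf("aggregate publish rate: %.0f msg/s%n", rate);
        System.out.printf("aggregate payload throughput: %.2f Gbit/s%n",
            payloadGbps(rate, payloadBytes));
    }
}
```

Under these assumptions the totals come to about 42,000 msg/s and roughly 13.8 Gbit/s of payload, spread across the 3 nodes.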
Additional context