
After a long-running load test, CPU usage of threads wal-raft-executor-112680774442680320_0 and basekv-range-mutator stays high and never drops

Open masterOcean opened this issue 7 months ago • 3 comments

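After a long-running load test, CPU usage of thread wal-raft-executor-112680774442680320_0 stays high and never comes back down.

The cluster has 3 nodes (32 cores, 64 GB each), referred to here as nodes 20, 54, and 124. The load test runs 350,000 clients, each publishing a 40 KB body every 10 s and pausing for about 3 minutes every 10-12 hours. After roughly 2 days, thread wal-raft-executor-112680774442680320_0 on node 20 and thread wal-raft-executor-112680774434029568_0 on node 54 both show high CPU usage that never drops. The basekv-range-mutator thread's CPU usage is also high and does not come down.

The cluster keeps working normally during this period: warn.log and error.log contain no errors, and the GC log looks normal. Searching the balancer log for the ID in the thread name turns up the entries below.

Node 20 CPU screenshot: image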
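For anyone trying to confirm the same symptom from inside a JVM, here is a minimal sketch that samples per-thread CPU time with the standard `java.lang.management.ThreadMXBean` API and ranks threads by CPU consumed over a short window. The class name `HotThreads` and the 1-second window are illustrative choices, not part of BifroMQ. The sketch only sees threads of the JVM it runs in, so it would have to be loaded into the broker process to report on `wal-raft-executor-*` or `basekv-range-mutator`; that step is assumed, not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of BifroMQ.
final class HotThreads {
    /** CPU nanoseconds consumed per thread name during the sampling window. */
    static Map<String, Long> sampleCpu(long windowMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            // Returns -1 if the thread is gone or CPU timing is disabled.
            before.put(id, mx.getThreadCpuTime(id));
        }
        Thread.sleep(windowMillis);
        Map<String, Long> delta = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            Long then = before.get(id);
            long now = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (then == null || then < 0 || now < 0 || info == null) {
                continue; // thread started or ended inside the window
            }
            // Threads sharing a name are summed together.
            delta.merge(info.getThreadName(), Math.max(0L, now - then), Long::sum);
        }
        return delta;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print the ten busiest thread names over a 1-second window.
        sampleCpu(1000).entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(10)
            .forEach(e -> System.out.printf("%-50s %8.1f ms%n", e.getKey(), e.getValue() / 1e6));
    }
}
```

A thread that burns close to the full window length in CPU time on every sample is effectively pinning one core, which is the pattern described above.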

Node 20, retain.store-fd6e1d50-7308-4146-84fd-5fa62de36212.log:

2024-06-30 20:23:07.191  INFO [bg-task-executor-7] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2784, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true
2024-06-30 22:08:53.690  INFO [bg-task-executor-2] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2788, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}
2024-07-01 09:55:06.882  INFO [bg-task-executor-3] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2844, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}] result: true
2024-07-02 17:55:13.775  INFO [bg-task-executor] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2856, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true
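To make the balancer activity in these lines easier to read, the sketch below extracts the timestamp, range ID, `expectedVer`, and voter count from each `ChangeConfigCommand` entry. The regular expression is inferred only from the log lines quoted in this issue, so it is an assumption about the format rather than a documented contract, and `BalancerLogScan` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BifroMQ.
final class BalancerLogScan {
    record ConfigChange(String timestamp, String kvRangeId, long expectedVer, int voterCount) {}

    // Format inferred from the log lines quoted in this issue (assumption).
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}).*?ChangeConfigCommand\\{.*?"
            + "kvRangeId=([^,]+), expectedVer=(\\d+), voters=\\[([^\\]]*)\\]");

    static List<ConfigChange> parse(List<String> lines) {
        List<ConfigChange> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not a ChangeConfigCommand entry
            }
            String voters = m.group(4).trim();
            int count = voters.isEmpty() ? 0 : voters.split(",\\s*").length;
            out.add(new ConfigChange(m.group(1), m.group(2), Long.parseLong(m.group(3)), count));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BalancerLogScan.java <path-to-store-log>
        for (ConfigChange c : parse(Files.readAllLines(Path.of(args[0])))) {
            System.out.printf("%s range=%s ver=%d voters=%d%n",
                c.timestamp(), c.kvRangeId(), c.expectedVer(), c.voterCount());
        }
    }
}
```

Read by eye, the four node 20 entries above carry 3, 2, 2, and 3 voters at `expectedVer` 2784, 2788, 2844, and 2856, so the replica set of range 112680774442680320_0 shrank and was later restored. Whether that is related to the CPU usage is not established here.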

Node 54 CPU screenshot: image

Node 54, inbox.store-0a40673e-7e57-47d6-8fa9-e69a2305152e.log:

2024-07-03 19:27:35.164  INFO [bg-task-executor] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=0a40673e-7e57-47d6-8fa9-e69a2305152e, kvRangeId=112680774434029568_0, expectedVer=3640, voters=[62837868-8a27-4d5c-9bc3-1a155fc63a66, e8a84d42-8292-489e-a241-9ce716d14e07, 0a40673e-7e57-47d6-8fa9-e69a2305152e], learner=[]}

BifroMQ

  • Version: [3.1.1]
  • Deployment: [Cluster]

To Reproduce

Load-test clients: 350,000 clients, each sending a QoS 0 message with a 40 KB body every 8.5 s, pausing for at least 3 minutes every 10-12 hours.

*** PUB Client ***:

  • MQTT Connection:
    • ClientIdentifier:
    • etc...
  • MQTT Pub:
    • Topic:
    • QoS: [0]
    • Retain: [false]

Expected behavior

Logs

Configurations

OS(please complete the following information):

JVM:

  • Version: [ 17]

Performance Related

  • HOST:
    • Cluster node count: [3]
    • CPU: [32]
    • Memory: [64]
  • Network:
    • Bandwidth: [5Gbps]
    • Latency: []
  • Load:
    • PUB count: [350000]
    • SUB count: [0]
    • PUB QPS per connection: [0.12msg/s]
    • Payload size: [40KB]
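As a rough cross-check of the load figures above, the sketch below derives the aggregate publish rate and payload throughput. It assumes 40 KB means 40 × 1024 bytes and counts payload bytes only, ignoring MQTT and TCP overhead; `LoadEstimate` is a hypothetical name.

```java
// Hypothetical helper for back-of-the-envelope load figures.
final class LoadEstimate {
    /** Aggregate messages per second when every client publishes once per interval. */
    static double messagesPerSecond(long clients, double intervalSeconds) {
        return clients / intervalSeconds;
    }

    /** Payload-only throughput in gigabits per second (1 Gbit = 1e9 bits). */
    static double gigabitsPerSecond(double msgsPerSec, long payloadBytes) {
        return msgsPerSec * payloadBytes * 8 / 1e9;
    }

    public static void main(String[] args) {
        long clients = 350_000;
        double interval = 8.5;          // seconds between publishes per client
        long payload = 40L * 1024;      // assumption: 40 KB = 40 * 1024 bytes
        double rate = messagesPerSecond(clients, interval);
        System.out.printf("per-connection rate: %.4f msg/s%n", 1 / interval);
        System.out.printf("aggregate rate:      %.0f msg/s%n", rate);
        System.out.printf("payload throughput:  %.2f Gbit/s%n", gigabitsPerSecond(rate, payload));
    }
}
```

With an 8.5 s interval the per-connection rate is about 0.12 msg/s, matching the figure listed above, and the aggregate is roughly 41,000 msg/s, or about 13.5 Gbit/s of payload across the cluster under the stated assumptions.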

Additional context

masterOcean, Jul 08 '24 01:07