hedera-services

Health monitor is not efficient

Open OlegMazurov opened this issue 7 months ago • 4 comments

Problem

Below are metrics collected from a single node processing NftTransferLoadTest at full speed. Ideally, transaction handling, being the actual bottleneck, should be 100% busy all the time, and the unhandled task queue should fluctuate around a steady number. That is not the case in this example: the health monitor detects an unhealthy state only after a time lag, and its reaction is too harsh. It empties the incoming queue for a prolonged period, which allows the handling thread to go idle.

 time                   trans_per_sec unhealthyDuration TransactionHandler_unhandled_task_count TransactionHandler_busy_fraction
--------------------------------------------------------------------------------------------------------------------------------
2024-07-25 15:25:06 UTC 6441.49       0.200             3                                       0.743
2024-07-25 15:25:09 UTC 6309.36       0.000             1                                       0.720
2024-07-25 15:25:12 UTC 6711.45       1.100             63                                      0.951
2024-07-25 15:25:15 UTC 5758.57       0.200             38                                      1.000
2024-07-25 15:25:18 UTC 5132.63       3.200             47                                      1.000
2024-07-25 15:25:21 UTC 4167.24       6.200             37                                      1.000
2024-07-25 15:25:24 UTC 3537.85       2.300             31                                      1.000
2024-07-25 15:25:27 UTC 3083.83       0.000             19                                      0.928
2024-07-25 15:25:30 UTC 3916.83       0.000             1                                       0.916
2024-07-25 15:25:33 UTC 4591.21       0.000             14                                      0.895
2024-07-25 15:25:36 UTC 4770.38       0.000             5                                       0.884
2024-07-25 15:25:39 UTC 4898.66       0.000             11                                      0.912
2024-07-25 15:25:42 UTC 5509.99       0.500             36                                      1.000
2024-07-25 15:25:45 UTC 5189.37       0.000             2                                       0.917
2024-07-25 15:25:48 UTC 5616.71       0.000             9                                       0.886
2024-07-25 15:25:51 UTC 6059.27       0.000             0                                       0.998
2024-07-25 15:25:54 UTC 6000.89       0.000             0                                       0.624
2024-07-25 15:25:57 UTC 5880.47       0.000             0                                       0.445
2024-07-25 15:26:00 UTC 6030.48       0.000             8                                       0.843
2024-07-25 15:26:03 UTC 6029.67       0.000             1                                       0.956
2024-07-25 15:26:06 UTC 6273.28       0.000             2                                       0.953
2024-07-25 15:26:09 UTC 6356.39       0.000             1                                       0.852
2024-07-25 15:26:12 UTC 6229.92       0.300             43                                      0.888
2024-07-25 15:26:15 UTC 5957.35       0.000             18                                      1.000
2024-07-25 15:26:18 UTC 5992.18       0.000             1                                       0.917
2024-07-25 15:26:21 UTC 6115.42       0.000             7                                       0.732
2024-07-25 15:26:24 UTC 6087.66       0.000             1                                       0.828
2024-07-25 15:26:27 UTC 6297.59       0.000             1                                       0.861
2024-07-25 15:26:30 UTC 6276.00       0.000             1                                       0.578
2024-07-25 15:26:33 UTC 6208.90       0.000             1                                       0.679
2024-07-25 15:26:36 UTC 6344.00       0.000             17                                      0.755
2024-07-25 15:26:39 UTC 5551.75       2.400             40                                      1.000
2024-07-25 15:26:42 UTC 4988.60       1.900             50                                      1.000
2024-07-25 15:26:45 UTC 4575.24       0.000             30                                      1.000
2024-07-25 15:26:48 UTC 4900.30       0.000             28                                      1.000
2024-07-25 15:26:51 UTC 5115.38       0.000             19                                      1.000
2024-07-25 15:26:54 UTC 5181.09       0.900             60                                      1.000
2024-07-25 15:26:57 UTC 4645.69       0.000             8                                       1.000
2024-07-25 15:27:00 UTC 4971.97       0.000             2                                       0.606
2024-07-25 15:27:03 UTC 5178.77       0.000             1                                       0.562
2024-07-25 15:27:06 UTC 5336.48       1.000             62                                      0.853
2024-07-25 15:27:09 UTC 4618.90       0.000             20                                      1.000
2024-07-25 15:27:12 UTC 4871.70       0.000             17                                      0.915
2024-07-25 15:27:15 UTC 4871.62       0.000             20                                      1.000
2024-07-25 15:27:18 UTC 4990.09       0.000             1                                       0.761
2024-07-25 15:27:21 UTC 5079.62       0.000             1                                       0.574
2024-07-25 15:27:24 UTC 5145.22       0.000             0                                       0.657
2024-07-25 15:27:27 UTC 5203.69       0.000             1                                       0.549
2024-07-25 15:27:30 UTC 5249.63       0.000             0                                       0.528
2024-07-25 15:27:33 UTC 5160.38       0.000             1                                       0.454
2024-07-25 15:27:36 UTC 5267.93       0.000             0                                       0.894
2024-07-25 15:27:39 UTC 5360.08       0.000             0                                       0.618
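
One way to quantify the idle time in a dump like the one above is to count the samples whose busy fraction falls below some threshold. The following is a minimal sketch, assuming rows in exactly the whitespace-separated layout shown (date, time, zone, then four numeric columns). The class name `IdleSampleCounter` and the threshold parameter are hypothetical and not part of hedera-services.

```java
import java.util.List;

/** Hypothetical helper for analysing the metrics dump; not part of hedera-services. */
final class IdleSampleCounter {

    private IdleSampleCounter() {}

    /**
     * Counts data rows whose TransactionHandler_busy_fraction (last column)
     * is strictly below busyThreshold. Header and separator lines are skipped
     * because they do not split into exactly seven whitespace-separated fields.
     */
    static long countBelow(List<String> rows, double busyThreshold) {
        return rows.stream()
                .map(String::trim)
                .filter(row -> !row.isEmpty())
                .map(row -> row.split("\\s+"))
                .filter(fields -> fields.length == 7)
                .mapToDouble(fields -> Double.parseDouble(fields[6]))
                .filter(busy -> busy < busyThreshold)
                .count();
    }
}
```

Calling `IdleSampleCounter.countBelow(rows, 0.9)` with the table lines as `rows` gives the number of 3-second samples in which the handler was less than 90% busy. The 0.9 cut-off is an arbitrary choice for illustration.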

Solution

Tuning the health monitor may help to increase throughput in the short run. A longer-term solution requires a better mechanism for limiting the number of buffered unhandled tasks, both for transaction handling and for the entire pipeline.
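
As an illustration of what a gentler limiting mechanism could look like, here is a minimal sketch of a gate with two watermarks (hysteresis) driven directly by the unhandled task count. It is only one possible approach under stated assumptions, not the platform's actual design: the class `WatermarkGate`, its method names, and the watermark values are all hypothetical.

```java
/**
 * Hypothetical sketch of queue-depth backpressure with hysteresis.
 * Not part of hedera-services; names and thresholds are assumptions.
 */
final class WatermarkGate {

    private final int lowWatermark;
    private final int highWatermark;
    private boolean open = true;

    WatermarkGate(int lowWatermark, int highWatermark) {
        if (lowWatermark < 0 || lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("require 0 <= low < high");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    /**
     * Reports the current queue depth and returns whether intake should
     * accept new work. The gate closes once the depth reaches the high
     * watermark and reopens as soon as the depth has fallen to the low
     * watermark, so intake resumes before the queue is empty.
     */
    synchronized boolean update(int unhandledTaskCount) {
        if (open && unhandledTaskCount >= highWatermark) {
            open = false;
        } else if (!open && unhandledTaskCount <= lowWatermark) {
            open = true;
        }
        return open;
    }
}
```

With a low watermark above zero, for example `new WatermarkGate(10, 50)`, intake would resume while tasks are still queued, which is intended to keep the handling thread from going idle. Suitable values would have to be found by measurement.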

Alternatives

No response

OlegMazurov, Jul 25 '24 23:07