A lot of 'linstor_apicall_duration_seconds_bucket' entries in metrics

Roman2dot0 opened this issue 2 months ago.

Hello. What does the metric 'linstor_apicall_duration_seconds_bucket' mean, and how can I disable it?

A few days ago I got an error when scraping the linstor-controller /metrics endpoint because the output reached the scrape limit. It was over 150 MB; it was cleared by restarting linstor-controller and has been slowly growing again since.

Current metric counts:

```
curl --silent 10.227.1.228:3370/metrics | awk -F '{' '/linstor_/ && !/HELP lin/ && !/TYPE lin/ {print $1}' | sort | uniq -c
   9680 linstor_apicall_duration_seconds_bucket
    484 linstor_apicall_duration_seconds_count
    484 linstor_apicall_duration_seconds_created
    484 linstor_apicall_duration_seconds_sum
     40 linstor_error_reports_count
      1 linstor_error_reports_count 0.0
      1 linstor_info
     40 linstor_node_reconnect_attempt_count
     40 linstor_node_state
      1 linstor_resource_definition_count 24.0
     24 linstor_resource_definition_resource_count
     77 linstor_resource_state
    280 linstor_restapi_request_duration_seconds_bucket
     14 linstor_restapi_request_duration_seconds_count
     14 linstor_restapi_request_duration_seconds_created
     14 linstor_restapi_request_duration_seconds_sum
      1 linstor_scrape_duration_seconds 0.031
      1 linstor_scrape_requests_count 1222803.0
     43 linstor_storage_pool_capacity_free_bytes
     43 linstor_storage_pool_capacity_total_bytes
     43 linstor_storage_pool_error_count
     77 linstor_volume_allocated_size_bytes
     24 linstor_volume_definition_size_bytes
     77 linstor_volume_state
```
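For anyone who wants to reproduce this tally without awk, here is a minimal Python sketch of the same count: it reads Prometheus text-format output and counts sample lines per metric name, skipping '# HELP' / '# TYPE' comments. Unlike the awk one-liner, it keys unlabelled samples by name only, so 'linstor_scrape_duration_seconds 0.031' lands under 'linstor_scrape_duration_seconds'. The function name 'count_samples' is only illustrative. It also checks the arithmetic behind the numbers above: 9680 bucket lines over 484 '_count' lines is 20 bucket lines per histogram series.

```python
from collections import Counter


def count_samples(exposition: str) -> Counter:
    """Count sample lines per metric name in Prometheus text-format output.

    Roughly what the awk | sort | uniq -c pipeline does, except that
    unlabelled samples are keyed by name only (their value is dropped).
    """
    counts: Counter = Counter()
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labelled) or first space (unlabelled).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = """\
# TYPE linstor_apicall_duration_seconds histogram
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",le="0.001",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",le="+Inf",} 1.0
linstor_apicall_duration_seconds_count{apicall="ReqErrorReports",} 1.0
linstor_scrape_duration_seconds 0.031
"""
    for name, n in sorted(count_samples(sample).items()):
        print(n, name)

    # Numbers from the real scrape: bucket lines per histogram series.
    print(9680 // 484)  # 20, one line per 'le' bound
```

The 20 matches the number of 'le' bounds in the excerpt that follows, so the bucket count grows in steps of 20 with every new label combination.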

Part of 'linstor_apicall_duration_seconds_bucket':

```
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.001",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.0025",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.005",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.0075",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.01",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.025",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.05",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.075",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.1",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.25",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.5",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="0.75",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="1.0",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="2.5",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="5.0",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="7.5",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="10.0",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="25.0",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="50.0",} 1.0
linstor_apicall_duration_seconds_bucket{apicall="ReqErrorReports",peer="10.227.1.53:3366/388",le="+Inf",} 1.0
```
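The excerpt above is one complete histogram series: a single ('apicall', 'peer') label combination emits one '_bucket' line for each of the 20 'le' bounds, alongside its '_count', '_sum' and '_created' lines. The Python sketch below (helper names are hypothetical) groups bucket lines by every label except 'le', which shows that each new label combination, including a new 'peer' value, adds 20 more bucket lines. The label regex is a simplification that assumes the quoting seen in this output, including the trailing comma before '}'.

```python
import re
from collections import defaultdict

# 'le' bounds exactly as they appear in the excerpt (20 in total).
LE_BOUNDS = ["0.001", "0.0025", "0.005", "0.0075", "0.01", "0.025", "0.05",
             "0.075", "0.1", "0.25", "0.5", "0.75", "1.0", "2.5", "5.0",
             "7.5", "10.0", "25.0", "50.0", "+Inf"]

# name="value" pairs; tolerates escaped quotes inside values.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labelled_sample(line: str):
    """Split 'name{labels} value' into (name, labels dict, value string)."""
    name, _, rest = line.partition("{")
    label_text, _, value = rest.rpartition("}")
    return name, dict(LABEL_RE.findall(label_text)), value.strip()


def buckets_per_series(lines):
    """Map each series (name + all labels except 'le') to its bucket count."""
    series = defaultdict(set)
    for line in lines:
        name, labels, _ = parse_labelled_sample(line)
        if not name.endswith("_bucket"):
            continue
        le = labels.pop("le", None)
        series[(name, tuple(sorted(labels.items())))].add(le)
    return {key: len(bounds) for key, bounds in series.items()}


if __name__ == "__main__":
    # Same apicall from the same host, seen with two different peer strings.
    lines = [
        f'linstor_apicall_duration_seconds_bucket{{apicall="ReqErrorReports",'
        f'peer="10.227.1.53:3366/{conn}",le="{le}",}} 1.0'
        for conn in (388, 389)
        for le in LE_BOUNDS
    ]
    result = buckets_per_series(lines)
    print(len(result), sum(result.values()))  # 2 series, 40 bucket lines
```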

I had a similar problem before, but with 'linstor_restapi_request_duration_seconds_bucket', and solved it with:

```toml
[logging]
  rest_access_log_mode = "NO_LOG"
```

Linstor version 1.25.4 (1.32.1 also has this problem)

Oct 28 '25 06:10 Roman2dot0

This comes from a change in our Peer.toString() method, which now includes the connection count, but the statistics shouldn't use that for tracking. We'll fix that in the next release. If it is growing that much for you, you should also check your connection stability, unless you have a very dynamic environment with nodes being spawned and shut down all the time.

Oct 28 '25 07:10 rp-
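This explanation fits the 'peer' label in the excerpt: '10.227.1.53:3366/388' appears to be the address plus a connection counter. Assuming that reading is right, every reconnect yields a new 'peer' value and therefore a fresh set of 20 bucket series that never goes away until the controller restarts. The following is not LINSTOR's actual fix, only a minimal Python sketch of the idea of keying statistics by a stable peer; 'stable_peer' and 'bucket_lines' are hypothetical helpers, and the observed data in the usage part is made up.

```python
import re

# Assumption: the trailing "/<n>" in a peer string such as
# "10.227.1.53:3366/388" is the connection count added by Peer.toString().
CONN_SUFFIX_RE = re.compile(r"/\d+$")


def stable_peer(peer: str) -> str:
    """Hypothetical helper: drop the connection-count suffix, keep host:port."""
    return CONN_SUFFIX_RE.sub("", peer)


def bucket_lines(label_pairs, buckets=20):
    """Bucket lines for observed (apicall, peer) pairs: raw vs. normalized."""
    raw = {(call, peer) for call, peer in label_pairs}
    normalized = {(call, stable_peer(peer)) for call, peer in label_pairs}
    return len(raw) * buckets, len(normalized) * buckets


if __name__ == "__main__":
    # One apicall from one satellite across 10 reconnects (made-up sample).
    observed = [("ReqErrorReports", f"10.227.1.53:3366/{n}") for n in range(379, 389)]
    print(stable_peer("10.227.1.53:3366/388"))  # 10.227.1.53:3366
    print(bucket_lines(observed))  # (200, 20)
```

With a stable key, the number of series depends only on how many distinct apicalls and satellites exist, not on how often they reconnect.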