[DocDB] Prometheus Scrape Timeout Exceeds 15 Seconds with 18,000 Tablets
Jira Link: DB-13314
Description
We have observed that Prometheus metric scraping takes more than 10 seconds when there are 3,000 tables and 18,000 tablets, which causes the Prometheus target to frequently go down. A sample output includes:
"scrapePool": "yugabyte",
"scrapeUrl": "http://<address>:9000/prometheus-metrics?priority_regex=......",
"lastError": "Get \"http://<address>:9000/prometheus-metrics?priority_regex=......&show_help=false\": context deadline exceeded",
"lastScrapeDuration": 10.00115163,
"health": "down",
"scrapeInterval": "10s",
"scrapeTimeout": "10s"
The lastError shows a "context deadline exceeded" message, indicating a timeout when scraping metrics, and "health=down" means the Prometheus target is down.
Issue Type
kind/enhancement
Warning: Please confirm that this issue does not contain any sensitive information
- [X] I confirm this issue does not contain any sensitive information.
Tested on a TServer with 4,000 tables and 14,000 tablets.
Each scrape took approximately 15 seconds when using the default metric scraping URL parameters:
```
/prometheus-metrics?show_help=false&priority_regex=rocksdb_(number_db_(next|seek|prev)|db_iter_bytes_read|block_cache_(add|single_touch_add|multi_touch_add)|current_version_(sst_files_size|num_sst_files)|db_([^_]+_micros_[^_]+|mutex_wait_micros)|block_cache_(hit|miss)|bloom_filter_(checked|useful)|stall_micros|flush_write_bytes|compact_[^_]+_bytes|compaction_times_micros_[^_]+|numfiles_in_singlecompaction_[^_]+)|mem_tracker_(RegularDB_MemTable|IntentsDB_MemTable)|mem_tracker_server_PerTablet_(RegularDB_MemTable|IntentsDB_MemTable)|mem_tracker_server_Tablets_overhead_PerTablet_(RegularDB_MemTable|IntentsDB_MemTable)|async_replication_[^_]+_lag_micros|consumer_safe_time_[^_]+|transaction_conflicts|majority_sst_files_rejections|expired_transactions|log_(sync_latency_[^_]+|group_commit_latency_[^_]+|append_latency_[^_]+|bytes_logged|reader_bytes_read|cache_size|cache_num_ops)|follower_lag_ms|[^_]+_memory_pressure_rejections|log_wal_size|ql_read_latency_[^_]+|(all|write)_operations_inflight|ql_write_latency_[^_]+|write_lock_latency_[^_]+|is_raft_leader|ts_live_tablet_peers
```
During scraping, perf record was executed on the TServer and a flamegraph was generated (flamegraph link).
The hash map allocator stack from the flamegraph may be contributing to the slow scraping performance.
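To illustrate the suspected cost pattern, here is a standalone micro-benchmark sketch (not YugabyteDB code; the attribute keys other than the four erased ones and the entry count are assumptions, chosen to roughly match a scrape over ~18,000 tablets). It compares copying and pruning a small attribute map per entry against reading the map in place:

```cpp
// Standalone sketch, not YugabyteDB code. Build: g++ -O2 -std=c++17 bench.cc
#include <chrono>
#include <cstdio>
#include <string>
#include <unordered_map>

using AttributeMap = std::unordered_map<std::string, std::string>;

int main() {
  // Hypothetical attribute set; only the four erased keys come from the issue.
  const AttributeMap attr = {
      {"metric_id", "tablet-0000"},   {"table_id", "t1"},
      {"table_name", "foo"},          {"table_type", "PGSQL"},
      {"namespace_name", "yugabyte"}, {"exported_instance", "ts-1"}};
  // Rough order of magnitude: tens of metrics per tablet times ~18,000 tablets.
  constexpr int kEntries = 1'000'000;
  std::size_t sink = 0;

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < kEntries; ++i) {
    AttributeMap new_attr = attr;   // allocates buckets, nodes and strings
    new_attr.erase("table_id");     // ...only to discard four of them
    new_attr.erase("table_name");
    new_attr.erase("table_type");
    new_attr.erase("namespace_name");
    sink += new_attr.size();
  }
  auto copy_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - t0).count();

  t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < kEntries; ++i) {
    for (const auto& [key, value] : attr) {  // read in place, no allocation
      if (key == "table_id" || key == "table_name" ||
          key == "table_type" || key == "namespace_name") {
        continue;
      }
      sink += value.size();
    }
  }
  auto skip_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - t0).count();

  std::printf("copy+erase: %lld ms, skip in place: %lld ms (sink=%zu)\n",
              static_cast<long long>(copy_ms),
              static_cast<long long>(skip_ms), sink);
  return 0;
}
```

The copy-and-erase variant performs a hash-map allocation plus several string allocations for every entry, which is consistent with an allocator-dominated stack in the flamegraph.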
Blocking metric export at both the server and table levels (producing empty output) took approximately 3 seconds to complete with the following URL parameters:
```
/prometheus-metrics?show_help=false&version=v2&table_blocklist=ALL&server_blocklist=ALL
```
The PrometheusWriter::WriteSingleEntry map allocation stack from the above flamegraph originates from this code:
```cpp
// For every exported entry, the attribute map is copied in full and the
// per-table labels are then erased from the copy.
MetricEntity::AttributeMap new_attr = attr;
new_attr.erase("table_id");
new_attr.erase("table_name");
new_attr.erase("table_type");
new_attr.erase("namespace_name");
```
This GitHub issue will track the commit that addresses this specific performance issue. Further improvements to metric scraping performance are being tracked in https://github.com/yugabyte/yugabyte-db/issues/24565.
With this optimization, on a 4-core node with 4,000 tables and 18,000 tablets, the scrape time in normal mode was reduced from 18 seconds to 13 seconds.