
[DocDB] Prometheus Scrape Timeout Exceeds 15 Seconds with 18,000 Tablets

Open · yusong-yan opened this issue 1 year ago • 2 comments

Jira Link: DB-13314

Description

We have observed Prometheus metric scraping taking more than 10 seconds when there are 3,000 tables and 18,000 tablets, which causes the Prometheus target to frequently go down. A sample of the Prometheus target status output:

    "scrapePool": "yugabyte",
    "scrapeUrl": "http://<address>:9000/prometheus-metrics?priority_regex=......",
    "lastError": "Get \"http://<address>:9000/prometheus-metrics?priority_regex=......&show_help=false\": context deadline exceeded",
    "lastScrapeDuration": 10.00115163,
    "health": "down",
    "scrapeInterval": "10s",
    "scrapeTimeout": "10s"

The lastError field shows a "context deadline exceeded" message, indicating a timeout while scraping metrics, and "health": "down" means the Prometheus target is considered down.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • [X] I confirm this issue does not contain any sensitive information.

yusong-yan commented Oct 11 '24 21:10

Tested on a tserver with 4,000 tables and 14,000 tablets.

Each scrape took approximately 15 seconds when using the default metric scraping URL parameters:

    /prometheus-metrics?show_help=false&priority_regex=rocksdb_(number_db_(next|seek|prev)|db_iter_bytes_read|block_cache_(add|single_touch_add|multi_touch_add)|current_version_(sst_files_size|num_sst_files)|db_([^_]+_micros_[^_]+|mutex_wait_micros)|block_cache_(hit|miss)|bloom_filter_(checked|useful)|stall_micros|flush_write_bytes|compact_[^_]+_bytes|compaction_times_micros_[^_]+|numfiles_in_singlecompaction_[^_]+)|mem_tracker_(RegularDB_MemTable|IntentsDB_MemTable)|mem_tracker_server_PerTablet_(RegularDB_MemTable|IntentsDB_MemTable)|mem_tracker_server_Tablets_overhead_PerTablet_(RegularDB_MemTable|IntentsDB_MemTable)|async_replication_[^_]+_lag_micros|consumer_safe_time_[^_]+|transaction_conflicts|majority_sst_files_rejections|expired_transactions|log_(sync_latency_[^_]+|group_commit_latency_[^_]+|append_latency_[^_]+|bytes_logged|reader_bytes_read|cache_size|cache_num_ops)|follower_lag_ms|[^_]+_memory_pressure_rejections|log_wal_size|ql_read_latency_[^_]+|(all|write)_operations_inflight|ql_write_latency_[^_]+|write_lock_latency_[^_]+|is_raft_leader|ts_live_tablet_peers

During scraping, perf record was executed on the TServer and a flamegraph was generated (flamegraph link; see also the screenshot below). The hash map allocator stack in the flamegraph may be contributing to the slow scraping performance.

[Screenshot: flamegraph, 2024-10-11 6:19:45 PM]
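
For a rough sense of how per-metric-entry map allocations can add up at this tablet count, here is an illustrative micro-benchmark. The attribute contents, the 200-metrics-per-tablet figure, and the use of std::unordered_map are assumptions for illustration only, not values measured on the cluster above; the copy-and-erase pattern mirrors the snippet quoted further down in this thread:

    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <unordered_map>

    int main() {
      // Assumed shape of a per-metric attribute map: a handful of small string attributes.
      std::unordered_map<std::string, std::string> attr = {
          {"metric_type", "tablet"},
          {"exported_instance", "node-1"},
          {"table_id", "000033e80000300080000000000042ab"},
          {"table_name", "orders"},
          {"table_type", "PGSQL_TABLE_TYPE"},
          {"namespace_name", "yugabyte"},
      };

      // Assumed scale: 18,000 tablets, each emitting ~200 metric entries,
      // with one map copy per entry.
      constexpr int kTablets = 18000;
      constexpr int kMetricsPerTablet = 200;

      auto start = std::chrono::steady_clock::now();
      std::size_t erased = 0;
      for (int i = 0; i < kTablets * kMetricsPerTablet; ++i) {
        auto copy = attr;  // one allocation-heavy copy per metric entry
        erased += copy.erase("table_id");
        erased += copy.erase("table_name");
        erased += copy.erase("table_type");
        erased += copy.erase("namespace_name");
      }
      auto end = std::chrono::steady_clock::now();

      auto elapsed_ms =
          std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
      std::printf("copies: %d, erased keys: %zu, elapsed: %lld ms\n",
                  kTablets * kMetricsPerTablet, erased, (long long)elapsed_ms);
      return 0;
    }

Even at sub-microsecond cost per copy, millions of short-lived maps per scrape translate into seconds of wall-clock time, which is consistent with the allocator stack dominating the flamegraph.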

yusong-yan commented Oct 11 '24 22:10

Blocking metric export at both the server and table levels (producing empty output) took approximately 3 seconds using the following URL parameters:

    /prometheus-metrics?show_help=false&version=v2&table_blocklist=ALL&server_blocklist=ALL

yusong-yan commented Oct 11 '24 22:10

The PrometheusWriter::WriteSingleEntry map allocation stack from the above flamegraph originates from this code:

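    // Copies the full attribute map for every metric entry, then erases the
    // table-level attributes; the per-entry copy is the allocation hotspot
    // seen in the flamegraph.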
    MetricEntity::AttributeMap new_attr = attr;
    new_attr.erase("table_id");
    new_attr.erase("table_name");
    new_attr.erase("table_type");
    new_attr.erase("namespace_name");

This GitHub issue will track the commit that addresses this specific performance issue. Further improvements to metric scraping performance are being tracked in https://github.com/yugabyte/yugabyte-db/issues/24565.

yusong-yan commented Oct 22 '24 17:10

With this optimization, on a 4-core node with 4,000 tables and 18,000 tablets, the scraping time in normal mode was reduced from 18 seconds to 13 seconds.

rthallamko3 commented Feb 19 '25 00:02