
Feat: Get block by id directly on promtool analyze & get latest block if ID not provided

Open nidhey27 opened this issue 1 year ago • 0 comments

Explanation

The program currently loads all blocks when a user tries to analyze a block using a BlockID, which can be time-consuming as the block size increases. Additionally, if a BlockID is not provided, the program still retrieves all blocks before analyzing the latest one.

This change implements the Block(), LastBlockID(), and lastBlockDirName() functions, which let promtool analyze a single block by its BlockID. If no BlockID is provided, the latest block is analyzed instead.

Related PR

PR 11528 Status - Open

Related Issue

Closes #10822

Proposed Changes

Existing code: openBlock(), called by analyzeBlock(), uses the Blocks() function, which returns a slice of BlockReaders.

Block() takes a BlockID and analyzes that particular block. If the user does not provide a BlockID, LastBlockID() gets the BlockID of the last block in the TSDB storage with the help of lastBlockDirName(). A unit test case for this feature is implemented.

Proof Manifests

CLI Output

With BlockID

$ ./promtool tsdb analyze data/ 01GT3F36F6YVGS72BEH580W0S5
Block ID: 01GT3F36F6YVGS72BEH580W0S5
Duration: 17h54m59.934s
Series: 432
Label names: 25
Postings (unique label pairs): 343
Postings entries (total label pairs): 1696

Label pairs most involved in churning:
0 le=0.1
0 call=services
0 __name__=prometheus_sd_file_scan_duration_seconds_sum
0 __name__=prometheus_sd_kubernetes_events_total
0 __name__=go_memstats_frees_total
0 reason=refused
0 __name__=net_conntrack_listener_conn_accepted_total
0 __name__=prometheus_http_request_duration_seconds_bucket
0 code=503
0 revision=66da1d51fd92977df9a9bf4f8c69303b2ba8d88e-modified
0 __name__=prometheus_rule_group_duration_seconds_sum
0 __name__=prometheus_sd_updates_total
0 le=25600
0 __name__=prometheus_target_metadata_cache_entries
0 __name__=prometheus_tsdb_wal_fsync_duration_seconds_sum
0 __name__=process_virtual_memory_max_bytes
0 le=0.2
0 handler=/graph
0 __name__=prometheus_target_interval_length_seconds_count
0 le=10000

Label names most involved in churning:
0 instance
0 slice
0 endpoint
0 name
0 goversion
0 le
0 event
0 interval
0 job
0 quantile
0 version
0 reason
0 listener_name
0 goarch
0 goos
0 handler
0 config
0 scrape_job
0 __name__
0 dialer_name


...

Highest cardinality metric names:
30 prometheus_http_request_duration_seconds_bucket
27 prometheus_http_response_size_bytes_bucket
18 prometheus_sd_kubernetes_events_total
15 prometheus_tsdb_compaction_duration_seconds_bucket
13 prometheus_tsdb_compaction_chunk_samples_bucket
13 prometheus_tsdb_compaction_chunk_size_bytes_bucket
12 prometheus_engine_query_duration_seconds
12 net_conntrack_dialer_conn_failed_total
12 prometheus_tsdb_tombstone_cleanup_seconds_bucket
11 prometheus_tsdb_compaction_chunk_range_seconds_bucket
6 prometheus_sd_consul_rpc_duration_seconds
5 go_gc_duration_seconds
5 prometheus_target_interval_length_seconds
5 prometheus_rule_group_duration_seconds
5 prometheus_target_sync_length_seconds
4 prometheus_engine_query_duration_seconds_count
4 prometheus_engine_query_duration_seconds_sum
3 prometheus_rule_evaluation_duration_seconds
3 net_conntrack_dialer_conn_closed_total
3 promhttp_metric_handler_requests_total

Without BlockID

$ ./promtool tsdb analyze
Block ID: 01GT3F36F6YVGS72BEH580W0S5
Duration: 17h54m59.934s
Series: 432
Label names: 25
Postings (unique label pairs): 343
Postings entries (total label pairs): 1696

Label pairs most involved in churning:
0 __name__=prometheus_http_response_size_bytes_bucket
0 __name__=prometheus_sd_consul_rpc_duration_seconds_sum
0 __name__=prometheus_sd_kuma_fetch_duration_seconds_sum
0 __name__=prometheus_tsdb_tombstone_cleanup_seconds_bucket
0 le=512
0 __name__=prometheus_tsdb_head_gc_duration_seconds_count
0 __name__=prometheus_tsdb_head_min_time
0 __name__=go_gc_duration_seconds_sum
0 __name__=go_threads
0 __name__=prometheus_engine_queries_concurrent_max
0 __name__=prometheus_sd_file_scan_duration_seconds_count
0 le=72
0 type=histogram
0 slice=queue_time
0 __name__=prometheus_target_scrapes_sample_out_of_order_total
0 __name__=prometheus_template_text_expansions_total
0 __name__=prometheus_tsdb_head_chunks_created_total
0 __name__=prometheus_tsdb_isolation_low_watermark
0 __name__=prometheus_target_scrape_pool_exceeded_target_limit_total
0 __name__=prometheus_tsdb_compaction_chunk_samples_count

...

Highest cardinality metric names:
30 prometheus_http_request_duration_seconds_bucket
27 prometheus_http_response_size_bytes_bucket
18 prometheus_sd_kubernetes_events_total
15 prometheus_tsdb_compaction_duration_seconds_bucket
13 prometheus_tsdb_compaction_chunk_samples_bucket
13 prometheus_tsdb_compaction_chunk_size_bytes_bucket
12 prometheus_engine_query_duration_seconds
12 net_conntrack_dialer_conn_failed_total
12 prometheus_tsdb_tombstone_cleanup_seconds_bucket
11 prometheus_tsdb_compaction_chunk_range_seconds_bucket
6 prometheus_sd_consul_rpc_duration_seconds
5 go_gc_duration_seconds
5 prometheus_target_interval_length_seconds
5 prometheus_rule_group_duration_seconds
5 prometheus_target_sync_length_seconds
4 prometheus_engine_query_duration_seconds_count
4 prometheus_engine_query_duration_seconds_sum
3 prometheus_rule_evaluation_duration_seconds
3 net_conntrack_dialer_conn_closed_total
3 promhttp_metric_handler_requests_total

Test Case Output

When BlockID is passed

level=info msg="Found healthy block" mint=10 maxt=12 ulid=01GTBEGMJYRX8PYQ34XH982XEV
level=info msg="Found healthy block" mint=12 maxt=14 ulid=01GTBEGMKJX1YE4D3XZ393Y8WW
level=info msg="Found healthy block" mint=14 maxt=16 ulid=01GTBEGMM14QS037Y0MRZF8SDJ
level=info msg="Replaying on-disk memory mappable chunks if any"
level=info msg="On-disk memory mappable chunks replay completed" duration=2.352µs
level=info msg="Replaying WAL, this may take a while"
level=info msg="WAL segment loaded" segment=0 maxSegment=1
level=info msg="WAL segment loaded" segment=1 maxSegment=1
level=info msg="WAL replay completed" checkpoint_replay_duration=45.93µs wal_replay_duration=970µs wbl_replay_duration=69ns total_replay_duration=1.029182ms
level=info msg="Compactions disabled"
{01GTBEGMJYRX8PYQ34XH982XEV 10 12 {2 1 1 0} {1 [01GTBEGMJYRX8PYQ34XH982XEV] false [] false []} 1}
{01GTBEGMJYRX8PYQ34XH982XEV 10 12 {2 1 1 0} {1 [01GTBEGMJYRX8PYQ34XH982XEV] false [] false []} 1}

When BlockID is not passed

level=info msg="Found healthy block" mint=10 maxt=12 ulid=01GTBDCGCGAJ1DXHC25K1NRXFV
level=info msg="Found healthy block" mint=12 maxt=14 ulid=01GTBDCGD4BH4BNQWQBD35Z92M
level=info msg="Found healthy block" mint=14 maxt=16 ulid=01GTBDCGDJ36SDFQAJGAX798MK
level=info msg="Replaying on-disk memory mappable chunks if any"
level=info msg="On-disk memory mappable chunks replay completed" duration=2.856µs
level=info msg="Replaying WAL, this may take a while"
level=info msg="WAL segment loaded" segment=0 maxSegment=1
level=info msg="WAL segment loaded" segment=1 maxSegment=1
level=info msg="WAL replay completed" checkpoint_replay_duration=46.932µs wal_replay_duration=1.251607ms wbl_replay_duration=89ns total_replay_duration=1.315984ms
level=info msg="Compactions disabled"
{01GTBDCGCGAJ1DXHC25K1NRXFV 10 12 {2 1 1 0} {1 [01GTBDCGCGAJ1DXHC25K1NRXFV] false [] false []} 1}
{01GTBDCGCGAJ1DXHC25K1NRXFV 10 12 {2 1 1 0} {1 [01GTBDCGCGAJ1DXHC25K1NRXFV] false [] false []} 1}

nidhey27, Mar 01 '23 05:03