Update: Add a source label to the thanos_rule_evaluation_with_warnings_total metric

Open amandaguan-ag opened this issue 8 months ago • 0 comments

  • [X] I added CHANGELOG entry for this change.
  • [ ] Change is not relevant to the end user.

Changes

This PR addresses issue #7159 by adding more granular information to the thanos_rule_evaluation_with_warnings_total metric. Specifically:

  • Added 'file' and 'group' labels to the metric definition.
  • Updated queryFuncCreator to extract rule group and file information from context.
  • Updated the metric increment to pass the file and group label values.
  • Added file and group information to warning logs for better traceability.

These changes allow for easier identification of the source of rule evaluation warnings, enabling more efficient debugging and alerting.
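
As an illustration of how the new labels might be used for debugging or alerting, here is a minimal sketch that builds a PromQL expression grouping warnings by rule file and group, then sends it to a Prometheus-compatible query API. The function names, the `QUERY_URL` variable, the default `5m` window, and the assumption that the ruler's `/metrics` endpoint is scraped and queryable at that address are all hypothetical and not part of this PR.

```shell
# Hypothetical helper: print a PromQL expression that counts evaluation
# warnings per rule file and group over a window (default 5m, an assumption).
warnings_by_source_query() {
  printf 'sum by (file, group) (increase(thanos_rule_evaluation_with_warnings_total[%s]))' "${1:-5m}"
}

# Hypothetical helper: run the expression against a Prometheus-compatible
# HTTP API. QUERY_URL must point at an endpoint that has the ruler's metrics.
query_warnings_by_source() {
  curl -sG "${QUERY_URL:?set QUERY_URL to a Prometheus-compatible endpoint}/api/v1/query" \
    --data-urlencode "query=$(warnings_by_source_query "$1")"
}

# Show the expression that would be sent for a 10 minute window.
warnings_by_source_query 10m
echo
```

Before this change the same expression could only be grouped by `strategy`, so it could not point to the rule file or group responsible.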

Verification

To verify the changes, I performed the following steps:

  1. Created prom.yaml with the following content:

    global:
      external_labels:
        cluster: cluster_1
    
  2. Started Prometheus:

    prometheus --config.file prom.yaml
    
  3. Started Thanos sidecar:

    thanos sidecar --http-address 0.0.0.0:10905 --grpc-address 0.0.0.0:10906
    
  4. Started Thanos querier:

    thanos query --endpoint 0.0.0.0:10906
    
  5. Created rules.yaml with the example rule content.

  6. Started Thanos ruler:

    thanos rule --query localhost:10902 --grpc-address 0.0.0.0:10903 --http-address 0.0.0.0:10904 --rule-file rules.yaml --eval-interval 5s
    
  7. Killed the sidecar process to trigger partial responses.

  8. Checked the ruler's metrics endpoint for evaluation warnings caused by partial responses:

    curl -sq localhost:10904/metrics | grep thanos_rule_eval
    
  9. Verified that the output included the new 'file' and 'group' labels:

    thanos_rule_evaluation_with_warnings_total{file="data/.tmp-rules/ABORT/rules.yaml",group="example",strategy="abort"} 1
    

This test confirms that the metric now carries the file and group labels, so each warning can be traced to the rule file and group that produced it.
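
The manual check in steps 8 and 9 could be scripted roughly as follows. This is a sketch, not part of the PR: the `has_source_labels` function name is hypothetical, and the label matching is a simple text pattern rather than a full parser for the metrics exposition format.

```shell
# Hypothetical helper: succeed only if at least one
# thanos_rule_evaluation_with_warnings_total sample is read from stdin and
# every such sample carries both a file and a group label.
has_source_labels() {
  awk '
    /^thanos_rule_evaluation_with_warnings_total[{]/ {
      seen = 1
      if ($0 !~ /[{,]file="/ || $0 !~ /[{,]group="/) bad = 1
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Example using the sample line from step 9.
sample='thanos_rule_evaluation_with_warnings_total{file="data/.tmp-rules/ABORT/rules.yaml",group="example",strategy="abort"} 1'
printf '%s\n' "$sample" | has_source_labels && echo "labels present"
```

Against a running ruler, the same function could be fed from the command in step 8, for example `curl -sq localhost:10904/metrics | has_source_labels`.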

amandaguan-ag, Jun 28 '24 17:06