
[receiver/httpcheck] emit single `httpcheck.status` datapoint instead of five

andrzej-stencel opened this issue 2 years ago • 12 comments

Component(s)

receiver/httpcheck

Version

v0.78.0

Is your feature request related to a problem? Please describe.

The HTTP Check receiver currently emits six time series per endpoint: one for httpcheck.duration and five for httpcheck.status (one per status class). For example, the following configuration:

exporters:
  logging:
    verbosity: detailed

receivers:
  httpcheck:
    endpoint: https://opentelemetry.io

service:
  pipelines:
    metrics:
      exporters:
      - logging
      receivers:
      - httpcheck

gives the following output:

$ otelcol-contrib-0.78.0 --config config.yaml
2023-06-01T12:48:14.930+0200    info    service/telemetry.go:104        Setting up own telemetry...
2023-06-01T12:48:14.930+0200    info    service/telemetry.go:127        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-06-01T12:48:14.930+0200    info    [email protected]/exporter.go:275        Development component. May change in the future.        {"kind": "exporter", "data_type": "metrics", "name": "logging"}
2023-06-01T12:48:14.930+0200    info    [email protected]/receiver.go:296        Development component. May change in the future.        {"kind": "receiver", "name": "httpcheck", "data_type": "metrics"}
2023-06-01T12:48:14.954+0200    info    service/service.go:131  Starting otelcol-contrib...     {"Version": "0.78.0", "NumCPU": 16}
2023-06-01T12:48:14.954+0200    info    extensions/extensions.go:30     Starting extensions...
2023-06-01T12:48:14.956+0200    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-06-01T12:48:18.905+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 2, "data points": 6}
2023-06-01T12:48:18.905+0200    info    ResourceMetrics #0
Resource SchemaURL: 
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/httpcheckreceiver 0.78.0
Metric #0
Descriptor:
     -> Name: httpcheck.duration
     -> Description: Measures the duration of the HTTP check.
     -> Unit: ms
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 941
Metric #1
Descriptor:
     -> Name: httpcheck.status
     -> Description: 1 if the check resulted in status_code matching the status_class, otherwise 0.
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
     -> http.status_class: Str(1xx)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 0
NumberDataPoints #1
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
     -> http.status_class: Str(2xx)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 1
NumberDataPoints #2
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
     -> http.status_class: Str(3xx)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 0
NumberDataPoints #3
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
     -> http.status_class: Str(4xx)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 0
NumberDataPoints #4
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
     -> http.status_class: Str(5xx)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 0
        {"kind": "exporter", "data_type": "metrics", "name": "logging"}

The zero-valued data points do not carry much information. Ideally, I would expect only one httpcheck.status data point to be emitted per endpoint.

Describe the solution you'd like

I propose adding a configuration option that makes the receiver emit only non-zero data points:

receivers:
  httpcheck:
    endpoint: https://opentelemetry.io
    emit_zero_values: false # we might need a better name for this configuration property

so that the output would be something like:

$ otelcol-contrib-0.78.0 --config config.yaml
2023-06-01T12:48:14.930+0200    info    service/telemetry.go:104        Setting up own telemetry...
2023-06-01T12:48:14.930+0200    info    service/telemetry.go:127        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-06-01T12:48:14.930+0200    info    [email protected]/exporter.go:275        Development component. May change in the future.        {"kind": "exporter", "data_type": "metrics", "name": "logging"}
2023-06-01T12:48:14.930+0200    info    [email protected]/receiver.go:296        Development component. May change in the future.        {"kind": "receiver", "name": "httpcheck", "data_type": "metrics"}
2023-06-01T12:48:14.954+0200    info    service/service.go:131  Starting otelcol-contrib...     {"Version": "0.78.0", "NumCPU": 16}
2023-06-01T12:48:14.954+0200    info    extensions/extensions.go:30     Starting extensions...
2023-06-01T12:48:14.956+0200    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-06-01T12:48:18.905+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 2, "data points": 2}
2023-06-01T12:48:18.905+0200    info    ResourceMetrics #0
Resource SchemaURL: 
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/httpcheckreceiver 0.78.0
Metric #0
Descriptor:
     -> Name: httpcheck.duration
     -> Description: Measures the duration of the HTTP check.
     -> Unit: ms
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 941
Metric #1
Descriptor:
     -> Name: httpcheck.status
     -> Description: 1 if the check resulted in status_code matching the status_class, otherwise 0.
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
     -> http.status_class: Str(2xx)
StartTimestamp: 2023-06-01 10:48:14.930686644 +0000 UTC
Timestamp: 2023-06-01 10:48:17.963384163 +0000 UTC
Value: 1
        {"kind": "exporter", "data_type": "metrics", "name": "logging"}

I also think this might be a good default for this receiver.

Describe alternatives you've considered

I suppose an alternative would be to add a processor to the pipeline that filters out the zero data points. But honestly, I wasn't able to find a way to filter out metric data points based on their value using either the Filter processor or the Metrics Transform processor. Is this possible? 🤔
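
(For reference, a minimal sketch of how this could be expressed with the Filter processor's OTTL datapoint conditions, assuming a version where the value_int path is available; a later comment in this thread confirms this approach works:)

processors:
  filter:
    metrics:
      datapoint:
        - 'metric.name == "httpcheck.status" and value_int == 0'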

Additional context

Telemetry is costly. We don't want to collect metrics that don't carry a lot of value.

andrzej-stencel avatar Jun 01 '23 11:06 andrzej-stencel

Pinging code owners:

  • receiver/httpcheck: @codeboten

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Jun 01 '23 11:06 github-actions[bot]

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • needs: Github issue template generation code needs this to generate the corresponding labels.
  • receiver/httpcheck: @codeboten

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Aug 01 '23 03:08 github-actions[bot]

@codeboten can you please take a look? I believe this issue is important, as I couldn't find a way to exclude the zero time series using a processor:

I suppose an alternative would be to add a processor to the pipeline that will filter out the zero data points. But honestly I wasn't able to find a way to filter out metric data points based on their value using either the Filter processor or Metrics Transform processor. Is this possible? 🤔

andrzej-stencel avatar Sep 11 '23 10:09 andrzej-stencel

@astencel-sumo will take a look this week

codeboten avatar Oct 04 '23 17:10 codeboten

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/httpcheck: @codeboten

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Dec 05 '23 03:12 github-actions[bot]

@astencel-sumo will take a look this week

@codeboten is this going to be the week? 😉

andrzej-stencel avatar Dec 08 '23 10:12 andrzej-stencel

@astencel-sumo yes... sorry for the delay!

codeboten avatar Dec 08 '23 15:12 codeboten

As discussed in the Dec-13 SIG call, the plan to move this forward is to allow a configuration option that filters out the http.status_class attribute, resulting in a single httpcheck.status data point per endpoint.

A stretch goal is to make this attribute filtering generic enough to be used in all scrapers 😬
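
A purely hypothetical sketch of what such a configuration could look like (the exclude_attributes key below is illustrative only; it is not an existing setting of the receiver):

receivers:
  httpcheck:
    targets:
    - endpoint: https://opentelemetry.io
    # Hypothetical option: drop the http.status_class attribute so that only a
    # single httpcheck.status data point is emitted per target.
    exclude_attributes:
    - http.status_class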

codeboten avatar Dec 13 '23 17:12 codeboten

@codeboten can you please take a look? I believe this issue is important, as I couldn't find a way to exclude the zero time series using a processor:

I suppose an alternative would be to add a processor to the pipeline that will filter out the zero data points. But honestly I wasn't able to find a way to filter out metric data points based on their value using either the Filter processor or Metrics Transform processor. Is this possible? 🤔

I found a way to exclude zero values with the Filter processor:

processors:
  filter:
    metrics:
      datapoint:
        - 'metric.name == "httpcheck.status" and value_int == 0'

However, this is not really what I want. I do want to get the zero values when the endpoint is down. I just want a single zero datapoint and not five. Let me rephrase the issue title to account for this.

andrzej-stencel avatar Dec 14 '23 08:12 andrzej-stencel

Here is a workaround that makes a reasonable amount of sense to me:

  filter/drop-non-2xx-datapoints:
    metrics:
      datapoint:
        - 'metric.name == "httpcheck.status" and attributes["http.status_class"] != "2xx"'

andrzej-stencel avatar Dec 14 '23 09:12 andrzej-stencel

Here's a full example:

exporters:
  debug:
    verbosity: detailed
  prometheus:
    endpoint: localhost:1234
processors:
  # Drop the httpcheck.status data points whose http.status_class is not "2xx",
  # keeping a single data point per target (value 1 when the check returned a
  # 2xx response, 0 otherwise).
  filter/drop-non-2xx-datapoints:
    metrics:
      datapoint:
        - 'metric.name == "httpcheck.status" and attributes["http.status_class"] != "2xx"'
  # Remove the now-constant http.status_class attribute from the remaining
  # httpcheck.status data points.
  transform/drop-status-class-attribute:
    metric_statements:
    - context: datapoint
      statements:
      - keep_keys(attributes, ["http.url", "http.status_code", "http.method"]) where metric.name == "httpcheck.status"
receivers:
  httpcheck:
    collection_interval: 3s
    targets:
    - endpoint: https://opentelemetry.io
    - endpoint: https://non.existent.address
service:
  pipelines:
    metrics:
      exporters:
      - debug
      - prometheus
      processors:
      - filter/drop-non-2xx-datapoints
      - transform/drop-status-class-attribute
      receivers:
      - httpcheck

Here's the output from the collector:

$ otelcol-contrib-0.89.0-darwin_arm64 --config config.yaml
2023-12-14T10:06:38.819+0100    info    [email protected]/telemetry.go:85 Setting up own telemetry...
2023-12-14T10:06:38.819+0100    info    [email protected]/telemetry.go:202        Serving Prometheus metrics   {"address": ":8888", "level": "Basic"}
2023-12-14T10:06:38.819+0100    info    [email protected]/exporter.go:275        Development component. May change in the future.      {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2023-12-14T10:06:38.819+0100    info    [email protected]/receiver.go:296        Development component. May change in the future.      {"kind": "receiver", "name": "httpcheck", "data_type": "metrics"}
2023-12-14T10:06:38.819+0100    info    [email protected]/service.go:143  Starting otelcol-contrib...     {"Version": "0.89.0", "NumCPU": 10}
2023-12-14T10:06:38.819+0100    info    extensions/extensions.go:34     Starting extensions...
2023-12-14T10:06:38.820+0100    info    [email protected]/service.go:169  Everything is ready. Begin running and processing data.
2023-12-14T10:06:40.380+0100    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 3, "data points": 5}
2023-12-14T10:06:40.381+0100    info    ResourceMetrics #0
Resource SchemaURL: 
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/httpcheckreceiver 0.89.0
Metric #0
Descriptor:
     -> Name: httpcheck.duration
     -> Description: Measures the duration of the HTTP check.
     -> Unit: ms
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://non.existent.address)
StartTimestamp: 2023-12-14 09:06:38.819473 +0000 UTC
Timestamp: 2023-12-14 09:06:39.822583 +0000 UTC
Value: 5
NumberDataPoints #1
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
StartTimestamp: 2023-12-14 09:06:38.819473 +0000 UTC
Timestamp: 2023-12-14 09:06:39.822618 +0000 UTC
Value: 557
Metric #1
Descriptor:
     -> Name: httpcheck.error
     -> Description: Records errors occurring during HTTP check.
     -> Unit: {error}
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://non.existent.address)
     -> error.message: Str(Get "https://non.existent.address": dial tcp: lookup non.existent.address: no such host)
StartTimestamp: 2023-12-14 09:06:38.819473 +0000 UTC
Timestamp: 2023-12-14 09:06:39.822583 +0000 UTC
Value: 1
Metric #2
Descriptor:
     -> Name: httpcheck.status
     -> Description: 1 if the check resulted in status_code matching the status_class, otherwise 0.
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> http.url: Str(https://non.existent.address)
     -> http.status_code: Int(0)
     -> http.method: Str(GET)
StartTimestamp: 2023-12-14 09:06:38.819473 +0000 UTC
Timestamp: 2023-12-14 09:06:39.822583 +0000 UTC
Value: 0
NumberDataPoints #1
Data point attributes:
     -> http.url: Str(https://opentelemetry.io)
     -> http.status_code: Int(200)
     -> http.method: Str(GET)
StartTimestamp: 2023-12-14 09:06:38.819473 +0000 UTC
Timestamp: 2023-12-14 09:06:39.822618 +0000 UTC
Value: 1
        {"kind": "exporter", "data_type": "metrics", "name": "debug"}

Here's the output from the Prometheus exporter:

$ curl localhost:1234/metrics
# HELP httpcheck_duration_milliseconds Measures the duration of the HTTP check.
# TYPE httpcheck_duration_milliseconds gauge
httpcheck_duration_milliseconds{http_url="https://non.existent.address"} 4
httpcheck_duration_milliseconds{http_url="https://opentelemetry.io"} 176
# HELP httpcheck_error Records errors occurring during HTTP check.
# TYPE httpcheck_error gauge
httpcheck_error{error_message="Get \"https://non.existent.address\": dial tcp: lookup non.existent.address: no such host",http_url="https://non.existent.address"} 1
# HELP httpcheck_status 1 if the check resulted in status_code matching the status_class, otherwise 0.
# TYPE httpcheck_status gauge
httpcheck_status{http_method="GET",http_status_code="0",http_url="https://non.existent.address"} 0
httpcheck_status{http_method="GET",http_status_code="200",http_url="https://opentelemetry.io"} 1

andrzej-stencel avatar Dec 14 '23 09:12 andrzej-stencel

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/httpcheck: @codeboten

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Feb 13 '24 03:02 github-actions[bot]

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • receiver/httpcheck: @codeboten

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Apr 15 '24 04:04 github-actions[bot]

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions[bot] avatar Jun 14 '24 05:06 github-actions[bot]