opentelemetry-collector-contrib
`scrape_config_files` doesn't work
Component(s)
receiver/prometheus
What happened?
Description
I want to use `scrape_config_files` to add Prometheus jobs. However, the Prometheus jobs defined this way do not take effect, even though the OTel configuration has been applied. For details on how the configuration is applied, see applyConfig.
Steps to Reproduce
1. Add `scrape_config_files` to the prometheus receiver configuration, with the file path specified as scrape_files.yaml.
2. Add the appropriate `scrape_configs` entries to scrape_files.yaml.
3. Start OTel.
Expected Result
The Prometheus jobs in scrape_files.yaml run correctly.
Actual Result
The Prometheus jobs do not run, and the OTel logs do not show the added jobs.
Collector version
v0.95
Environment information
Environment
OS: darwin/arm64 Compiler: go1.21.9
The same issue occurs when deployed on Kubernetes (k8s)
OpenTelemetry Collector configuration
##### otel.yaml
exporters:
  otlphttp/metric:
    metrics_endpoint: http://localhost:8080
    retry_on_failure:
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
      multiplier: 2
      randomization_factor: 0.5
extensions:
  pprof:
  health_check:
    endpoint: 0.0.0.0:13133
  memory_ballast:
    size_mib: "256"
processors:
  batch/metrics:
    send_batch_size: 500
    send_batch_max_size: 500
    timeout: 5s
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
  cumulativetodelta:
receivers:
  prometheus:
    trim_metric_suffixes: false
    config:
      scrape_config_files:
        - /scrape_files.yaml
      scrape_configs:
        - job_name: 'otel-scrape-self-test'
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: '/metrics'
          static_configs:
            - targets: ['0.0.0.0:8888']
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888
  extensions:
    - pprof
    - health_check
    - memory_ballast
  pipelines:
    metrics/prometheus:
      receivers:
        - prometheus
      processors:
        - memory_limiter
        - cumulativetodelta
        - batch/metrics
      exporters:
        - otlphttp/metric
##### scrape_files.yaml
scrape_configs:
  - job_name: 'otel-scrape-k8s-apiserver'
    scrape_interval: 10s
    scrape_timeout: 10s
    body_size_limit: 50MB
    follow_redirects: true
    scheme: https
    metrics_path: /metrics
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - action: replace
        target_label: otel_pod
        replacement: otel_1
  - job_name: 'otel-scrape-self'
    scrape_interval: 10s
    scrape_timeout: 10s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['0.0.0.0:9999']
Log output
2024-08-21T18:49:59.184+0800 info [email protected]/service.go:143 Starting otelcontribcol... {"Version": "0.95.0-dev", "NumCPU": 8}
2024-08-21T18:49:59.184+0800 info extensions/extensions.go:34 Starting extensions...
2024-08-21T18:49:59.184+0800 info extensions/extensions.go:37 Extension is starting... {"kind": "extension", "name": "pprof"}
2024-08-21T18:49:59.185+0800 info pprofextension/pprofextension.go:60 Starting net/http/pprof server {"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"localhost:1777","DialerConfig":{"Timeout":0}},"BlockProfileFraction":0,"MutexProfileFraction":0,"SaveToFile":""}}
2024-08-21T18:49:59.185+0800 info extensions/extensions.go:52 Extension started. {"kind": "extension", "name": "pprof"}
2024-08-21T18:49:59.185+0800 info extensions/extensions.go:37 Extension is starting... {"kind": "extension", "name": "memory_ballast"}
2024-08-21T18:49:59.187+0800 info [email protected]/memory_ballast.go:41 Setting memory ballast {"kind": "extension", "name": "memory_ballast", "MiBs": 256}
2024-08-21T18:49:59.188+0800 info extensions/extensions.go:52 Extension started. {"kind": "extension", "name": "memory_ballast"}
2024-08-21T18:49:59.188+0800 info extensions/extensions.go:37 Extension is starting... {"kind": "extension", "name": "health_check"}
2024-08-21T18:49:59.188+0800 info healthcheckextension/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13134","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-08-21T18:49:59.189+0800 warn [email protected]/warning.go:42 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning. {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-08-21T18:49:59.189+0800 info extensions/extensions.go:52 Extension started. {"kind": "extension", "name": "health_check"}
2024-08-21T18:49:59.190+0800 info prometheusreceiver/metrics_receiver.go:240 Starting discovery manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-08-21T18:50:04.422+0800 info prometheusreceiver/metrics_receiver.go:231 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "otel-scrape-self-test"}
2024-08-21T18:50:04.422+0800 info healthcheck/handler.go:132 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2024-08-21T18:50:04.422+0800 info [email protected]/service.go:169 Everything is ready. Begin running and processing data.
2024-08-21T18:50:04.422+0800 warn localhostgate/featuregate.go:63 The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default. {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-08-21T18:50:04.422+0800 info prometheusreceiver/metrics_receiver.go:282 Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-08-21T18:50:14.480+0800 info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlphttp/metric", "error": "failed to make an HTTP request: Post \"https://otel-inner.yuanfudao.biz/metric/otel/v1\": dial tcp: lookup otel-inner.yuanfudao.biz: no such host", "interval": "2.623812628s"}
Additional context
No response
Pinging code owners:
- receiver/prometheus: @Aneurysm9 @dashpole
See Adding Labels via Comments if you do not have permissions to add labels yourself.
My best guess is that we don't apply the config to the discovery manager here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/8477a83afdfc8750d471cdb5b4af2fb227bc8423/receiver/prometheusreceiver/metrics_receiver.go#L366
We iterate over `cfg.ScrapeConfigs` rather than `cfg.GetScrapeConfigs()`, which incorporates configuration from `scrape_config_files`. We should update most usages of `cfg.ScrapeConfigs` to use the newer function.
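The distinction can be illustrated with a minimal, self-contained Go sketch. The types and the `loadScrapeFile` helper below are hypothetical stand-ins for the Prometheus `config` package, not its real API: a loop over the raw `ScrapeConfigs` field sees only the inline jobs, while a `GetScrapeConfigs()`-style accessor also merges in the jobs loaded from the files listed in `scrape_config_files`.

```go
package main

import "fmt"

// ScrapeConfig is a stand-in for Prometheus's scrape config type.
type ScrapeConfig struct{ JobName string }

// Config mimics the shape of the Prometheus config: inline scrape
// configs plus a list of files containing additional ones.
type Config struct {
	ScrapeConfigs     []*ScrapeConfig
	ScrapeConfigFiles []string
}

// loadScrapeFile is a hypothetical loader; the real code parses each
// YAML file listed under scrape_config_files.
func loadScrapeFile(path string) []*ScrapeConfig {
	return []*ScrapeConfig{{JobName: "otel-scrape-self"}}
}

// GetScrapeConfigs mirrors the upstream accessor: it returns the
// inline configs merged with those loaded from scrape_config_files.
func (c *Config) GetScrapeConfigs() []*ScrapeConfig {
	out := append([]*ScrapeConfig{}, c.ScrapeConfigs...)
	for _, f := range c.ScrapeConfigFiles {
		out = append(out, loadScrapeFile(f)...)
	}
	return out
}

func main() {
	cfg := &Config{
		ScrapeConfigs:     []*ScrapeConfig{{JobName: "otel-scrape-self-test"}},
		ScrapeConfigFiles: []string{"/scrape_files.yaml"},
	}
	// Iterating the field alone misses the file-based jobs:
	fmt.Println(len(cfg.ScrapeConfigs))      // 1
	fmt.Println(len(cfg.GetScrapeConfigs())) // 2
}
```

This matches the observed behavior: the receiver registers `otel-scrape-self-test` (inline) but never sees the jobs defined in scrape_files.yaml.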
Thank you for your answer. It worked after adding `cfg.ScrapeConfigs, _ = (*config.Config)(cfg).GetScrapeConfigs()` before the code mentioned above.
The only drawback is that OTel cannot substitute environment variables inside scrape_files.yaml.
If this issue is still open, I'd be happy to work on a fix for it.
It is all yours @bacherfl. Please cc me on the PR and I'll review.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
- receiver/prometheus: @Aneurysm9 @dashpole
See Adding Labels via Comments if you do not have permissions to add labels yourself.