Log scrape errors
Request
It would be great if Alloy provided an option (enabling it by default may be a good idea) for logging prometheus.scrape errors for targets.
This could produce a lot of log lines, so some rate limiting may be necessary, but users often see scrapes failing without being able to see the reason.
Use case
Finding the reason for a scrape failure.
The failure reason can currently be discovered in the UI, but the experience is poor, especially in a large cluster. The UI can be improved, but logging still seems like a good idea, even if the UI had a nice tool for finding target status across the cluster.
Please correct me if I am wrong, but after a brief look into what it would take to add this kind of logging, this is what I have found so far:
- Every scrape job is handled by a scrape.Manager that comes from the Prometheus project. It runs the scrape loops and internally records scrape errors.
- There is already an option to log scrape errors to a file; we use the JSON logger from Prometheus for this, but a file name has to be configured: https://github.com/grafana/alloy/blob/main/internal/component/prometheus/scrape/scrape.go#L82
Possible solutions:
- Run a job that periodically checks for any recorded errors on scrape targets and logs them. Not sure this would be the best solution because it would add more read locks on the targets.
- Reuse the mechanism that is already there to log scrape errors. We don't have to supply a logger that writes to a file here, but we need to set ScrapeFailureLogFile to something for it to work. Not sure how you would combine both settings, i.e. log scrape errors to stdout and also to a file if one is configured.
- I think this should also be possible to set up without any code changes with something like:
```
prometheus.scrape "scrape" {
  // ... other config
  scrape_failure_log_file = "some-file.log"
}

local.file_match "local_files" {
  path_targets = [{"__path__" = "some-file.log"}]
  sync_period  = "5s"
}

// other config to relabel / push logs (a sketch follows below)
```
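For completeness, a minimal sketch of the missing log-shipping part could look like the following. The component labels and the Loki endpoint URL are placeholders for illustration, not anything prescribed by Alloy:

```
loki.source.file "scrape_failures" {
  // Tail the failure log file discovered by local.file_match above.
  targets    = local.file_match.local_files.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    // Placeholder endpoint; point this at your Loki instance.
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```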
Alloy does seem to log those errors already, but only at debug level:
```
monitoring-alloy-metrics-1 alloy ts=2025-08-04T10:08:40.394056755Z level=debug msg="Scrape failed" component_path=/prometheus_operator_objects.feature component_id=prometheus.operator.podmonitors.pod_monitors scrape_pool=podMonitor/kube-system/coredns/0 target=http://10.2.114.110:9153/metrics err="Get \"http://10.2.114.110:9153/metrics\": context deadline exceeded"
```
It would be great to also see them in the Alloy UI, just like in the Prometheus /targets UI. Currently the UI only exposes the scrape config and object health, but not the scrape health or the scrape error message.
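In the meantime, those messages can be surfaced by raising the log level. A minimal sketch, assuming it is acceptable to run the whole agent at debug verbosity (which can be noisy):

```
logging {
  level  = "debug"
  format = "logfmt"
}
```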
