prometheus-engine

Support for ProbeMonitoring

Open • dsaintilma-flinks opened this issue 4 years ago • 12 comments

Hello,

I was wondering whether adding an API for scraping Probe targets, for exporters like blackbox or ssl, is planned.

That's all. Thanks!

dsaintilma-flinks • Nov 26 '21 04:11

We have definitely thought about that use case and want to enable it, though we don't yet have a concrete design for integrating it with the current CRDs.

Our recommendation in the meantime is to run the GMP Prometheus binary as a sidecar to the blackbox_exporter with a static config that only scrapes the blackbox exporter.
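A minimal sketch of such a static config, assuming the blackbox_exporter listens on its default port 9115 in the same pod (so the sidecar reaches it at `localhost:9115`) and that its config provides the `http_2xx` module. The job name and target URLs are hypothetical, credentials and external labels are omitted, and the relabeling follows the multi-target pattern documented in the blackbox_exporter README:

```yaml
# Hypothetical prometheus.yml for the GMP Prometheus sidecar.
scrape_configs:
- job_name: blackbox-http          # placeholder name
  scrape_interval: 60s
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  # The probe targets (not the exporter) are listed here; placeholder URLs.
  - targets:
    - http://target1/
    - http://target2/
  relabel_configs:
  # Pass each listed target to the exporter as the ?target= parameter.
  - source_labels: [__address__]
    target_label: __param_target
  # Use the probed URL as the instance label.
  - source_labels: [__param_target]
    target_label: instance
  # Send the actual scrape to the blackbox_exporter in the same pod.
  - target_label: __address__
    replacement: localhost:9115
```

Because the sidecar only ever talks to its own exporter, each target is probed once per scrape interval per pod.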

fabxc • Dec 02 '21 12:12

Any update on whether this is going to be picked up?

masterlittle • Mar 18 '23 20:03

Heya, we are not currently working on this. In the meantime you can use self-deployed collection to do this: https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-unmanaged

lyanco • Mar 20 '23 15:03

Another user asked for this today, so it might be a good idea to prioritise it. I am not yet sure we need a Probe CR for this, as technically this could also be an opinionated field in PodMonitoring, for simplicity.

Note that probing is technically already possible with PodMonitoring, using something like:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
(...)
spec:
  selector:
    matchLabels:
      (...)
  endpoints:
  - port: http
    scheme: http
    interval: 60s
    path: "/probe"
    params:
      module:
      - http_2xx
      target:
      - "http://target1/"
    metricRelabeling:
    - action: replace
      replacement: "http://target1/"
      targetLabel: probe_target
      

Two main limitations are:

  1. You have to create a separate PodMonitoring per target/param combination (the port has to be unique within a resource), and dynamic parameter mangling, as described here, is not possible.
  2. The instance label points to the blackbox exporter, not the target URL, which might be annoying (as explained here). This is because you cannot currently relabel the instance label (you will get a `cannot relabel with action "replace" onto protected label "instance"` error). This can be mitigated with the metricRelabeling rule suggested above, which writes the target URL into a custom probe_target label.
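Limitation 1 in practice: a sketch of two PodMonitoring resources, one per probe target, modeled on the example above. The resource names, the `app: blackbox-exporter` selector, and the target URLs are hypothetical, and the exporter pods are assumed to expose a container port named `http`:

```yaml
# Hypothetical: one PodMonitoring per probe target.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: blackbox-target1
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  endpoints:
  - port: http
    interval: 60s
    path: /probe
    params:
      module: [http_2xx]
      target: ["http://target1/"]
    metricRelabeling:
    # "instance" is protected, so record the probed URL in a custom label.
    - action: replace
      replacement: "http://target1/"
      targetLabel: probe_target
---
# Same shape again; only the name and the target differ.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: blackbox-target2
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  endpoints:
  - port: http
    interval: 60s
    path: /probe
    params:
      module: [http_2xx]
      target: ["http://target2/"]
    metricRelabeling:
    - action: replace
      replacement: "http://target2/"
      targetLabel: probe_target
```

Every additional target means another near-identical resource, which is the maintenance cost this limitation describes.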

bwplotka • Mar 22 '23 11:03

Interesting. So IIUC the main reason we can't support the probe monitoring use case in a single PodMonitoring is that the job_names are not unique if the ports are not unique within the spec.

However, if we switched to using the index of the endpoint, as prometheus-operator does, rather than the port, then this would be possible.
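A hypothetical single PodMonitoring illustrating the proposal: two endpoints share the port `http` and differ only in their `target` parameter. Going by this comment, port-based job naming would give both endpoints the same job name, so this layout would only become viable with index-based naming. It is a sketch of the proposed behavior, not something the current CRD is confirmed to support; names and URLs are placeholders:

```yaml
# Hypothetical: two probe endpoints on the same port in one resource.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: blackbox-probes
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  endpoints:
  - port: http          # endpoint index 0
    path: /probe
    params:
      module: [http_2xx]
      target: ["http://target1/"]
  - port: http          # endpoint index 1, same port as index 0
    path: /probe
    params:
      module: [http_2xx]
      target: ["http://target2/"]
```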

Aside: I'm curious then, if probe-style monitoring works with prometheus-operator's ServiceMonitor, then what does the Probe custom resource provide? A "nicer" API for probing? AFAICT it's most conventionally used for static target scraping.
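For comparison, a minimal prometheus-operator Probe resource. As I understand it, what it adds is listing many targets against one prober address in a single object, with the operator generating the `target` parameter and instance relabeling itself. The object name, prober Service address, and URLs are hypothetical; check field names against the prometheus-operator API reference:

```yaml
# Hypothetical prometheus-operator Probe.
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: website-probes
spec:
  interval: 60s
  module: http_2xx                               # blackbox_exporter module
  prober:
    url: blackbox-exporter.monitoring.svc:9115   # host:port of the exporter
    path: /probe
  targets:
    staticConfig:
      static:                                    # each entry is probed separately
      - http://target1/
      - http://target2/
```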

pintohutch • May 04 '23 23:05

> Our recommendation in the meantime is to run the GMP Prometheus binary as a sidecar to the blackbox_exporter with a static config that only scrapes the blackbox exporter.

Hi @fabxc, I have GMP Prometheus running in a non-GKE environment, and I use Google Cloud Monitoring to visualize node_exporter metrics. I also want to use the blackbox exporter. I'm not sure I understand this workaround correctly, so please take a look and let me know if this is what you recommend.

  1. The managed collection that scrapes node_exporter stays as it is.
  2. A new blackbox-exporter deployment with a sidecar container and a config like the one below:
  spec:
    ...
    secrets:
    - gmp-test-sa
    containers:
    - name: prometheus
      env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /gmp/key.json
      volumeMounts:
      - name: secret-gmp-test-sa
        mountPath: /gmp
        readOnly: true

plus a Prometheus config, stored as a ConfigMap, with:

global:
  external_labels:
    project_id: PROJECT_ID
    location: REGION
    cluster: CLUSTER_NAME

and the scrape settings, of course.

QuerQue • May 17 '23 17:05

Hi @QuerQue,

  1. Yes. Managed collection can be left as you've currently configured it for node-exporter scraping.
  2. Yes that workaround should work for deploying the GMP collector as a sidecar, provided your scrape config is properly set up.
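A rough sketch of that sidecar layout as a plain Deployment. It assumes the Prometheus config (external labels plus scrape settings) lives in a hypothetical `blackbox-prometheus-config` ConfigMap and the service-account key is stored as `key.json` in the `gmp-test-sa` Secret; the image references are placeholders to replace with real blackbox_exporter and GMP Prometheus images:

```yaml
# Hypothetical Deployment: blackbox_exporter plus a GMP Prometheus sidecar.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
spec:
  # A single replica, so each target is probed once per scrape interval.
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      containers:
      - name: blackbox-exporter
        image: BLACKBOX_EXPORTER_IMAGE   # placeholder
        ports:
        - name: http
          containerPort: 9115            # blackbox_exporter default port
      - name: prometheus
        image: GMP_PROMETHEUS_IMAGE      # placeholder for the GMP build
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /gmp/key.json
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: prometheus-data
          mountPath: /prometheus
        - name: gmp-sa-key
          mountPath: /gmp
          readOnly: true
      volumes:
      - name: prometheus-config
        configMap:
          name: blackbox-prometheus-config   # hypothetical; holds prometheus.yml
      - name: prometheus-data
        emptyDir: {}                         # scratch TSDB storage
      - name: gmp-sa-key
        secret:
          secretName: gmp-test-sa            # holds key.json
```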

In addition, it is possible to use PodMonitoring against blackbox-exporter, but it has limitations and is not ideal, as mentioned in the previous comment.

We're thinking of ways to better support this use case in the future and will leave this issue open to track.

Hope that helps!

pintohutch • May 22 '23 21:05

I think we would also benefit from a "Probe" monitoring resource. Our use case is roughly the following:

  • We run a highly available blackbox exporter (replicas = 5) in our GKE cluster.
  • We can (but don't, as explained below) configure PodMonitoring resources with endpoint params and metricRelabeling magic so that managed Prometheus scrapes the targets behind the blackbox exporter, something like:
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: foo
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  endpoints:
  - port: 9115
    path: "/probe"
    params:
      module:
        - "icmp.ping"
      target:
        - "1.2.3.4"
    interval: 120s
    timeout: 5s
    metricRelabeling:
      - action: replace
        sourceLabels:
          - job
        targetLabel: "datacenter"
        replacement: "garage"

But if I understand correctly, each of the N blackbox-exporter replicas would be scraped, so every probe would hit our target N times (once per replica), and the target would suffer under the increased load. For this reason, we run self-deployed GMP collectors in our cluster. Perhaps there's a way to do this that I'm not aware of, but as of now, I believe a ProbeMonitoring resource (from what it sounds like) might help us.

dnck • Dec 07 '23 20:12

Adding ProbeMonitoring is not looking likely for this upcoming half, but Cloud Monitoring has uptime checks (including synthetic monitors) that do the same thing, and the resulting time series are queryable with PromQL like anything else in Cloud Monitoring. If you're looking for a fully managed solution, take a look: https://cloud.google.com/monitoring/uptime-checks/introduction

lyanco • Dec 07 '23 21:12

> Adding ProbeMonitoring is not looking likely for this upcoming half - but - Cloud Monitoring has uptime checks (including synthetics) that do the same thing, and the resulting time series are queryable using PromQL like anything else in Cloud Monitoring. If you're looking for a fully managed solution, perhaps take a look: https://cloud.google.com/monitoring/uptime-checks/introduction

Thanks, I'll take a look at those. In general, though, we already have a robust configuration system for managing Prometheus collection with multi-target exporters, and we like using Monarch as our TSDB. It would just be very nice if the GMP CRDs made multi-cluster, multi-cloud monitoring a bit easier, and I think a ProbeMonitoring resource might help with that. If you are looking for contributors, I'd be happy to take a stab at it.

dnck • Dec 12 '23 23:12

We absolutely welcome contributors, @dnck! We're happy to review or collaborate on any designs.

pintohutch • Dec 13 '23 01:12

FYI @dnck, we have a PoC that @TheSpiritXIII and @bernot-dev put together in #766.

We'll prioritize rolling this out in the near future as a supported offering. PTAL there if you have any input!

pintohutch • Feb 15 '24 15:02

After some discussion, we recommend using Cloud Monitoring uptime checks to meet these needs. Please let us know if you would like to revisit this in the future.

bernot-dev • May 15 '24 15:05