Expose or proxy internal IPs of workers for Prometheus monitoring
In some deployments, such as Fargate container deployments, it can be impossible for an external Prometheus monitoring host to reach workers directly on their published yet private/internal network IPs.
It would be useful if the scheduler, or a process running alongside the scheduler (perhaps using the "sidecar container" pattern), could proxy requests to the workers from a public IP/port.
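A minimal sketch of such a sidecar, assuming it runs next to the scheduler, can reach the workers on their internal IPs, and talks to the scheduler through a regular dask.distributed.Client; the /worker/<n>/metrics URL layout and the SCHEDULER_ADDRESS / PROXY_PORT environment variables are illustrative, not an existing Dask feature:

import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

from dask.distributed import Client

# Illustrative settings for this sketch, not Dask configuration keys.
SCHEDULER_ADDRESS = os.environ.get("DASK_SCHEDULER", "tcp://localhost:8786")
PROXY_PORT = int(os.environ.get("PROXY_PORT", "9999"))

client = Client(SCHEDULER_ADDRESS, timeout=10)


def worker_metric_urls():
    """Return the internal /metrics URL of every worker, in a stable order."""
    workers = client.scheduler_info()["workers"]
    urls = []
    for addr in sorted(workers):
        info = workers[addr]
        dashboard_port = info.get("services", {}).get("dashboard")
        if dashboard_port:
            urls.append(f"http://{info['host']}:{dashboard_port}/metrics")
    return urls


class MetricsProxy(BaseHTTPRequestHandler):
    """Serve /worker/<n>/metrics by fetching the n-th worker's internal endpoint."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")  # e.g. ["worker", "0", "metrics"]
        try:
            url = worker_metric_urls()[int(parts[1])]
            body = urllib.request.urlopen(url, timeout=5).read()
        except Exception:
            self.send_error(502, "worker metrics unreachable")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Expose the proxy on a port that the external Prometheus host can reach.
    HTTPServer(("0.0.0.0", PROXY_PORT), MetricsProxy).serve_forever()

Prometheus would then scrape the proxy's public address with one target per worker index, for example via static_configs or a small file_sd file generated from the same worker list.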
Today I faced the same issue. Dask workers in GCP do not listen on an external IP, so it is not easy to discover them with the default Prometheus discovery job. I am looking into different solutions to force the workers to listen on 0.0.0.0.
@adbreind, the solution is to configure the workers appropriately. Here is an example for GCP:
from dask_cloudprovider.gcp import GCPCluster
cluster = GCPCluster(
    worker_options={
        # Bind the worker dashboard (which serves /metrics) to all interfaces
        "dashboard_address": "0.0.0.0:8787",
    },
)
Now the worker dashboard listens on all interfaces, so Prometheus can reach each worker's /metrics endpoint directly on the instance's external IP. Be careful not to expose the port to the whole internet, and make sure the firewall rules are set up properly.
Here is a possible Prometheus configuration:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "dask-gce"
    metrics_path: "/metrics"              # Default is /metrics, but explicit here
    gce_sd_configs:
      - project: "your-gcp-project-id"    # Your GCP project
        zone: "us-central1-a"             # Your GCE zone
        filter: 'labels.container_vm = "dask-cloudprovider"'
        # Only instances with dask-cloudprovider's default label
        port: 8787                        # Dask's HTTP status port
        refresh_interval: "5s"            # Refresh the instance list every 5 seconds
    relabel_configs:
      - source_labels: [__meta_gce_public_ip]
        regex: "(.+)"
        target_label: "__address__"
        replacement: "${1}:8787"          # Assemble <public_ip>:8787 for scraping
Similarly, the same approach can be applied on AWS.
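As a rough sketch, the GCE job above translates to Prometheus's ec2_sd_configs; the region, tag filter, and choice of __meta_ec2_public_ip below are assumptions to adapt to how your EC2 worker instances are actually tagged and networked:

scrape_configs:
  - job_name: "dask-ec2"
    metrics_path: "/metrics"
    ec2_sd_configs:
      - region: "us-east-1"               # Your AWS region (assumption)
        port: 8787                        # Worker dashboard port, as above
        refresh_interval: 5s
        filters:
          - name: "tag:Name"              # Illustrative filter; match your own instance tags
            values: ["dask-*"]
    relabel_configs:
      - source_labels: [__meta_ec2_public_ip]   # or __meta_ec2_private_ip if Prometheus runs inside the same VPC
        regex: "(.+)"
        target_label: "__address__"
        replacement: "${1}:8787"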