Configurable timeout on prom-frontend
Hello,
We have recently switched to using the datasource-syncer instead of prom-frontend for our Grafana instance to take advantage of higher query timeouts. Is it possible to do something similar for the prom-adapter? We are running into the 30s query timeout in prom-frontend right now, and would like to take advantage of the 120s timeout of querying GMP directly. Is this something that we could do?
Thanks!
Is it possible to do something similar for the prom-adapter?
Do you mean prom-frontend?
We can consider a customizable timeout for the frontend. Looks like it's indeed hardcoded to 30s (behind http.DefaultTransport).
However, we wanted to deprecate frontend, given datasource-syncer exists. Could you help us understand what's the use case for frontend if you use datasource-syncer? (:
Thanks!
Hey @bwplotka, totally understand wanting to deprecate prom-frontend, however it seems that datasource-syncer does not support updating the prom-adapter datasource URL like it does for Grafana. Because of this, even if we use datasource-syncer for Grafana, we have to continue running prom-frontend to support prom-adapter.
I'd be happy to get rid of the frontend altogether if you can provide instructions on using datasource-syncer with prom-adapter, but all the docs I could find (https://cloud.google.com/stackdriver/docs/managed-prometheus/hpa#promethueus-adapter) use the frontend.
Thank you!
Got it, thanks. Note that prom-adapter also has a hardcoded timeout AFAIK.
In this case we have three routes:

A) Add a timeout setting to both frontend and prom-adapter (adding a flag in code).
B) Add a timeout setting to prom-adapter only, and add datasource-syncer support for it too.
C) Add a timeout setting to prom-adapter only, and add Google OAuth2 support to it too.
C might be easiest in some way 🤔
Help wanted, but we might want to add an issue on the adapter for it.
To solve all of this -- is there a way to simplify queries? Waiting 30s+ for autoscaling can be painful in itself (assuming you use prom-adapter for autoscaling reasons).
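On the "what query is generated" question: the adapter builds queries from the metricsQuery template in its rules config, and the long pod=~... selector is what it substitutes for <<.LabelMatchers>>. A typical rule looks roughly like this (sketch only; the series and label names here are placeholders, not your actual config):

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # <<.LabelMatchers>> expands to the per-pod selector the HPA asks for.
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

So inspecting/tuning the metricsQuery in your rules is the main lever you have on the adapter side for what hits GMP.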
If your team is looking to get rid of frontend completely, then I imagine C is the right approach. Happy to try and knock out a PR if you've got an example of how Google OAuth2 works in another component?
In the meantime we have tried simplifying queries (the timeout issue manifests when we have a deployment that can scale up to 200 replicas, which I assume generates selector queries like pod=~pod1|pod2|...|pod200). We are trying to remove labels, but it's also not clear exactly what query is getting generated, and there doesn't seem to be a setting on the adapter to lower the log level.
Drive-by comment - it could be worth trying KEDA or the custom-metrics-stackdriver-adapter to see if that helps things.
Kinda related: found a dead PR that tries to solve what we think is causing the timeouts: https://github.com/kubernetes-sigs/prometheus-adapter/pull/670