Configurable timeout on prom-frontend
Hello,
We have recently switched to using the datasource-syncer instead of prom-frontend for our Grafana instance to take advantage of higher query timeouts. Is it possible to do something similar for the prom-adapter? We are running into the 30s query timeout in prom-frontend right now, and would like to take advantage of the 120s timeout of querying GMP directly. Is this something that we could do?
Thanks!
Is it possible to do something similar for the prom-adapter?
Do you mean prom-frontend?
We can consider a customizable timeout for the frontend. Looks like it's indeed hardcoded to 30s (behind http.DefaultTransport).
However, we wanted to deprecate frontend, given datasource-syncer exists. Could you help us understand what's the use case for frontend if you use datasource-syncer? (:
Thanks!
Hey @bwplotka, totally understand wanting to deprecate prom-frontend, however it seems that datasource-syncer does not support updating the prom-adapter datasource URL like it does for Grafana. Because of this, even if we use datasource-syncer for Grafana, we have to continue running prom-frontend to support prom-adapter.
I'd be happy to get rid of the frontend altogether if you can provide instructions on using datasource-syncer with prom-adapter, but all the docs I could find (https://cloud.google.com/stackdriver/docs/managed-prometheus/hpa#promethueus-adapter) use the frontend.
Thank you!
Got it, thanks. Note that prom-adapter also has a hardcoded timeout AFAIK.
In this case we have three routes:

A) Add a timeout setting to both frontend and prom-adapter (adding a flag in code).
B) Add a timeout setting to prom-adapter only, and add datasource-syncer support for it too.
C) Add a timeout setting to prom-adapter only, and add Google OAuth2 support to it too.
C might be easiest in some way 🤔
Help wanted, but we might want to add an issue on the adapter for it.
To solve all of this -- is there a way to simplify queries? Waiting 30s+ for autoscaling can be painful in itself (assuming you use prom-adapter for autoscaling reasons).
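On the "what query is generated" question: the adapter builds queries from the metricsQuery template in its rules config, and the long pod=~... selector is what it substitutes for <<.LabelMatchers>>. A typical rule looks roughly like this (sketch only; the series and label names here are placeholders, not your actual config):

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # <<.LabelMatchers>> expands to the per-pod selector the HPA asks for.
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

So inspecting/tuning the metricsQuery in your rules is the main lever you have on the adapter side for what hits GMP.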
If your team is looking to get rid of frontend completely, then I imagine C is the right approach. Happy to try and knock out a PR if you've got an example of how Google OAuth2 works in another component?
In the meantime we have tried simplifying queries (the timeout issue manifests when we have a deployment that can scale up to 200 replicas, which I assume generates selector queries like pod=~pod1|pod2|...|pod200). We are trying to remove labels, but it's also not clear exactly what query is getting generated, and there doesn't seem to be a setting on the adapter to lower the log level.
Drive-by comment - it could be worth trying KEDA or the custom-metrics-stackdriver-adapter to see if that helps things.
Kinda related: found a dead PR that tries to solve what we think is causing the timeouts: https://github.com/kubernetes-sigs/prometheus-adapter/pull/670