
6 comments by gavol

It just happened to me today for the first time (OpenWrt 21.02.3 r16554-1d4dea6d4f / LuCI openwrt-21.02 branch git-22.083.69138-0a0ce2a on Qualcomm Atheros QCA9558 ver 1 rev 0).

@caseydavenport How can I collect that information? I usually look at the API server Grafana dashboard to see what is happening there. Would the _cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{}_ metric work to get the...
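
For completeness, this is the sort of query I had in mind, written against the raw `apiserver_request_duration_seconds_bucket` histogram instead of the recording rule (only a sketch, I have not verified it on this cluster):

```promql
# Sketch: 99th percentile request latency for the Calico CRD API group,
# computed from the raw histogram buckets and broken down by resource and verb.
histogram_quantile(0.99,
  sum(rate(apiserver_request_duration_seconds_bucket{group="crd.projectcalico.org"}[5m]))
  by (le, resource, verb)
)
```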

This is what I get when executing that query while the API servers are heavily loaded and Calico is in use (showing only the top calls): ![image](https://user-images.githubusercontent.com/34028214/156559528-fec35ade-4a1c-409c-8905-d51dac473fba.png)

Maybe something like this is more useful: `sort_desc(sum(rate(apiserver_request_duration_seconds_sum{group="crd.projectcalico.org"}[5m]) / rate(apiserver_request_duration_seconds_count{group="crd.projectcalico.org"}[5m])) by(resource,verb,group))` ![image](https://user-images.githubusercontent.com/34028214/156581370-043fb55c-9832-427f-8a99-df299a61f588.png) I am sorry, but I am not very familiar with Prometheus queries.

I think this is more correct: `sort_desc((sum(rate(apiserver_request_duration_seconds_sum{group="crd.projectcalico.org"}[5m])) by(group,resource,verb) / sum(rate(apiserver_request_duration_seconds_count{group="crd.projectcalico.org"}[5m])) by(group,resource,verb)))` ![image](https://user-images.githubusercontent.com/34028214/156586894-db14682d-6691-4ca2-819a-1963a5b6be83.png)
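
In other words (the same query, just written out with comments to show the intent of aggregating the rates before dividing):

```promql
# Average request duration per (group, resource, verb) for the Calico CRD API group:
# sum the per-instance rates first, then divide total time spent by total request count.
sort_desc(
  sum(rate(apiserver_request_duration_seconds_sum{group="crd.projectcalico.org"}[5m]))
    by (group, resource, verb)
  /
  sum(rate(apiserver_request_duration_seconds_count{group="crd.projectcalico.org"}[5m]))
    by (group, resource, verb)
)
```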

Same problem here: a big cluster (thousands of workers) with thousands of Pods, and not even 128 GB of RAM is enough. I agree that it is fine to enforce that min/max...