gavol
It just happened to me today for the first time (OpenWrt 21.02.3 r16554-1d4dea6d4f / LuCI openwrt-21.02 branch git-22.083.69138-0a0ce2a on Qualcomm Atheros QCA9558 ver 1 rev 0).
@caseydavenport How can I collect that information? I usually look at the API server Grafana dashboard to see what is happening there. Would the _cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{}_ metrics work to get the...
This is what I get when executing that query while the API servers are heavily loaded and Calico is in use (showing only the top calls): 
Maybe something like this: `sort_desc(sum(rate(apiserver_request_duration_seconds_sum{group="crd.projectcalico.org"}[5m]) / rate(apiserver_request_duration_seconds_count{group="crd.projectcalico.org"}[5m])) by(resource,verb,group))` is more useful. I am sorry, but I am not very familiar with Prometheus queries.
I think this is more correct: `sort_desc((sum(rate(apiserver_request_duration_seconds_sum{group="crd.projectcalico.org"}[5m])) by(group,resource,verb) / sum(rate(apiserver_request_duration_seconds_count{group="crd.projectcalico.org"}[5m])) by(group,resource,verb)))`
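If the tail latency is more interesting than the average, a variant along these lines might also help (just a sketch on my side, assuming the raw `apiserver_request_duration_seconds_bucket` series are scraped in your setup; the label set may need adjusting):

```
# p99 API server latency for Calico CRD calls, broken down by group/resource/verb
# (sketch; assumes apiserver_request_duration_seconds_bucket is available)
sort_desc(
  histogram_quantile(
    0.99,
    sum(rate(apiserver_request_duration_seconds_bucket{group="crd.projectcalico.org"}[5m])) by (le, group, resource, verb)
  )
)
```

The `le` label has to stay in the `by()` clause so `histogram_quantile()` can use the bucket boundaries.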
Same problem here: a big cluster (thousands of workers) with thousands of Pods, and not even 128 GB of RAM is enough. I agree that it is fine to enforce that min/max...