kubeapps icon indicating copy to clipboard operation
kubeapps copied to clipboard

Improve response times in Kubeapps APIs when using Pinniped-proxy

Open castelblanque opened this issue 2 years ago • 11 comments

Summary Some requests (e.g. core.packages.v1alpha1.PackagesService.GetAvailablePackageSummaries) are slow when using pinniped-proxy. Investigate and improve those times.

Description After work done for #5362 and #4919, response times have improved due to not creating unused multiple K8s Go clients. However, there is still what seems to be a throttling problem when using pinniped-proxy. Multiple requests are done from kubeapps-apis to K8s API (with correct QPS and Burst values), but requests go through pinniped-proxy. Log from the proxy:

2022-09-28T13:48:38.784778 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s 200 OK
2022-09-28T13:48:38.787947 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:38.788917 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.789350 [INFO] - GET https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.880150 [INFO] - GET https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.880827 [INFO] - GET https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.885347 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-09-28T13:48:38.885365 [INFO] - GET https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.886346 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:38.887521 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:38.887959 [INFO] - GET https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.888758 [INFO] - GET https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.889212 [INFO] - GET https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.980185 [INFO] - GET https://kubernetes.default/apis/policy/v1?timeout=32s 200 OK
2022-09-28T13:48:38.981594 [INFO] - GET https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.982572 [INFO] - GET https://kubernetes.default/api/v1?timeout=32s 200 OK
2022-09-28T13:48:38.984052 [INFO] - GET https://kubernetes.default/apis/batch/v1?timeout=32s 200 OK
2022-09-28T13:48:38.984456 [INFO] - GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s 200 OK
2022-09-28T13:48:38.984862 [INFO] - GET https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.984924 [INFO] - GET https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.985329 [INFO] - GET https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.985651 [INFO] - GET https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.986150 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.986452 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s 200 OK
2022-09-28T13:48:39.080254 [INFO] - GET https://kubernetes.default/apis/scheduling.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.081032 [INFO] - GET https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.081128 [INFO] - GET https://kubernetes.default/apis/apps/v1?timeout=32s 200 OK
2022-09-28T13:48:39.083395 [INFO] - GET https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.084938 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:39.086317 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2?timeout=32s 200 OK
2022-09-28T13:48:39.088025 [INFO] - GET https://kubernetes.default/apis/autoscaling/v1?timeout=32s 200 OK
2022-09-28T13:48:39.180407 [INFO] - GET https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:39.180475 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-09-28T13:48:39.181346 [INFO] - GET https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.182219 [INFO] - GET https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.183452 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:39.186020 [INFO] - GET https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.187938 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s 200 OK

When kubeapps-apis is configured without pinniped-proxy, response times for those requests are way faster. This might indicate that there is a throttling problem from pinniped-proxy towards the actual K8s API server.

Acceptance criteria

  • Response times for Kubeapps APIs is improved when using pinniped-proxy., to similar times obtained without pinniped-proxy.

castelblanque avatar Sep 28 '22 15:09 castelblanque

Thanks @castelblanque . Yeah, that's intriguing given that we're caching the credential. I'll take a look.

absoludity avatar Oct 12 '22 04:10 absoludity

So I started looking at this today - and it seems quite an interesting problem/bug. For what it's worth, I don't think it's related to throttling of requests to the kubernetes-api server from pinniped-proxy, but rather, an issue in the way async requests are handled by pinniped-proxy itself.

In summary, tokio, the asynchronous runtime used with hyper to provide many concurrent tasks per CPU thread (similar to go-routines, just not built in to the language), creates a new task for each request as it arrives, then marks the task as idle when it is waiting for i/o (in our case, a network request for the proxied response). When the network request is completed, it is marked as runnable again, and should be put into the front of the queue (the next-task slot), but seems to be put on the back of the queue.

Let's see this in the detail: I added a few extra logging statements during the proxy operation so we can tell when the request begins, when the id/cert exchange happens and when the actual downstream request begins (or see this gist for logging with trace of tokio's mio crate )

The first three queries are proxied and returned (last being the /apis) and each takes between 30-40ms:

2022-10-17T03:25:43.657974 [INFO] - Beginning proxy request for https://kubernetes.default/api?timeout=32s
2022-10-17T03:25:43.659170 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.659756 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authorization.k8s.io/v1/selfsubjectaccessreviews
2022-10-17T03:25:43.659807 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.682590 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.687346 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.692050 [INFO] - GET https://kubernetes.default/api?timeout=32s 200 OK
2022-10-17T03:25:43.692896 [INFO] - Beginning proxy request for https://kubernetes.default/apis?timeout=32s
2022-10-17T03:25:43.692962 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.701520 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.702344 [INFO] - POST https://kubernetes.default/apis/authorization.k8s.io/v1/selfsubjectaccessreviews 201 Created
2022-10-17T03:25:43.707072 [INFO] - GET https://kubernetes.default/apis?timeout=32s 200 OK

It is this response which, when returned upstream to the kubeapps-apis service (which is building the dynamic client and trying to dynamically determine all the available apis as part of the REST mapper creation), the barrage of requests arrives for all the available APIs. The important point is that each request is creating a new task and it appears that all these tasks are being processed FIFO, apparently not concurrently (?!), without allowing any responses back until all the new request tasks have been marked as idle:

2022-10-17T03:25:43.708332 [INFO] - Beginning proxy request for https://kubernetes.default/api/v1?timeout=32s
2022-10-17T03:25:43.708410 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.716813 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.717562 [INFO] - Beginning proxy request for https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s
2022-10-17T03:25:43.717602 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.726343 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.727344 [INFO] - Beginning proxy request for https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s
2022-10-17T03:25:43.727406 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.811001 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.812148 [INFO] - Beginning proxy request for https://kubernetes.default/apis/policy/v1?timeout=32s
2022-10-17T03:25:43.812213 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.824686 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.825403 [INFO] - Beginning proxy request for https://kubernetes.default/apis/events.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:43.825441 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.913587 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.914525 [INFO] - Beginning proxy request for https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s
2022-10-17T03:25:43.914582 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.923789 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.924695 [INFO] - Beginning proxy request for https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:43.924740 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.009504 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.010713 [INFO] - Beginning proxy request for https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:44.010772 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.019752 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.020884 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:44.020932 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.107160 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.108268 [INFO] - Beginning proxy request for https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.108326 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.117917 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.118979 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s
2022-10-17T03:25:44.119037 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.204251 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.205376 [INFO] - Beginning proxy request for https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.205427 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.214138 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.215287 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s
2022-10-17T03:25:44.215355 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.224067 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.225180 [INFO] - Beginning proxy request for https://kubernetes.default/apis/policy/v1beta1?timeout=32s
2022-10-17T03:25:44.225219 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.309400 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.310438 [INFO] - Beginning proxy request for https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:44.310487 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.319451 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.320575 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.320613 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.405115 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.406333 [INFO] - Beginning proxy request for https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:44.406387 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.415010 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.416044 [INFO] - Beginning proxy request for https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.416089 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.424744 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.426010 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v1?timeout=32s
2022-10-17T03:25:44.501688 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.510537 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.511581 [INFO] - Beginning proxy request for https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s
2022-10-17T03:25:44.511631 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.520423 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.521591 [INFO] - Beginning proxy request for https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.521640 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.606678 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.608375 [INFO] - Beginning proxy request for https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.608436 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.617552 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.618658 [INFO] - Beginning proxy request for https://kubernetes.default/apis/batch/v1?timeout=32s
2022-10-17T03:25:44.618710 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.628218 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.629265 [INFO] - Beginning proxy request for https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s
2022-10-17T03:25:44.629315 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.711106 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.712410 [INFO] - Beginning proxy request for https://kubernetes.default/apis/discovery.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.712458 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.721111 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.801966 [INFO] - Beginning proxy request for https://kubernetes.default/apis/node.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.802032 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.811083 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.812333 [INFO] - Beginning proxy request for https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.812383 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.821086 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.822338 [INFO] - Beginning proxy request for https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.822397 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.907995 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.909164 [INFO] - Beginning proxy request for https://kubernetes.default/apis/apps/v1?timeout=32s
2022-10-17T03:25:44.909213 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.917759 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.918973 [INFO] - Beginning proxy request for https://kubernetes.default/apis/scheduling.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.919030 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.005963 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.007736 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2?timeout=32s
2022-10-17T03:25:45.007832 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.020117 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.021137 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.021178 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.105791 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.107111 [INFO] - Beginning proxy request for https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:45.107169 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.115979 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.117122 [INFO] - Beginning proxy request for https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.117171 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.205532 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.207104 [INFO] - Beginning proxy request for https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s
2022-10-17T03:25:45.207180 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.222913 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.224620 [INFO] - Beginning proxy request for https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.224708 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.308181 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.309547 [INFO] - Beginning proxy request for https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:45.309600 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.318831 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.320041 [INFO] - Beginning proxy request for https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s
2022-10-17T03:25:45.320090 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.405111 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.406595 [INFO] - Beginning proxy request for https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:45.406644 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.415655 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.416844 [INFO] - Beginning proxy request for https://kubernetes.default/apis/batch/v1beta1?timeout=32s
2022-10-17T03:25:45.416886 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.425878 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.502684 [INFO] - Beginning proxy request for https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s
2022-10-17T03:25:45.502744 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.512051 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.513227 [INFO] - Beginning proxy request for https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s
2022-10-17T03:25:45.513272 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.525748 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.602727 [INFO] - Beginning proxy request for https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.602787 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.612447 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.613777 [INFO] - Beginning proxy request for https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s
2022-10-17T03:25:45.613831 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.623089 [INFO] - Requesting k8s API endpoint with cert

So that began at 43.708, with the "Beginning - Exchanging - Requesting" for each request taking only around 10ms, but no requests are completed until all the concurrent incoming requests have been processed. So at 45.718, a full 2s later, the requests are all returned by the proxy, until 45.902, so over a period of 200ms.

2022-10-17T03:25:45.718840 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.724800 [INFO] - GET https://kubernetes.default/api/v1?timeout=32s 200 OK
2022-10-17T03:25:45.734030 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.804941 [INFO] - GET https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.810390 [INFO] - GET https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.811114 [INFO] - GET https://kubernetes.default/apis/policy/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.813519 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2?timeout=32s 200 OK
2022-10-17T03:25:45.814883 [INFO] - GET https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.816179 [INFO] - GET https://kubernetes.default/apis/autoscaling/v1?timeout=32s 200 OK
2022-10-17T03:25:45.816962 [INFO] - GET https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.817745 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s 200 OK
2022-10-17T03:25:45.818542 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s 200 OK
2022-10-17T03:25:45.819341 [INFO] - GET https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.820121 [INFO] - GET https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.820910 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.821708 [INFO] - GET https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.822515 [INFO] - GET https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.822631 [INFO] - GET https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.823474 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.824299 [INFO] - GET https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.825144 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.825296 [INFO] - GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s 200 OK
2022-10-17T03:25:45.826143 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s 200 OK
2022-10-17T03:25:45.826982 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-10-17T03:25:45.829095 [INFO] - GET https://kubernetes.default/apis/policy/v1?timeout=32s 200 OK
2022-10-17T03:25:45.829914 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.830026 [INFO] - GET https://kubernetes.default/apis/discovery.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.902639 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s 200 OK
2022-10-17T03:25:45.902752 [INFO] - GET https://kubernetes.default/apis/apps/v1?timeout=32s 200 OK
2022-10-17T03:25:45.902949 [INFO] - GET https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.904062 [INFO] - GET https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.904242 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-10-17T03:25:45.904390 [INFO] - GET https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.905383 [INFO] - GET https://kubernetes.default/apis/events.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.906250 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.907137 [INFO] - GET https://kubernetes.default/apis/scheduling.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.908026 [INFO] - GET https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.908946 [INFO] - GET https://kubernetes.default/apis/node.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.909809 [INFO] - GET https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.910668 [INFO] - GET https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.911502 [INFO] - GET https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.916386 [INFO] - GET https://kubernetes.default/apis/batch/v1?timeout=32s 200 OK
2022-10-17T03:25:45.917290 [INFO] - GET https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.917425 [INFO] - GET https://kubernetes.default/apis/batch/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.919930 [INFO] - Beginning proxy request for https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/kubeapps/helmreleases
2022-10-17T03:25:45.919980 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:46.004163 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:46.009001 [INFO] - GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/kubeapps/helmreleases 200 OK

Read in-depth overview of the tokio scheduler. In particular the addition of the next_task slot sounds like a solution to this exact problem - but I'm not yet sure why we're not seeing the desired behaviour? (ie. when a proxied request is complete, the task should be marked runnable and put in the next_task slot so it can jump the queue and be returned, rather than continuing with the next incoming requests from the queue.

It'll probably be a simple configuration, but I'll continue tomorrow.

absoludity avatar Oct 17 '22 06:10 absoludity

Interesting findings!

When the network request is completed, it is marked as runnable again, and should be put into the front of the queue (the next-task slot), but seems to be put on the back of the queue.

Is that an actual bug in tokio? In the Java world, sometimes this kind of problems occur when the pool of threads is too small. But if no requests are completed until all the concurrent incoming requests have been processed, it does not seem like related to the pool size.

castelblanque avatar Oct 17 '22 06:10 castelblanque

Is that an actual bug in tokio?

I doubt it, more likely just a config option or something I've missed, but I'll hopefully find out tomorrow.

In the Java world, sometimes this kind of problems occur when the pool of threads is too small. But if no requests are completed until all the concurrent incoming requests have been processed, it does not seem like related to the pool size.

Yeah - all incoming requests get processed before it moves on to return the results for each, though it confuses me that they seem to be done serially. Interesting anyway :)

absoludity avatar Oct 17 '22 07:10 absoludity

So, summary of findings after creating a minimal working example:

  • There is no issue in tokio (as expected). It is correct that it processes I/O events in a FIFO manner
  • The issue is a combination of a CPU-intensive calculation in the request (I suspect calculating hashes to lookup the cached credential, but will verify) and CPU resource starvation for the service itself, so that actually initiating the requests takes many seconds.

Using my minimal working example, I can reproduce the results seen by explicitly:

  1. Adding a 100ms (or otherwise) CPU-blocking call right before each request is made, and
  2. Restricting the number of CPUs available to the runtime.

If only one of these is done, the queries can still execute quickly.

I'll profile the credential cache access used in pinniped-proxy next.

absoludity avatar Oct 19 '22 04:10 absoludity

...and found it: 2 things to note in the logs below:

  1. The request begins, but after my log message of "Exchanging id_token for..." there must be an unexpected I/O event that causes the task to go idle (even though it should be a cache request) - hence all requests get started before any request actually finishes the exchange and begins "Requesting k8s API endpoint with cert",
  2. Perhaps related, but the calls to call_pinniped are apparently stacking (perhaps because of the sync_writes option I'm using on the cache), starting at 15ms and just growing from there as time passes up to almost 2s per request.

Finally I can now actually try to fix the issue :)

2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s
 2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Exchanging id_token for X509 client identity using pinniped
 2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Beginning proxy request for https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s
 2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Exchanging id_token for X509 client identity using pinniped
 2022-10-19T06:02:32.760Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 16ms, of which 15ms was call_pinniped
 2022-10-19T06:02:32.765Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.765Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 20ms, of which 20ms was call_pinniped
 2022-10-19T06:02:32.835Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.836Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 91ms, of which 91ms was call_pinniped
 2022-10-19T06:02:32.860Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.860Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 115ms, of which 115ms was call_pinniped
 2022-10-19T06:02:32.945Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.945Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 199ms, of which 199ms was call_pinniped
 2022-10-19T06:02:32.953Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.953Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 207ms, of which 207ms was call_pinniped
 2022-10-19T06:02:33.039Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.039Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 293ms, of which 293ms was call_pinniped
 2022-10-19T06:02:33.135Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.135Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 388ms, of which 388ms was call_pinniped
 2022-10-19T06:02:33.157Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.157Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 411ms, of which 411ms was call_pinniped
 2022-10-19T06:02:33.255Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.255Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 508ms, of which 508ms was call_pinniped
 2022-10-19T06:02:33.355Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.355Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 608ms, of which 607ms was call_pinniped
 2022-10-19T06:02:33.438Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.438Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 691ms, of which 691ms was call_pinniped
 2022-10-19T06:02:33.444Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.444Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 696ms, of which 696ms was call_pinniped
 2022-10-19T06:02:33.448Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.448Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 700ms, of which 700ms was call_pinniped
 2022-10-19T06:02:33.454Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.454Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 705ms, of which 705ms was call_pinniped
 2022-10-19T06:02:33.539Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.539Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 790ms, of which 790ms was call_pinniped
 2022-10-19T06:02:33.636Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.636Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 887ms, of which 887ms was call_pinniped
 2022-10-19T06:02:33.653Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.653Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 904ms, of which 903ms was call_pinniped
 2022-10-19T06:02:33.760Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.760Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1011ms, of which 1010ms was call_pinniped
 2022-10-19T06:02:33.837Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.838Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1088ms, of which 1088ms was call_pinniped
 2022-10-19T06:02:33.842Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.842Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1092ms, of which 1092ms was call_pinniped
 2022-10-19T06:02:33.847Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.847Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1097ms, of which 1097ms was call_pinniped
 2022-10-19T06:02:33.852Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.852Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1101ms, of which 1101ms was call_pinniped
 2022-10-19T06:02:33.947Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.948Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1197ms, of which 1197ms was call_pinniped
 2022-10-19T06:02:34.035Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.035Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1284ms, of which 1283ms was call_pinniped
 2022-10-19T06:02:34.133Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.134Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1382ms, of which 1382ms was call_pinniped
 2022-10-19T06:02:34.157Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.157Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1405ms, of which 1405ms was call_pinniped
 2022-10-19T06:02:34.238Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.238Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1487ms, of which 1486ms was call_pinniped
 2022-10-19T06:02:34.243Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.243Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1491ms, of which 1491ms was call_pinniped
 2022-10-19T06:02:34.248Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.248Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1495ms, of which 1495ms was call_pinniped
 2022-10-19T06:02:34.253Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.253Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1500ms, of which 1499ms was call_pinniped
 2022-10-19T06:02:34.335Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.336Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1582ms, of which 1582ms was call_pinniped
 2022-10-19T06:02:34.435Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.435Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1681ms, of which 1680ms was call_pinniped
 2022-10-19T06:02:34.452Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.452Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1697ms, of which 1697ms was call_pinniped
 2022-10-19T06:02:34.535Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.535Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1780ms, of which 1780ms was call_pinniped
 2022-10-19T06:02:34.542Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.542Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1787ms, of which 1787ms was call_pinniped
 2022-10-19T06:02:34.549Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.549Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1793ms, of which 1793ms was call_pinniped
 2022-10-19T06:02:34.634Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.634Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1878ms, of which 1878ms was call_pinniped
 2022-10-19T06:02:34.640Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.640Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1883ms, of which 1883ms was call_pinniped
 2022-10-19T06:02:34.645Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.645Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1888ms, of which 1888ms was call_pinniped
 2022-10-19T06:02:34.651Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.651Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1894ms, of which 1893ms was call_pinniped
 2022-10-19T06:02:34.657Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.657Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1899ms, of which 1899ms was call_pinniped
 2022-10-19T06:02:34.752Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.941Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
 2022-10-19T06:02:34.943Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:34.946Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:34.948Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s 200 OK
 2022-10-19T06:02:35.036Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/api/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.037Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.040Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.043Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.046Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.046Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.049Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s 200 OK
 2022-10-19T06:02:35.050Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v2?timeout=32s 200 OK
 2022-10-19T06:02:35.052Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.053Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/policy/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.056Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.135Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.136Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/policy/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.139Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.142Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s 200 OK
 2022-10-19T06:02:35.145Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/batch/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.148Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.152Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.155Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/discovery.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.234Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.234Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/events.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.252Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.252Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
 2022-10-19T06:02:35.252Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.253Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.257Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.335Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/batch/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.338Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.341Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.344Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.347Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.350Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.351Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.354Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.357Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.434Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/node.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.435Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.436Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s 200 OK

absoludity avatar Oct 19 '22 06:10 absoludity

And the winner is the sync_writes option to the cached macro: I had misunderstood what it does, turns out it actually causes each call to try to acquire a write-lock before executing the function, so the first query gets the write-lock, then every other following query gets I/O blocked at this point while trying to get the credential until the previous one releases it.

Basically forming a queue to sequentially get the credential before each can make the proxied request.

Without sync_writes = true it works as expected.

absoludity avatar Oct 19 '22 06:10 absoludity

And the winner is the sync_writes option to the cached macro

Awesome work Michael. So the write-lock was blocking all subsequent requests.

This is something that still amazes me from programming: such a simple fix, but soo hard to pinpoint 😄

castelblanque avatar Oct 19 '22 06:10 castelblanque

Awesome finding Michael!

ppbaena avatar Oct 19 '22 08:10 ppbaena

Thanks - it was a lot of fun to track down (good chance to learn quite a bit). I just had to share here because I'm working all on my own in my TZ with no-one to high-five with :)

absoludity avatar Oct 19 '22 09:10 absoludity

Just an update on this: as mentioned on the linked PR, the issue that I found last week (mentioned above) fixes one aspect, but the requests still takes way too long. I've spent the past couple of days digging to find the issue and finding that

  1. the call to the actual cached function is sometimes a few ms, other times 100ms:
2022-10-23T22:31:22.999358 [INFO] - prep_and_call took 4ms, 4 of which was call_pinniped
2022-10-23T22:31:23.098464 [INFO] - prep_and_call took 94ms, 94 of which was call_pinniped
2022-10-23T22:31:23.292432 [INFO] - prep_and_call took 96ms, 96 of which was call_pinniped
2022-10-23T22:31:23.484841 [INFO] - prep_and_call took 94ms, 94 of which was call_pinniped
2022-10-23T22:31:23.600418 [INFO] - prep_and_call took 18ms, 18 of which was call_pinniped
2022-10-23T22:31:23.782121 [INFO] - prep_and_call took 93ms, 93 of which was call_pinniped
2022-10-23T22:31:23.792623 [INFO] - prep_and_call took 4ms, 4 of which was call_pinniped
  1. The actual cache lookup is less than a ms, consistently.
  2. The longer durations appear to be CPU starvation - but why?

I didn't have much success finding out how the above could be true until this morning, when realising that because the cached function call_pinniped is called asynchronously, regardless of whether the result is cached or not. So even though the value is cached after the first call, every request going through pinniped gets slept when .await is called (it's an i/o wait), with the task manager switching to a different task (rather than just pulling the value out of the cache) and so is dependent on the task being enabled again by the task manager.

At least, this is what I will next check, though I need to take a break from this to get some other things done (hence dumping thoughts here). The approach will be, rather than decorating the async function with the cached macro, instead, having a cache available and only calling the async function on a cache miss. This should avoid sleeping the task unnecessarily.

absoludity avatar Oct 24 '22 00:10 absoludity