argo-events
QPS / burst options for k8s go-client
Is your feature request related to a problem? Please describe.
Our use-case looks like this: we get upload events from S3 sent to SQS, then the SQS EventSource fetches these messages and a Sensor triggers an Argo Workflow (with the k8s trigger) to kick off a conversion of these files. The conversion takes ~12s on average and we're going to have thousands of S3 events per second. We're also going to start reconversions of maybe tens of thousands of files simultaneously.
What we observed is that the rate at which the Argo Workflow resources are created is very low (5/s). With 10,000 files/events it would take around 30 minutes until all workflow objects are created, which slows down the whole pipeline. The 5/s is the default QPS defined in the k8s go-client used by the k8s trigger.
Describe the solution you'd like
I would like to be able to set the QPS and burst options for the k8s client. I already built a sensor image with a fixed QPS and burst of 100 to verify that this is the bottleneck, and it was. I'm just not sure how to implement this. My first shot would be environment variables or arguments for the sensor.
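(For illustration, a minimal sketch of this kind of override, assuming the sensor builds its clients from a standard client-go rest.Config; the hard-coded 100 only mirrors the test image mentioned above and is not an official argo-events setting.)

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config, as a sensor pod would use it.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	// client-go defaults to QPS=5 and Burst=10 when these fields are left at zero;
	// raising them removes the client-side throttle observed at ~5 creates/s.
	// 100 is only the number used in the test image, not a recommendation.
	config.QPS = 100
	config.Burst = 100

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	fmt.Printf("client ready (%T) with QPS=%v Burst=%v\n", clientset, config.QPS, config.Burst)
}
```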
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
I see. I think we can probably increase the default QPS and Burst; would you like to do it, @mrkwtz?
Sure, but I would also like to make it configurable, because I guess the increased defaults won't help every use case. Argo also does this in its workflow controller: https://github.com/argoproj/argo/blob/45c792a59052db20da74713b29bdcd1145fc6748/cmd/workflow-controller/main.go#L67-L68
I'm okay with customization, couple of things:
- The QPS and Burst we talked about are only for the k8s go-client running in the Sensor pod, so if we want to make them customizable, it could be done either through a Sensor CRD spec change (in the k8s trigger part) or by reading ENVs.
- It looks like using ENVs will be easier, since it does not require any spec change (ENVs can already be customized by setting spec.template.container.env); the only changes we need are the code change to pick up the ENVs and a doc telling how to do it.
The config approaches above are both left to the users; if we want to do centralized config, then the customized values would need to be written to the Sensor pod as something like ENVs during reconciliation, which I think is unnecessary.
BTW - do you have data on the QPS that the api servers in the cluster can handle (without rate limit, or what is the rate limit setting)?
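(A minimal sketch of what the ENV approach could look like; the variable names SENSOR_K8S_CLIENT_QPS / SENSOR_K8S_CLIENT_BURST and the fallback to the client-go defaults are assumptions for illustration, not what argo-events necessarily ships. The values themselves would be supplied to the sensor pod via spec.template.container.env.)

```go
package main

import (
	"fmt"
	"os"
	"strconv"

	"k8s.io/client-go/rest"
)

// qpsBurstFromEnv reads the (hypothetical) SENSOR_K8S_CLIENT_QPS and
// SENSOR_K8S_CLIENT_BURST variables, falling back to the client-go
// defaults (5 / 10) when they are unset or invalid.
func qpsBurstFromEnv() (float32, int) {
	qps, burst := rest.DefaultQPS, rest.DefaultBurst
	if v, err := strconv.ParseFloat(os.Getenv("SENSOR_K8S_CLIENT_QPS"), 32); err == nil && v > 0 {
		qps = float32(v)
	}
	if v, err := strconv.Atoi(os.Getenv("SENSOR_K8S_CLIENT_BURST")); err == nil && v > 0 {
		burst = v
	}
	return qps, burst
}

func main() {
	// Wherever the sensor builds its rest.Config (e.g. in main.go), the
	// picked-up values would simply be assigned before creating clients.
	config := &rest.Config{Host: "https://kubernetes.default.svc"} // placeholder for illustration
	config.QPS, config.Burst = qpsBurstFromEnv()
	fmt.Printf("QPS=%v Burst=%v\n", config.QPS, config.Burst)
}
```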
> It looks like using ENVs will be easier, since it does not require any spec change (ENVs can already be customized by setting spec.template.container.env); the only changes we need are the code change to pick up the ENVs and a doc telling how to do it.
I guess we could also implement cobra for args handling in the sensor pod and use the spec.template.container.args setting to pass the config.
> The config approaches above are both left to the users; if we want to do centralized config, then the customized values would need to be written to the Sensor pod as something like ENVs during reconciliation, which I think is unnecessary.
Sorry I don't get that part. I'm not a native speaker :/ Maybe you could rephrase?
> BTW - do you have data on the QPS that the api servers in the cluster can handle (without rate limit, or what is the rate limit setting)?
Unfortunately not. We're running on EKS and I don't know if they use the defaults or made changes to these configs. As you can see, the Argo workflow controller uses these defaults: https://github.com/argoproj/argo/blob/45c792a59052db20da74713b29bdcd1145fc6748/cmd/workflow-controller/main.go#L118-L119, which are also the defaults in the kube-controller-manager.
> I guess we could also implement cobra for args handling in the sensor pod and use the spec.template.container.args setting to pass the config.
Reading ENVs is easier and more direct; using args kind of implicitly means the arg is globally effective, however it's only used by the k8s trigger.
> Sorry I don't get that part. I'm not a native speaker :/ Maybe you could rephrase?
What I meant was: the configuration we discussed is done in the Sensor CRD object, not in something like a centralized configmap.
> Unfortunately not. We're running on EKS and I don't know if they use the defaults or made changes to these configs.
The thing we are discussing is a rate-limit change on the k8s go-client side, not on the api-server; the purpose of the QPS and Burst settings is actually to prevent spamming the api-server.
We can increase the numbers to whatever we want, however if the api-server is not able to handle it (or the api-server rate limit is enabled with a number lower than what we set), then all the changes we make are useless. This is why I asked this question, and this is also why I said we can increase the default settings on our side: it's harmless to argo-events, because it's just a client to the api-server, and the api-server is supposed to have a safeguard in any case (or the user should know what they are doing, if a high volume of requests to the api-server is expected, and whether the api-server is able to handle, or allows, that concurrency).
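(To make this concrete: client-go enforces QPS and Burst with a client-side token-bucket rate limiter, where QPS is the sustained refill rate and Burst the bucket size, so raising them only removes the client-side throttle; what the api-server itself will accept is a separate question. A small self-contained sketch, not argo-events code, using the same flowcontrol package client-go builds its limiter from:)

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// The same kind of limiter client-go derives from rest.Config.QPS / Burst.
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10) // qps=5, burst=10

	start := time.Now()
	for i := 0; i < 30; i++ {
		limiter.Accept() // free until the 10-token burst is spent, then ~5 per second
	}
	// Expect roughly (30-10)/5 = 4 seconds for 30 simulated requests.
	fmt.Printf("30 requests took %v at QPS=5, Burst=10\n", time.Since(start).Round(time.Millisecond))
}
```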
> Reading ENVs is easier and more direct; using args kind of implicitly means the arg is globally effective, however it's only used by the k8s trigger.
The kubeclient that is used globally in the sensor app is built in main.go, so I thought the change would/could cover every interaction with the k8s API. For me it doesn't matter whether it's via ENVs or args, but if we don't want to configure it globally we'd have to create an extra kubeclient only for the k8s trigger.
> What I meant was: the configuration we discussed is done in the Sensor CRD object, not in something like a centralized configmap.
Yeah, of course; either via ENVs or args, both should already be possible with the current Sensor CR.
> We can increase the numbers to whatever we want, however if the api-server is not able to handle it (or the api-server rate limit is enabled with a number lower than what we set), then all the changes we make are useless.
Like I said, I would prefer setting the defaults to 20/30 (the same as the kube-controller-manager defaults) and making it configurable via ENVs or args. If someone chooses to tweak these settings they should know what they're doing, but that goes for every configuration of every system.
> The kubeclient that is used globally in the sensor app is built in main.go, so I thought the change would/could cover every interaction with the k8s API.
As far as I can recall, the k8s go-client is only used by the k8s trigger in the Sensor.
Also in the ArgoWorkflow trigger
> Also in the ArgoWorkflow trigger
No, the ArgoWorkflow trigger uses the argo cli.
Ok, it uses the k8s client, but not for creating the workflow: https://github.com/argoproj/argo-events/blob/master/sensors/triggers/argo-workflow/argo-workflow.go#L169-L178
Try with https://github.com/argoproj/argo-events/blob/master/docs/sensors/more-about-sensors-and-triggers.md#trigger-rate-limit
We are also looking for an option to override the default QPS and burst values used inside the sensor pod while creating the k8s resources. We have tried the rateLimit configuration but it does not seem to increase the QPS. The defaults per the k8s client are QPS=5 and burst=10; we are seeing only 5, which seems to be the default of the Go k8s client. Is there a way to fix this?
We have the same issue: 1,000 requests take 23 seconds to reach the EventBus and 3 minutes 18 seconds to become workflows. Can we reopen this request? The Sensor is extremely slow with these default settings.
I am curious whether anybody has actually been able to generate thousands of workflows / steps in a very short time. There are limits discussed here, at both the sensor and the workflow controller, but also on the k8s api server, no matter whether it is EKS or GKE. GKE limits have been raised to 50/100 in the latest version (1.27), which improves things a bit.
Still doing some tests with just a few hundred workflows at once, I observe the same behavior: workflows are created pretty fast, but it takes ages for the step pods to be created and for the workflow to be considered running. The delay is also observable during a workflow execution, from one step to another, depending on how many steps need to be created at the same time on the k8s cluster.
It seems to be more of a Kubernetes api-server restriction in the end. Has anyone achieved decent throughput? Do we have a setup example showcasing the thousands of workflows promised by the documentation?