argo-events
QPS / burst options for k8s go-client
Is your feature request related to a problem? Please describe.
Our use-case looks like this: we get upload events from S3 sent to SQS, then the SQS EventSource fetches these messages and a Sensor triggers an Argo Workflow (with the k8s trigger) to kick off a conversion of these files. The conversion takes ~12s on average and we're going to have thousands of S3 events per second. We're also going to start reconversions of maybe tens of thousands of files simultaneously.
What we observed is that the rate at which the Argo Workflow resources are created is very low (5/s). With 10,000 files/events it would take around 30 minutes until all workflow objects are created, which slows down the whole pipeline. The 5/s is the default QPS defined in the k8s go-client used by the k8s trigger.
Describe the solution you'd like
I would like to be able to set the QPS and burst options for the k8s client. I already built a sensor image with a fixed QPS and burst of 100 to verify that this is the bottleneck, and it was. I'm just not sure how to implement this. My first shot would be environment variables or arguments for the sensor.
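(For illustration, a minimal sketch of this kind of override, assuming the sensor builds its clients from a standard client-go rest.Config; the hard-coded 100 only mirrors the test image mentioned above and is not an official argo-events setting.)

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config, as a sensor pod would use it.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	// client-go defaults to QPS=5 and Burst=10 when these fields are left at zero;
	// raising them removes the client-side throttle observed at ~5 creates/s.
	// 100 is only the number used in the test image, not a recommendation.
	config.QPS = 100
	config.Burst = 100

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	fmt.Printf("client ready (%T) with QPS=%v Burst=%v\n", clientset, config.QPS, config.Burst)
}
```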
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
I see. I think we can probably increase the default QPS and Burst; would you like to do it, @mrkwtz?
Sure, but I would also like to make it configurable, because I guess the increased defaults won't help every use case. Argo also does this in its workflow controller: https://github.com/argoproj/argo/blob/45c792a59052db20da74713b29bdcd1145fc6748/cmd/workflow-controller/main.go#L67-L68
I'm okay with customization, couple of things:
- The QPS and Burst we talked about are only for the k8s go-client running in the Sensor pod, so if we want to make them customizable, it could be done either through a Sensor CRD spec change (in the k8s trigger part) or by reading ENVs.
- It looks like using ENVs will be easier, since it does not require any spec change (ENVs can already be customized by setting spec.template.container.env); the only changes we need are the code change to pick up the ENVs and a doc telling how to do it.
The config approaches above are both left to the users; if we want to do centralized config, then the customized values would need to be written to the Sensor pod as something like ENVs during reconciliation, which I think is unnecessary.
BTW - do you have data on the QPS that the api servers in the cluster can handle (without rate limit, or what is the rate limit setting)?
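(A minimal sketch of what the ENV approach could look like; the variable names SENSOR_K8S_CLIENT_QPS / SENSOR_K8S_CLIENT_BURST and the fallback to the client-go defaults are assumptions for illustration, not what argo-events necessarily ships. The values themselves would be supplied to the sensor pod via spec.template.container.env.)

```go
package main

import (
	"fmt"
	"os"
	"strconv"

	"k8s.io/client-go/rest"
)

// qpsBurstFromEnv reads the (hypothetical) SENSOR_K8S_CLIENT_QPS and
// SENSOR_K8S_CLIENT_BURST variables, falling back to the client-go
// defaults (5 / 10) when they are unset or invalid.
func qpsBurstFromEnv() (float32, int) {
	qps, burst := rest.DefaultQPS, rest.DefaultBurst
	if v, err := strconv.ParseFloat(os.Getenv("SENSOR_K8S_CLIENT_QPS"), 32); err == nil && v > 0 {
		qps = float32(v)
	}
	if v, err := strconv.Atoi(os.Getenv("SENSOR_K8S_CLIENT_BURST")); err == nil && v > 0 {
		burst = v
	}
	return qps, burst
}

func main() {
	// Wherever the sensor builds its rest.Config (e.g. in main.go), the
	// picked-up values would simply be assigned before creating clients.
	config := &rest.Config{Host: "https://kubernetes.default.svc"} // placeholder for illustration
	config.QPS, config.Burst = qpsBurstFromEnv()
	fmt.Printf("QPS=%v Burst=%v\n", config.QPS, config.Burst)
}
```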
> It looks like using ENVs will be easier, since it does not require any spec change (ENVs can already be customized by setting spec.template.container.env); the only changes we need are the code change to pick up the ENVs and a doc telling how to do it.
I guess we could also implement cobra for args handling in the sensor pod and use the spec.template.container.args setting to pass the config.
> The config approaches above are both left to the users; if we want to do centralized config, then the customized values would need to be written to the Sensor pod as something like ENVs during reconciliation, which I think is unnecessary.
Sorry I don't get that part. I'm not a native speaker :/ Maybe you could rephrase?
> BTW - do you have data on the QPS that the api servers in the cluster can handle (without rate limit, or what is the rate limit setting)?
Unfortunately not. We're running on EKS and I don't know if they use the defaults or made changes to these configs. As you can see, the Argo workflow controller uses these defaults: https://github.com/argoproj/argo/blob/45c792a59052db20da74713b29bdcd1145fc6748/cmd/workflow-controller/main.go#L118-L119, which are also the defaults in the kube-controller-manager.
> I guess we could also implement cobra for args handling in the sensor pod and use the spec.template.container.args setting to pass the config.
Reading ENVs is easier and more direct; using args kind of implicitly means the arg is globally effective, however it's only used by the k8s trigger.
> Sorry I don't get that part. I'm not a native speaker :/ Maybe you could rephrase?
What I meant was: the configuration we discussed is done in the Sensor CRD object, not in something like a centralized configmap.
> Unfortunately not. We're running on EKS and I don't know if they use the defaults or made changes to these configs.
The thing we are discussing is a rate-limit change on the k8s go-client side, not on the api-server; the purpose of the QPS and Burst settings is actually to prevent spamming the api-server.
We can increase the numbers to whatever we want, however if the api-server is not able to handle it (or the api-server rate limit is enabled with a number lower than what we set), then all the changes we make are useless. This is why I asked this question, and this is also why I said we can increase the default settings on our side: it's harmless to argo-events, because it's just a client to the api-server, and the api-server is supposed to have a safeguard in any case (or the user should know what they are doing, if a high volume of requests to the api-server is expected, and whether the api-server is able to handle, or allows, that concurrency).
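(To make this concrete: client-go enforces QPS and Burst with a client-side token-bucket rate limiter, where QPS is the sustained refill rate and Burst the bucket size, so raising them only removes the client-side throttle; what the api-server itself will accept is a separate question. A small self-contained sketch, not argo-events code, using the same flowcontrol package client-go builds its limiter from:)

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// The same kind of limiter client-go derives from rest.Config.QPS / Burst.
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10) // qps=5, burst=10

	start := time.Now()
	for i := 0; i < 30; i++ {
		limiter.Accept() // free until the 10-token burst is spent, then ~5 per second
	}
	// Expect roughly (30-10)/5 = 4 seconds for 30 simulated requests.
	fmt.Printf("30 requests took %v at QPS=5, Burst=10\n", time.Since(start).Round(time.Millisecond))
}
```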
> Reading ENVs is easier and more direct; using args kind of implicitly means the arg is globally effective, however it's only used by the k8s trigger.
The kubeclient that is used globally in the sensor app is built in main.go, so I thought the change would/could cover every interaction with the k8s API. For me it doesn't matter whether it's via ENVs or args, but if we don't want to configure it globally we'd have to create an extra kubeclient only for the k8s trigger.
> What I meant was: the configuration we discussed is done in the Sensor CRD object, not in something like a centralized configmap.
Yeah, of course; either via ENVs or args, both should already be possible with the current Sensor CR.
> We can increase the numbers to whatever we want, however if the api-server is not able to handle it (or the api-server rate limit is enabled with a number lower than what we set), then all the changes we make are useless.
Like I said, I would prefer setting the defaults to 20/30 (the same as the kube-controller-manager defaults) and making it configurable via ENVs or args. If someone chooses to tweak these settings they should know what they're doing, but that goes for every configuration of every system.
> The kubeclient that is used globally in the sensor app is built in main.go, so I thought the change would/could cover every interaction with the k8s API.
As far as I can recall, the k8s go-client is only used by the k8s trigger in the Sensor.
Also in the ArgoWorkflow trigger
> Also in the ArgoWorkflow trigger
No, the ArgoWorkflow trigger uses the argo cli.
Ok, it uses the k8s client, but not for creating the workflow: https://github.com/argoproj/argo-events/blob/master/sensors/triggers/argo-workflow/argo-workflow.go#L169-L178
Try with https://github.com/argoproj/argo-events/blob/master/docs/sensors/more-about-sensors-and-triggers.md#trigger-rate-limit
We are also looking for an option to override the default QPS and burst values used inside the sensor pod while creating the k8s resources. We have tried the rateLimit configuration but it does not seem to increase the QPS. The defaults per the k8s client are QPS=5 and burst=10; we are seeing only 5, which seems to be the default of the Go k8s client. Is there a way to fix this?
We have the same issue: 1,000 requests take 23 seconds to reach the EventBus and 3 minutes 18 seconds to become workflows. Can we reopen this request? The Sensor is extremely slow with these default settings.
I am curious whether anybody has actually been able to generate thousands of workflows / steps in a very short time. There are limits discussed here, at both the sensor and the workflow controller, but also on the k8s api server, no matter whether it is EKS or GKE. GKE limits have been raised to 50/100 in the latest version (1.27), which improves things a bit.
Still doing some tests with just a few hundred workflows at once, I observe the same behavior: workflows are created pretty fast, but it takes ages for the step pods to be created and for the workflow to be considered running. The delay is also observable during a workflow execution, from one step to another, depending on how many steps need to be created at the same time on the k8s cluster.
It seems to be more of a Kubernetes api-server restriction in the end. Has anyone achieved decent throughput? Do we have a setup example showcasing the thousands of workflows promised by the documentation?