[Feature]: Adding rate limiting to the Flush API

Open jiaoew1991 opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Is your feature request related to a problem? Please describe.

The Flush API lets users persist data that is still in the stream, which can save time waiting for indexing. However, many users call Flush very frequently to persist data as quickly as possible, sometimes after inserting a single row, and this is very unfriendly to Milvus' storage mechanism: every call to Flush generates a new set of files. Ideally, the system should batch persistence based on runtime conditions; calling Flush manually too often produces massive numbers of small files that significantly slow down subsequent processes such as Load and Compaction, which can lead to a series of stability and performance issues.

There's no doubt about the usefulness of the Flush API. To prevent misuse, though, call restrictions need to be added, such as allowing only one call per minute, or making this interval a configurable parameter.
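For illustration, here is a minimal sketch of what such a restriction could look like on the server side. All names here (CollectionFlushLimiter, minInterval, etc.) are hypothetical and not part of Milvus:

```go
package flushlimit

import (
	"sync"
	"time"
)

// CollectionFlushLimiter allows at most one Flush per collection within
// minInterval. This is an illustration only, not Milvus code.
type CollectionFlushLimiter struct {
	mu          sync.Mutex
	minInterval time.Duration        // e.g. time.Minute; ideally configurable
	lastFlush   map[string]time.Time // collection name -> last accepted Flush
}

func NewCollectionFlushLimiter(minInterval time.Duration) *CollectionFlushLimiter {
	return &CollectionFlushLimiter{
		minInterval: minInterval,
		lastFlush:   make(map[string]time.Time),
	}
}

// Allow reports whether a Flush on the given collection may proceed now.
// When it returns false, the caller would reply with a rate-limit error.
func (l *CollectionFlushLimiter) Allow(collection string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	if last, ok := l.lastFlush[collection]; ok && now.Sub(last) < l.minInterval {
		return false // too soon since the last accepted Flush
	}
	l.lastFlush[collection] = now
	return true
}
```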

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

jiaoew1991 avatar May 24 '23 01:05 jiaoew1991

I'd like to work on this but I may need some guidance.

kevinmingtarja avatar May 25 '23 02:05 kevinmingtarja

First, read the QuotaCenter code in rootcoord; that is where the quota limits are set.

Then read the rate limiter code in proxy; that is where the quota is actually applied.

So in general, what happens is:

  • Rootcoord collects metrics from the proxies, datanodes, and querynodes.
  • Rootcoord sets up the quota limits according to those metrics.
  • Proxy enforces the quota limits based on the values it receives.
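To make that flow concrete, here is a rough sketch of a metrics-driven flush quota computed centrally and enforced at the proxy, using golang.org/x/time/rate. The type and function names (Metrics, computeFlushLimit, refreshQuota, handleFlush, ErrRateLimited) are made up for illustration and are not the actual Milvus types:

```go
package quotaflow

import (
	"errors"

	"golang.org/x/time/rate"
)

// Metrics is a stand-in for the statistics rootcoord collects from the
// proxies, datanodes and querynodes.
type Metrics struct {
	FlushedSegments int
	SmallFileRatio  float64
}

// ErrRateLimited is a hypothetical sentinel error returned when a Flush
// request is rejected by the quota.
var ErrRateLimited = errors.New("flush rate limit exceeded")

// proxyFlushLimiter plays the role of the proxy-side limiter that actually
// gates Flush requests (here a plain token bucket from golang.org/x/time/rate).
var proxyFlushLimiter = rate.NewLimiter(rate.Limit(0.1), 1) // ~1 flush / 10 s

// computeFlushLimit stands in for the quota-center logic that turns the
// collected metrics into a flush rate (requests per second).
func computeFlushLimit(m Metrics) rate.Limit {
	if m.SmallFileRatio > 0.8 {
		return rate.Limit(0.05) // back off harder when small files dominate
	}
	return rate.Limit(0.1)
}

// refreshQuota mimics rootcoord pushing an updated limit down to the proxy.
func refreshQuota(m Metrics) {
	proxyFlushLimiter.SetLimit(computeFlushLimit(m))
}

// handleFlush is where the proxy would enforce the quota before serving Flush.
func handleFlush() error {
	if !proxyFlushLimiter.Allow() {
		return ErrRateLimited // rejected; the client may retry later
	}
	// ... forward the Flush request to the data path ...
	return nil
}
```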

xiaofan-luan avatar May 25 '23 10:05 xiaofan-luan

Hi @kevinmingtarja, are you still working on this issue? I am interested in it as well, and maybe we can discuss it further. I read the docs and found that quotaAndLimits.flush.max controls the rate of the Flush API. Right now this value accepts anything in the range [0, Inf); following @jiaoew1991's suggestion, we could basically restrict it to 1 when the flush rate limit is enabled. Does that make sense?

bryanwux avatar Jun 02 '23 09:06 bryanwux

Thanks a lot @xiaofan-luan for the pointers.

@bryanwux yeah, I saw that config as well, along with quotaAndLimits.flushRate.enabled. So my understanding is that this is already configurable today: set quotaAndLimits.flushRate.enabled=true and set quotaAndLimits.flush.max (although it is a per-second limit, not per-minute) to some value.

So I wanted to clarify with @jiaoew1991: what kind of additional rate limiting do we want to implement, considering that we already have these two configurable settings? Thanks!
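As a side note on units: if quotaAndLimits.flush.max is interpreted as a plain requests-per-second rate, a per-minute style limit can still be expressed with it. A tiny, self-contained illustration (my own example, not Milvus code) using golang.org/x/time/rate:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// "At most one Flush per minute" as a per-second rate is about 0.0167/s,
	// and "one per 10 seconds" is 0.1/s; rate.Every converts an interval
	// into such a rate.
	perMinute := rate.Every(time.Minute)      // ≈ 0.0167 requests/second
	perTenSec := rate.Every(10 * time.Second) // 0.1 requests/second
	fmt.Println(float64(perMinute), float64(perTenSec))
}
```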

kevinmingtarja avatar Jun 02 '23 17:06 kevinmingtarja

Maybe we should think about tying the flush operation limit to the number of files and segments? Having a fixed default limit would be a great start. My proposal is that each collection can only be flushed once every 10 seconds, and the client side could retry on the quota limit error.
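A rough sketch of the client-side retry part of that proposal (the flushFunc signature and the ErrRateLimited sentinel are assumptions for illustration; the real SDK call and error code would differ):

```go
package flushclient

import (
	"context"
	"errors"
	"time"
)

// ErrRateLimited stands in for the quota-limit error a rejected Flush would
// return; the actual error/status code depends on the server implementation.
var ErrRateLimited = errors.New("flush rate limit exceeded")

// flushFunc is whatever SDK call performs the Flush (hypothetical signature).
type flushFunc func(ctx context.Context, collection string) error

// flushWithRetry retries a rate-limited Flush with simple exponential backoff:
// the server rejects the call, the client waits and tries again.
func flushWithRetry(ctx context.Context, flush flushFunc, collection string, maxRetries int) error {
	backoff := 2 * time.Second
	for attempt := 0; ; attempt++ {
		err := flush(ctx, collection)
		if err == nil || !errors.Is(err, ErrRateLimited) || attempt >= maxRetries {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
}
```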

xiaofan-luan avatar Jun 04 '23 02:06 xiaofan-luan

A fixed default value as the first step is ok.

jiaoew1991 avatar Jun 05 '23 01:06 jiaoew1991

Good afternoon~~ @xiaofan-luan, I have a question. Does the 10-second rate limit mean that multiple agents together are allowed only one flush request every 10 seconds, or that each individual agent is allowed one flush request every 10 seconds?

shunjiezhao avatar Jun 18 '23 08:06 shunjiezhao

The limit is applied on the server side.

xiaofan-luan avatar Jun 18 '23 08:06 xiaofan-luan