incubator-uniffle

[Feature] User's resources quota

Open smallzhongfeng opened this issue 2 years ago • 13 comments

At present, we cannot limit a user's resources. Maybe we could cap the number of tasks a user may submit through a configuration file that can be updated manually. When the quota is exceeded, the app is rejected, so the number of apps per user serves as the resource quota. What do you think? @jerqi

smallzhongfeng avatar Sep 13 '22 03:09 smallzhongfeng

Maybe this could be solved by implementing a custom AccessChecker to limit the user's quota. I have done this.

zuston avatar Sep 13 '22 03:09 zuston

At present, the app level is also limited, right?

smallzhongfeng avatar Sep 13 '22 03:09 smallzhongfeng

I haven't seen a similar PR so far, so I created this issue.

smallzhongfeng avatar Sep 13 '22 03:09 smallzhongfeng

At present, the app level is also limited, right?

Yes, I introduced a custom access checker to do the following:

  1. Do a grey-scale rollout.
  2. Add a blacklist so that some jobs fall back to ESS.
  3. ...

So I think the resource quota limit could be implemented in a custom access checker.

Please let me know if I misunderstood you.

zuston avatar Sep 13 '22 03:09 zuston
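
A minimal sketch of the per-user app quota discussed above, in plain Java. It does not use Uniffle's actual AccessChecker interface: the class `UserAppQuotaChecker`, its methods, and the quota map (which stands in for values loaded from a configuration file) are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Uniffle's real AccessChecker API.
class UserAppQuotaChecker {
    // Assumed to be loaded from a configuration file: user -> max concurrent apps.
    private final Map<String, Integer> quotaByUser;
    private final int defaultQuota;
    // Apps currently registered per user.
    private final Map<String, Set<String>> runningApps = new ConcurrentHashMap<>();

    UserAppQuotaChecker(Map<String, Integer> quotaByUser, int defaultQuota) {
        this.quotaByUser = quotaByUser;
        this.defaultQuota = defaultQuota;
    }

    // Registers the app and returns true if the user is under quota
    // (or the app is already registered); returns false to reject it.
    boolean tryAccept(String user, String appId) {
        int quota = quotaByUser.getOrDefault(user, defaultQuota);
        boolean[] accepted = {false};
        // compute() runs atomically per key, so check-and-add cannot race.
        runningApps.compute(user, (u, apps) -> {
            if (apps == null) {
                apps = ConcurrentHashMap.newKeySet();
            }
            if (apps.contains(appId) || apps.size() < quota) {
                apps.add(appId);
                accepted[0] = true;
            }
            return apps.isEmpty() ? null : apps;
        });
        return accepted[0];
    }

    // Frees the slot when an app finishes.
    void release(String user, String appId) {
        runningApps.computeIfPresent(user, (u, apps) -> {
            apps.remove(appId);
            return apps.isEmpty() ? null : apps;
        });
    }
}
```

In this sketch a rejected app would fall back to whatever shuffle service the job uses without Uniffle; how rejection is reported to the client is left out.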

Very coincidentally, our ideas are similar :) .

smallzhongfeng avatar Sep 13 '22 03:09 smallzhongfeng

Very coincidentally, our ideas are similar :) .

Maybe this is the best practice

zuston avatar Sep 13 '22 04:09 zuston

When will this feature be available? @zuston

smallzhongfeng avatar Sep 15 '22 08:09 smallzhongfeng

I think you misunderstood me. I implemented the custom access checker to solve the problem you mentioned, and you can do something similar. I don't plan to submit this access checker to the Uniffle codebase, because it may not be general enough.

zuston avatar Sep 15 '22 08:09 zuston

Well, although I think this may actually have some effect on user isolation, we can try to let users with high priority use more resources. I can understand what you mean.

smallzhongfeng avatar Sep 15 '22 08:09 smallzhongfeng

In fact, we have also implemented this and launched it. The effect is obvious: users' resources are managed effectively, and it is easier to calculate each user's usage cost for billing. That is why I raised this issue.

smallzhongfeng avatar Sep 15 '22 08:09 smallzhongfeng

User quota is OK for us. I think it's part of the work for multi-tenant user support.

jerqi avatar Sep 21 '22 11:09 jerqi

I can raise a PR if needed. @jerqi

smallzhongfeng avatar Sep 22 '22 03:09 smallzhongfeng

I can raise a PR if needed. @jerqi

If the PR is large, you could write a design document first.

jerqi avatar Sep 27 '22 07:09 jerqi

Any update? We also plan to do this. May I ask what the scope of the quota limit is? Is it per shuffle server, or for the whole shuffle size? I'm thinking we could do it as a server-level quota, so that this feature can work with the multiple-server feature: the shuffle writer could write the remaining blocks to another server.

Gustfh avatar Oct 12 '22 06:10 Gustfh
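
One way to picture the server-level quota suggested above, as a rough Java sketch rather than a design: each block goes to the first server that still has enough quota, so blocks spill over to another server once one is full. `ServerQuotaAssigner`, its `assign` method, and the byte-based quota map are hypothetical and do not correspond to any existing Uniffle class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a server-level quota with spill-over.
class ServerQuotaAssigner {
    // remainingQuota: server id -> bytes still allowed on that server
    // (iteration order of the map is the preference order).
    // Returns, for each block, the chosen server id, or null if no
    // server has enough quota left for that block.
    static List<String> assign(Map<String, Long> remainingQuota, long[] blockSizes) {
        // Work on a copy so the caller's map is not modified.
        Map<String, Long> remaining = new LinkedHashMap<>(remainingQuota);
        List<String> result = new ArrayList<>();
        for (long size : blockSizes) {
            String chosen = null;
            for (Map.Entry<String, Long> entry : remaining.entrySet()) {
                if (entry.getValue() >= size) {
                    entry.setValue(entry.getValue() - size);
                    chosen = entry.getKey();
                    break;
                }
            }
            result.add(chosen);
        }
        return result;
    }
}
```

What should happen when no server has quota left (reject, block, or fall back) is exactly the open question in this thread, so the sketch only reports it as null.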

Currently I have no ideas for a concrete design. If you want to contribute this feature, it's better to have a simple design doc for review. @Gustfh

Do you have plans to work on this ticket? @smallzhongfeng

zuston avatar Oct 13 '22 13:10 zuston

In the version used internally at our company, we use quotas to limit the number of apps that a single user can submit. I don't have a clear idea yet about limiting the number of shuffle servers that a single user can use. But I will write a simple document this weekend to discuss whether there are other requirements that can be developed in the future. @jerqi @zuston @Gustfh

smallzhongfeng avatar Oct 13 '22 17:10 smallzhongfeng

Could you add some diagrams?

jerqi avatar Oct 17 '22 06:10 jerqi

OK, I will add later.

smallzhongfeng avatar Oct 17 '22 07:10 smallzhongfeng

Could you give us permission to comment on the document?

jerqi avatar Oct 17 '22 09:10 jerqi

Could you give us permission to comment on the document?

+1

zuston avatar Oct 17 '22 10:10 zuston

Sorry, I forgot to give you permission. It has been updated. @jerqi @zuston

smallzhongfeng avatar Oct 19 '22 04:10 smallzhongfeng

So it's a user-level quota. What if a single app produces a large amount of shuffle data and then impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could live in memory for a long time. I wonder whether we should have a quota for this situation.

Gustfh avatar Oct 19 '22 06:10 Gustfh

So it's a user-level quota. What if a single app produces a large amount of shuffle data and then impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could live in memory for a long time. I wonder whether we should have a quota for this situation.

+1. I think a quota on the bytes used per app or per Hadoop user should also be covered in the design. I also think the different quota limits, such as app-number and storage-bytes, should be something the user can enable individually.

zuston avatar Oct 19 '22 10:10 zuston
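
A small Java sketch of how individually enabled quota types might fit together, under the assumption that each type is a simple numeric limit. The `QuotaPolicy` class and the `QuotaType` names `APP_NUMBER` and `STORAGE_BYTES` are hypothetical, taken only from the app-number/storage-bytes wording above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: several quota types, each enabled separately.
class QuotaPolicy {
    enum QuotaType { APP_NUMBER, STORAGE_BYTES }

    private final Set<QuotaType> enabled;
    private final Map<QuotaType, Long> limits;

    QuotaPolicy(Set<QuotaType> enabled, Map<QuotaType, Long> limits) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so guard it.
        this.enabled = enabled.isEmpty()
            ? EnumSet.noneOf(QuotaType.class)
            : EnumSet.copyOf(enabled);
        this.limits = new EnumMap<>(QuotaType.class);
        this.limits.putAll(limits);
    }

    // Returns true if the usage is within every enabled limit.
    // Disabled quota types are ignored; a missing limit means "unlimited".
    boolean allows(Map<QuotaType, Long> usage) {
        for (QuotaType type : enabled) {
            long limit = limits.getOrDefault(type, Long.MAX_VALUE);
            long used = usage.getOrDefault(type, 0L);
            if (used > limit) {
                return false;
            }
        }
        return true;
    }
}
```

With only `APP_NUMBER` enabled, a user over the byte limit is still allowed; enabling `STORAGE_BYTES` as well would reject the same usage.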

@smallzhongfeng If you add some extra interfaces, you should describe them in the document.

jerqi avatar Oct 21 '22 08:10 jerqi

Could you add some diagrams?

I added a simple diagram to illustrate the process of Spark's resource limitation. A more complete PR will be proposed this week.

smallzhongfeng avatar Oct 24 '22 17:10 smallzhongfeng

So it's a user-level quota. What if a single app produces a large amount of shuffle data and then impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could live in memory for a long time. I wonder whether we should have a quota for this situation.

This is a good suggestion. I am currently developing it, and it may be implemented in the next PR.

smallzhongfeng avatar Oct 24 '22 17:10 smallzhongfeng

@smallzhongfeng @Gustfh @zuston Do you want to discuss this issue in a meeting? I will start a meeting to discuss issue #80, and I want to discuss this issue too. There are some other issues we need to discuss, so I will send an email to our dev mailing list and select a suitable date for the meeting. You can tell me by email when you are free.

jerqi avatar Oct 25 '22 06:10 jerqi

Of course, I'm looking forward to it.

smallzhongfeng avatar Oct 26 '22 02:10 smallzhongfeng

@Gustfh @smallzhongfeng I have already sent an email: https://lists.apache.org/thread/2jlm3fswmsxy619ldyo4px700p3ybnvc. Do you have time at 11 am (UTC+8) on Thursday this week?

jerqi avatar Oct 26 '22 03:10 jerqi