incubator-uniffle
[Feature] User's resources quota
At present, we cannot limit a user's resources. Maybe we could manually set, through a configuration file, the number of tasks each user may submit. When the quota is exceeded, the app would be rejected, so the number of apps per user can be used to represent the resource quota. What do you think? @jerqi
Maybe this could be solved by implementing a custom AccessChecker that limits each user's quota. I have done this.
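A minimal sketch of the proposal above, assuming the quota is the number of concurrently running apps per user. `UserAppQuota`, `tryAdmit`, and `release` are hypothetical names used for illustration and are not part of the Uniffle codebase; the per-user limits stand in for whatever would be loaded from the configuration file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not Uniffle code.
final class UserAppQuota {
    // Per-user limits, e.g. parsed from a manually maintained configuration file (assumption).
    private final Map<String, Integer> quotaByUser;
    // Limit applied to users that have no explicit entry.
    private final int defaultQuota;
    // Apps currently admitted, grouped by user.
    private final Map<String, Set<String>> runningApps = new ConcurrentHashMap<>();

    UserAppQuota(Map<String, Integer> quotaByUser, int defaultQuota) {
        this.quotaByUser = new HashMap<>(quotaByUser);
        this.defaultQuota = defaultQuota;
    }

    /** Records the app and returns true if the user is under quota; false means "reject". */
    boolean tryAdmit(String user, String appId) {
        int limit = quotaByUser.getOrDefault(user, defaultQuota);
        boolean[] admitted = {false};
        // compute() runs atomically per key, so the size check and the add cannot race.
        runningApps.compute(user, (u, apps) -> {
            Set<String> current = (apps == null) ? ConcurrentHashMap.<String>newKeySet() : apps;
            if (current.contains(appId) || current.size() < limit) {
                current.add(appId);
                admitted[0] = true;
            }
            return current.isEmpty() ? null : current;
        });
        return admitted[0];
    }

    /** Frees the slot held by a finished app. */
    void release(String user, String appId) {
        runningApps.computeIfPresent(user, (u, apps) -> {
            apps.remove(appId);
            return apps.isEmpty() ? null : apps;
        });
    }
}
```

With a limit of 1 for a user, the first `tryAdmit` call succeeds, a second app from the same user is rejected until `release` is called for the first one.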
At present, the app level is also limited, right?
I haven't seen a similar PR so far, so I created this issue.
Yes, I introduced a custom access checker to do the following:
- Do a grey-scale rollout.
- Add a blacklist so that some jobs fall back to ESS.
- ...
So I think the resource quota limit could be implemented in a custom access checker.
Please let me know if I misunderstood you.
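As a rough illustration of how checks like these could be combined, here is a sketch under stated assumptions. `AdmissionRule`, `GreyScaleRule`, `DenyListRule`, and `CompositeRule` are hypothetical types invented for this example; they are not Uniffle's AccessChecker API, and keying the grey-scale decision on a hash of the app id is only one possible choice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface for illustration; not Uniffle's AccessChecker.
interface AdmissionRule {
    boolean allow(String user, String appId);
}

// Grey-scale rollout: admit roughly `percent` of apps, chosen by a stable hash of the app id.
final class GreyScaleRule implements AdmissionRule {
    private final int percent;

    GreyScaleRule(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        this.percent = percent;
    }

    @Override
    public boolean allow(String user, String appId) {
        // floorMod keeps the bucket in [0, 99] even for negative hash codes.
        return Math.floorMod(appId.hashCode(), 100) < percent;
    }
}

// Blacklist: listed apps are denied, so the client would fall back to ESS.
final class DenyListRule implements AdmissionRule {
    private final Set<String> deniedApps;

    DenyListRule(Set<String> deniedApps) {
        this.deniedApps = new HashSet<>(deniedApps);
    }

    @Override
    public boolean allow(String user, String appId) {
        return !deniedApps.contains(appId);
    }
}

// An app is admitted only if every rule allows it; a quota rule could be added to the list.
final class CompositeRule implements AdmissionRule {
    private final List<AdmissionRule> rules;

    CompositeRule(List<AdmissionRule> rules) {
        this.rules = new ArrayList<>(rules);
    }

    @Override
    public boolean allow(String user, String appId) {
        for (AdmissionRule rule : rules) {
            if (!rule.allow(user, appId)) {
                return false;
            }
        }
        return true;
    }
}
```

A user quota check would then be one more `AdmissionRule` in the composite, which is roughly the shape suggested above.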
Very coincidentally, our ideas are similar :) .
Maybe this is the best practice
When will this feature be available? @zuston
I think you misunderstood me. I implemented the custom access checker to solve the problem you mentioned, and you can do something similar. I don't think I will submit this access checker to the Uniffle codebase, because it may not be general enough.
Well, I think this could actually help with user isolation, and we could let high-priority users use more resources. Still, I understand what you mean.
In fact, we have also implemented this and launched it. The effect is clear: users' resources are managed effectively, and it is easier to calculate each user's usage cost for billing. That is why I raised this issue.
User quota is OK for us. I think it is part of the work on multi-tenant user support.
I can raise a PR if needed. @jerqi
If the PR is large, you could write a design document first.
Any update? We also plan to do this. May I ask what the scope of the quota limit is? Is it per shuffle server, or for the whole shuffle size? I am thinking we could do it as a server-level quota, so that this feature can work with the multiple-server feature: the shuffle writer could write the remaining blocks to another server.
Currently I have no concrete design in mind. If you want to contribute this feature, it’s better to have a simple design doc for review. @Gustfh
Do you have any plan to invest in this ticket? @smallzhongfeng
In the version used internally at our company, we use quotas to limit the number of apps that a single user can submit. I don't have a clear idea yet about limiting the number of shuffle servers that a single user can use. But I will write a simple document this weekend to discuss whether there are other requirements that can be developed in the future. @jerqi @zuston @Gustfh
https://docs.google.com/document/d/1MApSMFQgoS1VAoKbZjomqSRm0iTbSuKG1yvKNlWW65c/edit?usp=sharing If you have time, PTAL @jerqi @zuston
Could you add some diagrams?
OK, I will add them later.
Could you give us permission to comment?
+1
Sorry, I forgot to give you permission. It has been updated. @jerqi @zuston
So it's a user-level quota. What if a single app produces a large amount of shuffle data and impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could stay in memory for a long time. I wonder whether we should have a quota for this situation.
+1. I think a quota on the bytes used by app/hadoop-user should also be included in the design. And I think the different quota limits, such as app-number and storage-bytes, could each be enabled by the user.
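One way a storage-bytes limit that can be switched on or off might look, as a sketch only. `StorageBytesQuota` and its methods are hypothetical names, the limit here is per app (a per-user variant would key on the user name instead), and an empty `OptionalLong` is assumed to mean the limit is disabled.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; not Uniffle code.
final class StorageBytesQuota {
    // Empty means the storage-bytes limit is disabled (assumption for this sketch).
    private final OptionalLong maxBytesPerApp;
    private final Map<String, AtomicLong> usedBytesByApp = new ConcurrentHashMap<>();

    StorageBytesQuota(OptionalLong maxBytesPerApp) {
        this.maxBytesPerApp = maxBytesPerApp;
    }

    /** Reserves `bytes` for the app; returns false if that would exceed the limit. */
    boolean tryReserve(String appId, long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        AtomicLong used = usedBytesByApp.computeIfAbsent(appId, k -> new AtomicLong());
        if (!maxBytesPerApp.isPresent()) {
            used.addAndGet(bytes); // limit disabled: only track usage
            return true;
        }
        long limit = maxBytesPerApp.getAsLong();
        // Compare-and-set loop so concurrent writers cannot jointly overshoot the limit.
        while (true) {
            long current = used.get();
            if (bytes > limit - current) {
                return false;
            }
            if (used.compareAndSet(current, current + bytes)) {
                return true;
            }
        }
    }

    /** Returns bytes to the app's budget, e.g. after data is flushed or deleted. */
    void release(String appId, long bytes) {
        AtomicLong used = usedBytesByApp.get(appId);
        if (used != null) {
            used.updateAndGet(v -> Math.max(0L, v - bytes));
        }
    }

    long usedBytes(String appId) {
        AtomicLong used = usedBytesByApp.get(appId);
        return used == null ? 0L : used.get();
    }
}
```

Keeping each limit in its own small class would let the app-number and storage-bytes checks be enabled independently.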
@smallzhongfeng If you add some extra interfaces, you should describe them in the document.
Could you add some diagrams?
I added a simple diagram to illustrate the process of Spark's resource limitation. A more complete PR will be proposed this week.
So it's a user-level quota. What if a single app produces a large amount of shuffle data and impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could stay in memory for a long time. I wonder whether we should have a quota for this situation.
This is a good suggestion. I am currently working on it, and it may be implemented in the next PR.
@smallzhongfeng @Gustfh @zuston Do you want to discuss this issue in a meeting? I will start a meeting to discuss issue #80, and I would like to discuss this issue too. There are some other issues we need to discuss as well, so I will send an email to our dev mailing list and pick a suitable date for the meeting. You can tell me by email what times you are free.
Of course, I'm looking forward to it.
@Gustfh @smallzhongfeng I have already sent an email: https://lists.apache.org/thread/2jlm3fswmsxy619ldyo4px700p3ybnvc. Do you have time at 11 am (UTC+8) on Thursday this week?