cloud-pipeline
Restrict a count of the instances to launch
Background
In some cases, it may be useful for admins to restrict the number of instances that any user, or a specific user, can launch simultaneously. This helps prevent a single user from overloading the Cloud Pipeline deployment. We should implement this ability.
Approach
The restriction on the running instances count shall be implemented similarly to "Allowed instance types" or "Allowed instance price types":
- there shall be a new system preference, e.g. cluster.allowed.instance.max.count. That preference shall define the global restriction on the running instances count
- in the GUI, there shall be the ability to set the restriction on the running instances count for a user group (in the user group settings). In this case, the configured count shall be applied to each user of that group
- in the GUI, there shall be the ability to set the restriction on the running instances count for a certain user (in the user settings). In this case, the configured count shall be applied to that specific user
- the configured restrictions on the number of instances shall be prioritized as follows (highest priority first):
  - user restriction (via GUI)
  - user group restriction (via GUI)
  - global restriction (via the system preference)
- if any of the described restrictions applies to a user, and that user tries to launch (to run at the same time) more instances than the restriction allows:
  - if the user tries to run a simple job or a job based on a static cluster, an error message shall appear. New jobs with new instances cannot be scheduled until instances previously launched by this user are stopped, paused, or terminated.
  - if the user tries to run a job based on an autoscaled cluster and the full (possible) size of that cluster is greater than the instances restriction, a warning message shall appear in the Launch pop-up and in the Autoscaled cluster config pop-up. The text should be like "Note that the full size of the cluster you configured is greater than the maximum number of instances you are allowed to run. There could be issues when running such a cluster. Do it only at your own risk." In this case, the autoscaled job shall scale up only until it reaches the instances restriction threshold configured for this user.
Note: this point shall fully apply to the pipe CLI as well.
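To make the proposed behaviour easier to discuss, here is a minimal Python sketch of the priority lookup and the launch check described above. It is not the Cloud Pipeline implementation: the names effective_limit, check_launch and LaunchDecision are hypothetical, each user is assumed to have at most one group limit, and an autoscaled cluster is assumed to need at least one free slot for its master node.

```python
from dataclasses import dataclass
from typing import Optional


def effective_limit(user_limit: Optional[int],
                    group_limit: Optional[int],
                    global_limit: Optional[int]) -> Optional[int]:
    """Return the first configured limit in priority order: user, group, global."""
    for limit in (user_limit, group_limit, global_limit):
        if limit is not None:
            return limit
    return None  # no restriction is configured at any level


@dataclass
class LaunchDecision:
    status: str             # "ok", "warning" or "error" (hypothetical labels)
    instances_allowed: int  # how many instances the job may actually use


def check_launch(active: int, requested: int, limit: Optional[int],
                 autoscaled: bool = False) -> LaunchDecision:
    """Decide whether a launch fits into the user's limit.

    `requested` is the instance count of a simple job or static cluster,
    or the full (possible) size of an autoscaled cluster.
    """
    if limit is None:
        return LaunchDecision("ok", requested)
    free = max(limit - active, 0)
    if requested <= free:
        return LaunchDecision("ok", requested)
    if autoscaled and free >= 1:
        # Assumption: the cluster starts, but scales up only to the threshold.
        return LaunchDecision("warning", free)
    return LaunchDecision("error", 0)
```

For example, with a limit of 4 and 2 instances already active, an autoscaled cluster with a full size of 5 would get a warning and scale up to 2 instances, while a static cluster of the same size would be rejected.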
Other options
Add a new pipe CLI command that shall display:
- the count of instances the user is running at the moment
- the configured restriction on the instances count (i.e. the maximal count of instances that the current user can launch simultaneously)
For that, I suggest a command like pipe users instances
The contextual preference that describes a user limit is named launch.max.runs.user.
I believe that the system-level preference cluster.allowed.instance.max.count could be renamed to launch.max.runs.user.global because, as far as I understand, it should be treated as a default limit for every user registered in the platform.
@Wedds
It should be possible to leave the launch.max.runs.user.global system-level preference without a value. Currently, saving the preference without a value shows the error: Preference 'launch.max.runs.user.global' contains invalid value ''.
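One way to read this request is that an empty value means "no global limit". A minimal Python sketch of such parsing follows. The function name parse_max_runs is hypothetical, and rejecting negative numbers is an assumption, not something stated in this issue.

```python
from typing import Optional


def parse_max_runs(raw: Optional[str]) -> Optional[int]:
    """Parse a launch.max.runs.* preference value.

    An unset or empty value is treated as "no limit" (None) instead of
    being rejected as invalid.
    """
    if raw is None or raw.strip() == "":
        return None
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 0:
        # Assumption: a negative limit is meaningless and should be rejected.
        raise ValueError(f"limit must be non-negative, got {value}")
    return value
```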
@Wedds Resuming paused runs should be restricted if the maximal number of instances allowed to run is exceeded for the current user.
@Wedds
The pipe CLI command shall display the count of instances the user is running at the moment, not only the configured restriction on the instances count.
GUI part implemented (#2656)
@rodichenko
Bug:
The allowed instance max count value specified for a group should be kept in the launch.max.runs.group contextual preference. Now it's kept in the launch.max.runs.user preference.
@maryvictol should be fixed with d0aec9e
@rodichenko
Bug:
A warning message doesn't appear in the Launch pop-up when a user tries to launch more instances than allowed by the restriction specified for the user group or by the system preference launch.max.runs.user.global.
@maryvictol should be fixed with 374bfa9
@Wedds
Bug:
If a user has an active job based on a static or autoscaled cluster, the pipe CLI command pipe users instances doesn't count child nodes when counting the active runs detected for the user.
@rodichenko Bug: Child nodes of active jobs based on a static or autoscaled cluster aren't included in the total count of the user's running jobs. As a result, warnings about exceeding the maximum number of the user's running jobs are shown incorrectly on the Launch form and pop-up.
@maryvictol should be fixed by 03013d36a31b79d8be1d3bee95bcc890ca96229e
@rodichenko Bug: When a user tries to run a job based on an autoscaled cluster and the full (possible) size of that cluster is greater than the instances restriction, a warning message isn't shown in the Launch pop-up.
@rodichenko, @AleksandrGorodetskii Bug: When displaying the warning messages about exceeding the maximum number of the user's running jobs, paused runs shouldn't be included in the total count of the user's running jobs.
@maryvictol #2718 corrects this misunderstanding
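As I read the bug reports above, the counting rule is: cluster child nodes count toward the user's total, and paused runs do not. A minimal Python sketch of that rule follows. The Run record, its field names and the status strings are hypothetical and only illustrate the rule. It also assumes child nodes are stored as separate runs owned by the same user.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Run:
    run_id: int
    owner: str
    status: str                      # hypothetical values: "RUNNING", "PAUSED", "STOPPED"
    parent_id: Optional[int] = None  # set for child nodes of a cluster run


def count_running_instances(runs: Iterable[Run], user: str) -> int:
    """Count the instances a user is currently running.

    Child nodes are counted like any other run (no filtering on parent_id),
    and paused or stopped runs are skipped.
    """
    return sum(1 for r in runs if r.owner == user and r.status == "RUNNING")
```

For a user with a running cluster master, two running child nodes and one paused run, this would report three instances.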
@sidoruka backported to release/0.16 (a3ba5eba6a30f1243996a961886238e6bbd9aa8c)
Test cases were created by https://github.com/epam/cloud-pipeline/pull/2725 and are located here.
Docs were added via #3228 and are located here.