Restrict a count of the instances to launch

Open NShaforostov opened this issue 3 years ago • 18 comments

Background

In some cases, admins may need to restrict the number of instances that any user, or a specific user, can launch simultaneously. This helps prevent a single user from overloading the Cloud Pipeline deployment. We should implement this ability.

Approach

Restriction by the running instances count shall be implemented similarly to "Allowed instance types" or "Allowed instance price types":

  • there shall be a new system preference, e.g. cluster.allowed.instance.max.count, that defines the global restriction on the running instances count
  • in the GUI, there shall be the ability to restrict the running instances count for a user group (in the user group settings); in this case, the configured count shall be applied to each user of that group
  • in the GUI, there shall be the ability to restrict the running instances count for a certain user (in the user settings); in this case, the configured count shall be applied to that specific user
  • the configured restrictions shall have the following priority (highest first):
    • user restriction (via GUI)
    • user group restriction (via GUI)
    • global restriction (via the system preference)
  • if any of the described restrictions applies to the user, and that user tries to run more instances simultaneously than the restriction allows:
    • if the user tries to run a simple job or a job based on a static cluster, an error message shall appear. New jobs with new instances cannot be scheduled until instances previously launched by this user are stopped, paused or terminated.
    • if the user tries to run a job based on an autoscaled cluster and the full (possible) size of that cluster is greater than the instances restriction, a warning message shall appear in the Launch pop-up and in the Autoscaled cluster config pop-up. The text should be like "Note, that full size of the cluster you config is greater than maximal instances allowed you to run. So, there could be issues when running such cluster. Do it only at your own risk." In this case, the autoscaled job shall scale up only until it reaches the instances restriction threshold configured for this user

    Note: this point shall fully apply to the pipe CLI as well
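
The priority order and the launch checks above can be sketched roughly as follows. This is only an illustration in Python, not the actual Cloud Pipeline code: the function names, the use of None for "no restriction configured", and the way the autoscaled ceiling is computed against already running instances are all assumptions.

```python
from typing import Optional


def effective_limit(user: Optional[int],
                    group: Optional[int],
                    global_: Optional[int]) -> Optional[int]:
    """Pick the restriction that applies: user first, then group, then global.

    None means "not configured" (an assumption of this sketch).
    """
    for value in (user, group, global_):
        if value is not None:
            return value
    return None  # no restriction at any level


def can_launch(running: int, requested: int, limit: Optional[int]) -> bool:
    """Simple job or static cluster: all requested instances must fit."""
    return limit is None or running + requested <= limit


def autoscale_ceiling(running: int, full_size: int,
                      limit: Optional[int]) -> int:
    """Autoscaled cluster: how many instances it may grow to.

    The cluster is still allowed to start, but scale-up stops at the limit.
    """
    if limit is None:
        return full_size
    return max(0, min(full_size, limit - running))


limit = effective_limit(user=None, group=5, global_=10)   # group value applies
print(limit, can_launch(4, 2, limit), autoscale_ceiling(1, 10, limit))
```

In this sketch a False result from can_launch corresponds to the error case, while an autoscale_ceiling smaller than the full cluster size corresponds to the warning case.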

Other options

Add a new pipe CLI command that shall display:

  • the count of instances the user is running at the moment
  • the configured restriction by the instances count (i.e. the maximal count of instances that the current user can launch simultaneously)

For that, I suggest a command like pipe users instances
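
A minimal sketch of the counting such a command would need is shown below. The Run record, its field names and the status strings are hypothetical and are not taken from the Cloud Pipeline API. Two assumptions are made: cluster child nodes count as separate instances (as later comments in this thread request), and paused runs do not occupy an instance (based on the wording of the approach above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    # Hypothetical record; field names are illustrative only
    run_id: int
    owner: str
    status: str                      # e.g. "RUNNING", "PAUSED", "STOPPED"
    parent_id: Optional[int] = None  # set for child nodes of a cluster run


def count_running_instances(runs: list, owner: str) -> int:
    """Count instances the user occupies right now.

    Child nodes are deliberately NOT filtered out: each one is an instance.
    Paused and stopped runs are skipped (assumption of this sketch).
    """
    return sum(1 for r in runs if r.owner == owner and r.status == "RUNNING")


def instances_summary(runs: list, owner: str, limit: Optional[int]) -> dict:
    """Values a command like `pipe users instances` could report."""
    return {"user": owner,
            "running": count_running_instances(runs, owner),
            "limit": limit}


runs = [
    Run(1, "alice", "RUNNING"),               # cluster master
    Run(2, "alice", "RUNNING", parent_id=1),  # child node
    Run(3, "alice", "RUNNING", parent_id=1),  # child node
    Run(4, "alice", "PAUSED"),
    Run(5, "bob", "RUNNING"),
]
print(instances_summary(runs, "alice", limit=4))
```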

NShaforostov avatar May 25 '22 20:05 NShaforostov

The contextual preference that describes a user limit is named launch.max.runs.user.

I believe that the system-level preference cluster.allowed.instance.max.count could be renamed to launch.max.runs.user.global because, as far as I understand, it should be treated as a default limit for every user registered in the platform.

Wedds avatar Jun 07 '22 11:06 Wedds

@Wedds It should be possible to leave the launch.max.runs.user.global system-level preference without a value. Currently, the error Preference 'launch.max.runs.user.global' contains invalid value '' is shown when saving the preference without a value.

maryvictol avatar Jun 16 '22 13:06 maryvictol

@Wedds Resuming paused runs should be restricted if the maximal number of instances allowed to run is exceeded for the current user.

maryvictol avatar Jun 16 '22 13:06 maryvictol

@Wedds The pipe CLI command shall display the count of instances the user is running at the moment, not only the configured restriction by the instances count.

maryvictol avatar Jun 16 '22 13:06 maryvictol

GUI part implemented (#2656)

rodichenko avatar Jun 27 '22 11:06 rodichenko

@rodichenko Bug: The allowed instance max count value specified for a group should be kept in the launch.max.runs.group contextual preference. Currently it is kept in the launch.max.runs.user preference.

maryvictol avatar Jun 28 '22 10:06 maryvictol

@maryvictol should be fixed with d0aec9e

rodichenko avatar Jun 28 '22 13:06 rodichenko

@rodichenko Bug: A warning message doesn't appear in the Launch pop-up when the user tries to launch more instances than allowed by the restriction specified for the user group or by the system preference launch.max.runs.user.global.

maryvictol avatar Jun 28 '22 16:06 maryvictol

@maryvictol should be fixed with 374bfa9

rodichenko avatar Jun 28 '22 17:06 rodichenko

@Wedds Bug: If the user has an active job based on a static or autoscaled cluster, the pipe CLI command pipe users instances doesn't count child nodes when counting Active runs detected for a user.

maryvictol avatar Jun 30 '22 17:06 maryvictol

@rodichenko Bug: Child nodes of active jobs based on a static or autoscaled cluster aren't included in the total count of the user's running jobs. As a result, warnings about exceeding the maximum number of the user's running jobs are shown incorrectly on the Launch form and pop-up.

maryvictol avatar Jun 30 '22 23:06 maryvictol

@maryvictol should be fixed by 03013d36a31b79d8be1d3bee95bcc890ca96229e

rodichenko avatar Jul 01 '22 11:07 rodichenko

@rodichenko Bug: When the user tries to run a job based on an autoscaled cluster and the full (possible) size of that cluster is greater than the instances restriction, a warning message isn't shown in the Launch pop-up.

maryvictol avatar Jul 04 '22 11:07 maryvictol

@rodichenko, @AleksandrGorodetskii Bug: When displaying the warning messages about exceeding the maximum number of the user's running jobs, paused runs shouldn't be included in the total count of the user's running jobs.

maryvictol avatar Jul 04 '22 17:07 maryvictol

@maryvictol #2718 corrects this misunderstanding

rodichenko avatar Jul 05 '22 05:07 rodichenko

@sidoruka backported to release/0.16 (a3ba5eba6a30f1243996a961886238e6bbd9aa8c)

rodichenko avatar Jul 07 '22 09:07 rodichenko

Test cases were created in https://github.com/epam/cloud-pipeline/pull/2725 and are located here.

maryvictol avatar Jul 13 '22 20:07 maryvictol

Docs were added via #3228 and are located here.

NShaforostov avatar Oct 17 '23 17:10 NShaforostov