terraform-oci-oke

Add a separate unmanaged node pool to run cluster autoscaler

Open hyder opened this issue 3 years ago • 8 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

OKE now supports cluster autoscaler: https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengusingclusterautoscaler.htm. We need to add support for it like we have done for horizontal and vertical pod autoscalers.

New or Affected Resource(s)

Potential Terraform Configuration

# Copy-paste any Terraform configurations for how the requested feature may be used. 

References

hyder avatar Apr 07 '21 23:04 hyder

It's critical to add support for the node pool autoscaler to increase adoption among enterprise customers. Could you share a timeline, please?

shapeofarchitect avatar Sep 20 '21 22:09 shapeofarchitect

Hi @shapeofarchitect,

Thanks for reaching out. We have some ideas but we haven't fully tested them yet. We will probably look into this after the 4.0 release.

hyder avatar Sep 20 '21 23:09 hyder

Any updates on this?

mschmidt291 avatar Mar 25 '22 16:03 mschmidt291

Hi @hyder, any updates on this? I'm currently using this module with the cluster autoscaler installed, and the autoscaler's changes show up as differences in the Terraform plan. Is there a suggested workaround?

aibarbetta avatar Sep 09 '22 19:09 aibarbetta

Hi @aibarbetta

I think we'll be able to start implementing this once #562 is complete. Would you be interested in testing?

hyder avatar Sep 09 '22 19:09 hyder

@hyder sure, I can help test the changes in my environment. Mention me in the PR when it's ready to test.

aibarbetta avatar Sep 09 '22 21:09 aibarbetta

After further experiments, we'll add support for cluster autoscaler gradually:

  1. A separate unmanaged node pool to run cluster autoscaler (this issue)
  2. IAM resources (tags, dynamic group, policies) for the worker nodes that will be running cluster autoscaler (#578)
  3. Deploying cluster autoscaler itself (#579)

The unmanaged node pool needs to also be upgradable using a process similar to the current cluster and node pool upgrade process. However, we'll need to amend it somewhat.

As such, I'm renaming this issue to "add an unmanaged node pool for cluster autoscaler" instead of the full cluster autoscaler. Doing so will allow us to test each piece of functionality well.

We'll also create a new dedicated milestone for adding cluster autoscaler support.

Compared to #579, I expect this issue and #578 to be done fairly quickly. #579, however, will need more time. In the short term, the actual deployment of the autoscaler will be done manually. We'll upload the scripts and manifests, but users will need to modify them before creating the resources.

Likewise, if the cluster changes, e.g. a node pool is added or removed, the user will need to update the manifest and run apply again. In #579, we'll look at a way to automate that.
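
To illustrate why the manifest has to change whenever node pools change: the OKE cluster autoscaler is told which pools to manage through one `--nodes=<min>:<max>:<node pool OCID>` argument per pool. Below is a minimal sketch of how such a list could be derived in Terraform. The variable, local and output names are hypothetical and are not part of this module, and this is not necessarily how #579 will do it.

```hcl
# Hypothetical input: the node pools that cluster autoscaler should manage,
# keyed by an arbitrary name.
variable "autoscaled_node_pools" {
  description = "Node pools to be managed by cluster autoscaler."
  type = map(object({
    id  = string # node pool OCID
    min = number # minimum number of nodes
    max = number # maximum number of nodes
  }))
  default = {}
}

locals {
  # One "--nodes=<min>:<max>:<node pool OCID>" argument per pool.
  # Adding or removing a pool changes this list, which is why the
  # manifest must be regenerated and applied again.
  autoscaler_node_args = [
    for name, pool in var.autoscaled_node_pools :
    "--nodes=${pool.min}:${pool.max}:${pool.id}"
  ]
}

# Exposed so the values can be copied or templated into the manifest.
output "autoscaler_node_args" {
  value = local.autoscaler_node_args
}
```

Until the deployment is automated, the output would still need to be copied or templated into the autoscaler manifest by hand.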

hyder avatar Sep 17 '22 12:09 hyder

@aibarbetta @mschmidt291 @shapeofarchitect

hyder avatar Sep 18 '22 03:09 hyder

Hi @hyder, I'm not sure where the proper place to ask is, here or in #579.

Is it possible just to add the following snippet to modules/oke/nodepools.tf?

  lifecycle {
    ignore_changes = [
      node_config_details[0].size
    ]
  }

Lots of people don't use the whole module, only a few submodules (modules/oke in our case). This would solve the issue where Terraform tries to reconcile changes made by cluster-autoscaler (installed simply with Helm).

alex-old-user avatar Jan 24 '23 11:01 alex-old-user

@devoncrouse can you please look into this?

hyder avatar Jan 24 '23 12:01 hyder

This was affected by https://github.com/oracle-terraform-modules/terraform-oci-oke/pull/624. Before that, the whole object (including size) was ignored.

The issue with adding the ignore is that the size can no longer be changed through Terraform after creation, which effectively turns the size parameter into an initial_size.

If this is the intended way to do things, I'll look into installing the autoscaler and stop managing the size via Terraform.
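
One possible way around this trade-off, sketched here as an assumption rather than as what the module does: Terraform requires `ignore_changes` to be a static list, so it cannot be switched on and off with a variable. A common pattern is to declare two variants of the resource and select one with `count`. The variable names and the `managed`/`autoscaled` resource names below are hypothetical.

```hcl
# Hypothetical toggle: true when cluster autoscaler owns the pool size.
variable "autoscale" {
  type    = bool
  default = false
}

# Hypothetical node pool settings, grouped for brevity.
variable "pool" {
  type = object({
    cluster_id          = string
    compartment_id      = string
    kubernetes_version  = string
    name                = string
    node_shape          = string # flexible shapes also need node_shape_config
    image_id            = string
    availability_domain = string
    subnet_id           = string
    size                = number
  })
}

# Variant 1: Terraform owns the size; changing var.pool.size resizes the pool.
resource "oci_containerengine_node_pool" "managed" {
  count              = var.autoscale ? 0 : 1
  cluster_id         = var.pool.cluster_id
  compartment_id     = var.pool.compartment_id
  kubernetes_version = var.pool.kubernetes_version
  name               = var.pool.name
  node_shape         = var.pool.node_shape

  node_source_details {
    image_id    = var.pool.image_id
    source_type = "IMAGE"
  }

  node_config_details {
    size = var.pool.size
    placement_configs {
      availability_domain = var.pool.availability_domain
      subnet_id           = var.pool.subnet_id
    }
  }
}

# Variant 2: size is only an initial value; later changes made by
# cluster autoscaler are ignored by terraform plan.
resource "oci_containerengine_node_pool" "autoscaled" {
  count              = var.autoscale ? 1 : 0
  cluster_id         = var.pool.cluster_id
  compartment_id     = var.pool.compartment_id
  kubernetes_version = var.pool.kubernetes_version
  name               = var.pool.name
  node_shape         = var.pool.node_shape

  node_source_details {
    image_id    = var.pool.image_id
    source_type = "IMAGE"
  }

  node_config_details {
    size = var.pool.size
    placement_configs {
      availability_domain = var.pool.availability_domain
      subnet_id           = var.pool.subnet_id
    }
  }

  lifecycle {
    ignore_changes = [
      node_config_details[0].size,
    ]
  }
}
```

Note that flipping the toggle on an existing pool changes the resource address, so Terraform would plan to destroy and recreate the pool unless the state is moved first (for example with `terraform state mv`).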

12345ieee avatar Jan 24 '23 12:01 12345ieee