api icon indicating copy to clipboard operation
api copied to clipboard

Add autoscale enabled fields and PDB enabled field for IstioOperatorSpec

Open carolynhu opened this issue 4 years ago • 14 comments

Introducing the required fields to do the validations for resolving this issue: https://github.com/istio/istio/issues/24000

carolynhu avatar Nov 06 '20 16:11 carolynhu

😊 Welcome @carolynhu! This is either your first contribution to the Istio api repo, or it's been awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

istio-policy-bot avatar Nov 06 '20 16:11 istio-policy-bot

Looking at the issues I'm not sure this is the right fix, is there an RFC or something describing the proposal and alternative ways to fix this?

We do not have an RFC, we want to fix that issues by some logic like the following to make sure HPA and PDB are compatible:

if Components.pilot.k8s.autoscaleEnabled && Components.pilot.k8s.pdbEnabled {
	Components.pilot.k8s.hpa.minReplicas >= (Components.pilot.k8s.pdb.minAvailable + 1)
	Components.pilot.k8s.replica_count >= (Components.pilot.k8s.pdb.minAvailable + 1)
}

carolynhu avatar Nov 11 '20 20:11 carolynhu

Personally I'm not comfortable approving this PR without better understanding what problem it is intended to solve, the documentation in the proto doesn't really explain what its for, and it looks misplaced compared to the rest of the proto.

A short doc describing the problem and proposed solution (and ideally other alternatives) would help a lot.

smawson avatar Nov 12 '20 00:11 smawson

@smawson to give you some context, there are currently fields in the helm API that enable installing HPAs and PDBs but there's no equivalent in IstioOperator. i think we need some way to control whether these are installed, and we are deprecating values.yaml, so this is part of what is likely to be the unified API for both operator and helm (k8sResourceSpec). i think k8sResourceSpec is the right place for these because we don't want to modify the k8s HPA or PDB specs. alternatively these could also go into ComponentSpec. this is an incremental change which is due to the issue in mentioned in the PR. it is not required to fix that issue. if you prefer to see all the changes all done at once we can defer this change until that happens.

ostromart avatar Nov 12 '20 01:11 ostromart

Can one of you please write a very short document here? Or update the documentation in the code to have it clearly describe exactly what the field does and why a user would want to use it, ideally pointing at appropriate documentation.

For example if I look at the PDB spec it requires a selector, a min available and a max unavailable. Does setting podDisruptionBudgetEnabled create a PDB? If so, what does it set selector, min available and max unavailable to? If setting this field does not create a PDB spec, what is it for?

It claims to install a PDB spec but doesn't document what that spec will look like or even what it means by install.

The same goes for the autoEnabled field. Documenting that the autoscaleEnabled field enables autoscale is just repeating the name of the field. We need to document what setting the field actually does, and how a user should use it.

The rest of the fields in this proto are much more straightforward, since they set existing k8s fields and we can point to the documentation of those fields.

My point is not to explain to me what this field does in this PR, it is to explain in the documentation to readers of the documentation what it does so they would know when they should set these fields.

smawson avatar Nov 17 '20 16:11 smawson

Writing a doc seems like overkill for these two fields, could we just expand the comments? Here's what I propose:

autoScaleEnabled: Setting this field to true will create a HorizonalPodAutoscaler (HPA) for the Deployment in this component, with default settings defined by spec.components.<component-name>.k8s.hpaSpec in the selected profile. These defaults can be overridden e.g.:

spec:
  profile: demo
  components:
    pilot:
      k8s:
        autoScaleEnabled: true
        hpaSpec:
          minReplicas: 5 # changes from the default defined in the `demo` profile.

podDisruptionBudgetEnabled: Setting this field to true will create a PodDisruptionBudget (PDB) for the Deployment in this component, with default settings defined by spec.components<component-name>.k8s. podDisruptionBudget in the selected profile. These defaults can be overridden e.g.:

spec:
  profile: demo
  components:
    pilot:
      k8s:
        podDisruptionBudgetEnabled: true
        podDisruptionBudget:
          minAvailable: 2 # changes from the default defined in the `demo` profile.

We can add further documentation in the context of installation in istio.io, with some examples.

ostromart avatar Nov 18 '20 03:11 ostromart

That sounds good to me.

smawson avatar Nov 19 '20 21:11 smawson

In other word- for any operator-specific change ( and for every helm-specific setting ) I would like to see doc describing how the other is handling it and why it must be specific to only one install method if it's not done in both. We're now supporting again Helm, and we agreed to move as much as possible to MeshConfig - and have clear explanation of things that can't be there.

This may be a case - but we should still understand if that's what we want to do, and how we're dealing with Operator being modeled after the old style ( multiple components at the same time, which we know created problems - and most other components are already gone )

costinm avatar Nov 22 '20 04:11 costinm

@costinm the plan is to have a single API to cover helm and operator and the starting point will be KubernetesResourcesSpec. once this change goes in we will refactor the helm charts to optionally take either this or the old values.yaml APIs, and eventually remove values.yaml. the key is to move out any lingering values like this that don't make sense in meshconfig to a place that will work for both operator and helm.

ostromart avatar Nov 23 '20 17:11 ostromart

Still - an RFC explaining how the final ( common/ shared ) config will loook like, what is the upgrade plan, how will the existing values be migrated is needed.

costinm avatar Nov 23 '20 21:11 costinm

agreed, ideally it would've been better to start from that RFC and include this change as part of that. the RFC is coming in 1.9 but it will be a while - do we want to block this bug to wait for that? this change looks to be quite isolated so i think the chance that it would conflict with something else in the RFC seems low.

ostromart avatar Nov 23 '20 23:11 ostromart

The bug seems to be about adjusting the default values. I don't think we need API changes for that.

We already have settings to enable the autoscaling - and we had them for a long time (pilot.autoscaleEnabled) . Adding a new API has upgrade risks - again, a RFC would describe how this will be automated, etc.

Since we moved to istiod and single-component - I agree we need to move replicaCount, autoscaleEnabled to top level, without 'pilot' - but we need to do this in a consistent way and review this as an API, including upgrade discussion and including the people working on helm.

Are there common patterns in helm ? I know 'replicaCount' for example is a common pattern in helm - is there an equivalent for autoscale ? Should autoscale be auto-disabled if replicaCount is set ? RFC and WG are the right place do discuss this, and make sure we don't forget about upgrade.

On Mon, Nov 23, 2020 at 3:41 PM Martin Ostrowski [email protected] wrote:

agreed, ideally it would've been better to start from that RFC and include this change as part of that. the RFC is coming in 1.9 but it will be a while - do we want to block this bug to wait for that? this change looks to be quite isolated so i think the chance that it would conflict with something else in the RFC seems low.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/istio/api/pull/1735#issuecomment-732489500, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUR2QMTUS5ULLQMEV3H23SRLXLVANCNFSM4TM43GIQ .

costinm avatar Nov 24 '20 15:11 costinm

Where this change helps is the code for the fix can be written against both APIs, instead of having to go back and write it again when the new API comes along.

ostromart avatar Nov 25 '20 17:11 ostromart

@carolynhu: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

istio-testing avatar Jan 30 '21 21:01 istio-testing

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2020-11-24. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

istio-policy-bot avatar May 15 '24 23:05 istio-policy-bot