helm-www icon indicating copy to clipboard operation
helm-www copied to clipboard

Document charts best practices (from initiative in helm/charts)

Open scottrigby opened this issue 6 years ago • 37 comments

Current status

The goal of me moving this issue is to be able to archive the helm/charts repo and make sure that the work started/pinned here makes its way into helm.sh/docs.

Some of this has already made it's way into the docs, but there are other valuable things collaborated on here which have not yet. It's been in the back of my mind for a while but I just haven't spent the time to move it to the docs. It may need another round of review by Charts team maintainers and others once we do, because k8s has moved a lot since then!

Original issue

Plan

From the past few Charts dev meetings, we all planned to collaborate on clarifying Charts best practices. There have been no PRs to do that thus far (as we approached and are now recovering from KubeCon), so in Today's meeting I volunteered to sketch up an issue to provide some structure to help us do this asynchronously.

The plan is to start with the lowest hanging fruit first, and iterate from there:

  1. Sketch up a simple list - in this issue - pointing to which charts are good examples of items in the running list (this would be a good place to quickly note any caveats or questions we may have about those)
  2. Make PR for adding documentation to REVIEW_GUIDELINES.md about each item in the running list
  3. Once this is done, and announced, we could then iterate on adding some version of these best practices into chartutil's create command

Domains/Topics

The idea is to break out best practices by domain/topic (or some way of thinking about grouping k8s primitives and concepts as they map to helm/charts implementation). Here's a working list, with names of volunteers so far to work on them:

  • [x] PVCs/StorageClass @prydonius (see helm/charts#1869, helm/charts#4223)
  • [x] RBAC @unguiculus (https://github.com/kubernetes/helm/pull/3408)
    • https://github.com/coreos/alb-ingress-controller/pull/146
    • https://github.com/kubernetes/helm/pull/3184
  • [ ] Network policy @omkensey (issue helm/charts#4067 - needs a PR)
  • [x] StatefulSets @dhilipkumars helm/charts#3334
  • [x] Deployment @dhilipkumars helm/charts#3334
  • [x] Jobs @dhilipkumars helm/charts#3334
  • [x] ReplicaSet @dhilipkumars helm/charts#3334
  • [x] DaemonSet @dhilipkumars helm/charts#3334
  • [x] HorizontalPodAutoscaler @paulczar helm/charts#7562 (see https://github.com/helm/charts/issues/3011#issuecomment-417043738)
  • [x] Ingress @prydonius @scottrigby (#7590)
  • [ ] Affinity/Antiaffinity (briefly mentioned in helm/charts#3334)
  • [x] ServiceAccount (see https://github.com/kubernetes/helm/issues/3320, https://github.com/kubernetes/helm/pull/3408)
  • [ ] PodSecurityPolicy @paulczar
  • [ ] PodDisruptionBudget (briefly mentioned in helm/charts#3334. See https://github.com/helm/charts/issues/3011#issuecomment-419233266)
  • [x] Label immutability (See helm/charts#7680, helm/charts#7803, helm/charts#7726) @desaintmartin
  • [ ] Standard keys & terminology
  • [ ] Standard commands
    • When to use include and template (See https://github.com/helm/charts/issues/3011#issuecomment-457637650, https://github.com/helm/charts/issues/3011#issuecomment-457640417, https://github.com/helm/helm/issues/4092 and https://github.com/helm/helm/pull/4093)
  • [x] Versioning @scottrigby (#11207)
  • [ ] OWNERS file (We should note what was decided here https://github.com/helm/charts/issues/7446#issuecomment-443466712 in the REVIEW_GUIDELINES)

Related issues

  • https://github.com/kubernetes/helm/issues/1993
  • https://github.com/kubernetes/helm/pull/2007
  • https://github.com/kubernetes/charts/issues/44

scottrigby avatar Dec 12 '17 18:12 scottrigby

@scottrigby Thanks for pulling this issue together

mattfarina avatar Dec 12 '17 18:12 mattfarina

The comments on this file are interesting. If we're using these charts for testing those areas (PVC usage, StatefulSet, Operator & CustomResource), does it mean they're are good examples of each area? https://github.com/kubernetes/charts/blob/master/test/helm-test/whitelist.yaml

charts:
  - stable/postgresql #PVC usage
  - stable/cockroachdb #StatefulSet
  - stable/etcd-operator #Operator & CustomResource
  - stable/prometheus
  - stable/jenkins
  - incubator/zookeeper

scottrigby avatar Dec 12 '17 19:12 scottrigby

Another note before I forget: I'd say stable/jenkins and stable/drupal have good examples of Ingress. But I think the helm create chartutil command does a better job at that. Any thoughts on this one?

scottrigby avatar Dec 12 '17 19:12 scottrigby

@scottrigby I see what you're saying on ingress. Did you want to write up a PR for the best practice based on chartutil? If not, I can find the time.

mattfarina avatar Dec 13 '17 20:12 mattfarina

Hope to get to this next week!

prydonius avatar Dec 15 '17 10:12 prydonius

Quick note to anyone who may not have seen it: there's a discussion thread on Slack for some questions about how we might approach this: https://kubernetes.slack.com/archives/C6E3XH1ED/p1513324858000143 I can summarize here if that's helpful, but maybe good to chat there then summarize once we've clarified some of the questions? Or we can chat here. I'm happy to help keep both in sync as we hammer it out.

scottrigby avatar Dec 16 '17 21:12 scottrigby

@mattfarina ^ I've been slightly busy but can do this w @prydonius or sketch up on my own either way - I'd just like some feedback about the above thread for higher level questions, about where the files should live in general etc.

scottrigby avatar Dec 16 '17 21:12 scottrigby

I updated the initial experience created with helm create. https://github.com/kubernetes/helm/pull/3337

unguiculus avatar Jan 12 '18 09:01 unguiculus

@scottrigby PTAL

dhilipkumars avatar Jan 15 '18 16:01 dhilipkumars

I'd also like to get some input on more low-level components that should or should not be part of a Chart:

  • resource allocations: should we prefer simply having a resources: item?
  • adding custom pod/service annotations (needed for prometheus scraping, for instance)
  • adding custom pod/service labels (for lots of custom, local reasons this can be very beneficial)
  • imagePullSecrets (we generally allow overriding of the image, this should be part of it as well?)

I'd love to see these added to the guidelines (CONTRIBUTING.md), unless this is already stated somewhere and I'm lookin in the wrong location?

timstoop avatar Mar 26 '18 22:03 timstoop

Also, it might be good to explain versioning a bit. I'm looking at helm/charts#4453 and I'm not sure if the version of the Chart should start with 1.0.0, as I notice most Charts even in stable are versioned like 0.x.y. Or again, is this explained somewhere and I just cannot find it?

timstoop avatar Mar 26 '18 22:03 timstoop

While on the subject, there seems to be some different ideas on when to update the patch level and when to update the minor version. I personally think that patch level should be used for pure fixes, where stuff that wasn't working properly before (or documented incorrectly or something) is fixed, but any new feature, however small, would bump the minor version. Which kind of means that any additional variable should almost automatically increase the minor version of a Chart. But I'm the new guy here, so please let me know if I'm wrong.

timstoop avatar Mar 27 '18 06:03 timstoop

Thanks for bringing all these points up @timstoop, we definitely need to add guidelines about all of these things in the REVIEW_GUIDELINES doc. Here are my 2 cents:

  • resource allocations - these can be very specific to the cluster environment, so I suggest charts include a value to configure the resources, but not set any defaults. In particular, there shouldn't be any limits set by default. It may make sense to set requests, but I'm not entirely sure about that.
  • custom annotations - this is definitely useful, I'm just not sure how we should implement it. Should one set of annotations be applied to all resources, or do we need to have separate configurable annotations for each resource type (Deployment, Service, etc.)
  • imagePullSecrets - IMO imagePullSecrets are better configured via ServiceAccounts to apply across the namespace, or possibly using PodPresets in the future.
  • versioning - I think most charts start out at 0.1.0 (helm create sets this by default), but I think this should be entirely up to the chart developer. However, we should be making sure that updates to charts adhere to semantic versioning (patch = bug fixes, minor = features, major = backwards incompatible features, as you suggested).

prydonius avatar Mar 27 '18 16:03 prydonius

  • resource allocations - I think that for some Charts it may make sense to set minimums (eg. it makes no sense to start prometheus with 10m cpu and 1Mi memory), but I agree that they should probably not be set at all. That also allows some Operators I've seen developed to set those automatically based on average usage.
  • custom annotation/labels - I think it doesn't make sense to not provide the option for all resource types deployed by a Chart, as you'll have to end up making manual changes then. I say, if you gonna do this, do it correct the first time :)
  • imagePullSecrets - Honestly, I didn't even know you could set it via the ServiceAccount. But that just diverts the issue, I think, as a lot of Charts create a custom ServiceAccount in which you'd have to add it then as well. Right?

timstoop avatar Mar 27 '18 18:03 timstoop

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Aug 19 '18 05:08 stale[bot]

poke, to remove stale

mattfarina avatar Aug 20 '18 15:08 mattfarina

This should also be referenced: helm/charts#5167 Thanks @olemarkus for reminding me about that one

scottrigby avatar Aug 20 '18 21:08 scottrigby

The guidelines should recommend stricter adherence to semver. For example recommend that all charts in the stable repo is +1.0.0. Currently Helm documentation states that semver is used. It is also mentioned briefly here: https://github.com/helm/charts/blob/master/PROCESSES.md#un-deprecating-a-chart

olemarkus avatar Aug 22 '18 14:08 olemarkus

Note there may be a need to clarify RBAC best practices. See https://github.com/helm/charts/pull/6407#pullrequestreview-149020512 and https://github.com/helm/charts/pull/6407#issuecomment-415497659. Will need to ask @viglesiasce about thoughts on how to clarify.

scottrigby avatar Aug 23 '18 18:08 scottrigby

And updating charts with auto-generated passwords as discussed in Helm call and on this issue: helm/charts#5167

alexvicegrab avatar Aug 24 '18 09:08 alexvicegrab

Question maybe for our next chat… what about best practices for HPAs? It seems only stable/spark and stable/nginx-ingress define a HorizontalPodAutoscaler in templates (stable/dask-distributed has a reference to worker.replicasMax in it's README, but that seems like a copy from spark, as there's no reference to this in the templates). Anyway, should we recommend bundling HPAs with charts as a best practice, or suggest instead using a standalone controller (like https://banzaicloud.com/blog/k8s-hpa-operator/)?

scottrigby avatar Aug 29 '18 17:08 scottrigby

@paulczar ^ I updated the description to add helm/charts#7562

scottrigby avatar Sep 06 '18 14:09 scottrigby

Re Affinity / anti-affinity, there's a note added in helm/charts#3334, does someone want to add an example (perhaps draw from redis-ha values and deployment template)?

scottrigby avatar Sep 06 '18 15:09 scottrigby

I added PodSecurityPolicy to the list @paulczar

scottrigby avatar Sep 06 '18 18:09 scottrigby

Should PodDisruptionBudget be added too? https://kubernetes.io/docs/tasks/run-application/configure-pdb/ . Also see https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-poddisruptionbudget.yaml - one thing to note is to not enable it if replicacount == 1 as this disables eviction

davidkarlsen avatar Sep 06 '18 18:09 davidkarlsen

@davidkarlsen Good question - there's a short note about PDB in the REVIEW_GUIDELINES, with a link to the docs (from helm/charts#3334):

It is recommended that Deployments and StatefulSets configure their workloads with a Pod Disruption Budget for high availability.

I'd personally love to see a canonical boilerplate example values file and template entry for all of these either within the review guidelines or linked from there to some canonical example somewhere else (some of the resources have them already, while others don't). I'd also like to see very important notes (for example your note, where certain configs are incompatible) as comments in the example values file snippet.

Thoughts?

scottrigby avatar Sep 06 '18 20:09 scottrigby

Ah - I missed the fact that it was already mentioned. But yes - a complete example-chart would be very nice and the easiest to point to.

davidkarlsen avatar Sep 06 '18 21:09 davidkarlsen

Per office hours Today, we need to address changing labels. Noting here because we will also want to ensure the best practices around this are documented.

  • @davidkarlsen added this issue "Discrepancy in documentation of labels" https://github.com/helm/charts/issues/7550
  • @desaintmartin will make a new charts issue for this helm issue later Today, and link back here
  • There is an existing Helm issue https://github.com/helm/helm/issues/2494
    • According to @bacongobbler (here):

    Short of helm delete --purge myrelease and running helm install again on your chart, there is no way to mutate the selector labels

    • Let's chat about this in the next SIG Apps call, to see if we can brainstorm another workaround (because… that would be unfortunate as the only upgrade path)
    • Speaking of, we still don't have a solution for how to handle breaking changes such as this. See https://github.com/helm/charts/issues/5657

scottrigby avatar Sep 11 '18 16:09 scottrigby

Here is the issue: https://github.com/helm/charts/issues/7680 .

desaintmartin avatar Sep 11 '18 22:09 desaintmartin

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Oct 12 '18 03:10 stale[bot]