mattermost-operator icon indicating copy to clipboard operation
mattermost-operator copied to clipboard

Allow overrides to update-check job spec

Open mjnagel opened this issue 2 years ago • 6 comments

Summary

From a quick scan through the code I don't believe there is currently any way to override parts of the spec for the update-check job that runs with version updates. It may be desirable to override various parts of this job, for example:

  • securityContext: Enforce best practices for security such as runAsNonRoot, dropping all capabilities, etc
  • labels/annotations: To better group things together for ownership and provide for integration with externally managed resources (i.e. networkPolicy pod selectors)
  • startup command: This is a very niche use-case but when deploying in an istio-injected environment the sidecar needs to be terminated by running additional commands against its endpoint (can provide more details if needed)

Ultimate it would be great to be able to override different portions of the pod spec.

Possible fixes

Add a new section to the Mattermost CRD spec that allows for update check pod spec overrides. This would honestly be beneficial for the primary Mattermost pods as well, since currently you have to utilize resourcePatch to add labels/annotations to the pods (which is not very beginner friendly or easy to navigate, especially with multiple changes.

Proposed Change:

spec:
  podTemplate:
    <allow anything for podspec, or allow subset including annotations/labels/securitycontext
  updateCheck:
    podTemplate:
      <allow anything for podspec, or allow subset including annotations/labels/securitycontext

mjnagel avatar Jun 29 '22 18:06 mjnagel

I'm happy to attempt work on these changes, wanted to open the issue to get the conversation started and see if this would be supported. The primary motivation for this is coming from my work on istio injecting Mattermost, without an ability to disable the sidecar (via labels) or terminate the sidecar (via command override) for the update-check upgrades will require manual intervention to successfully complete.

mjnagel avatar Jun 29 '22 18:06 mjnagel

Hey @mjnagel, thanks for submitting the issue.

I agree that adding a way to configure an update job would be very useful, as well as other things in the podTemplate section. We though about something like that in the past, however there are some drawback that make me a little hesitant for allowing overrides of the whole spec.

A big chunk of podTemplate is created based on other things in Mattermost CR or some Operator defaults. This poses a problem, what to do when the user provides his own template? We could just use the provided one and disregard all things that would be set by the Operator, but then user would need to specify all init containers, port, env vars etc. by hand so it seems like using Operator make little sense in such case. We could alternatively do some kind of merging, but then it is also not clear what to do if user explicitly wanted to omit some configuration set by the Operator. We might also run into some configuration edge cases given that some things in Pod Spec can be configured from other CR fields and this information may be used in other places in Operator.

So for exposing the whole podSpec in CR, it does not seem to me there is a clear way of doing that without subverting user expectations in some way but still making Operator usable. I am however open to ideas, given that we are thinking of releasing new CR version.

Szymongib avatar Jul 04 '22 07:07 Szymongib

@Szymongib definitely bring up some good points here...it's a balance between providing customization to the end user and making sure that things won't break with that customization.

Could potentially go the route of throwing out any conflicting overrides (i.e. if I try to override the app label, the operator ignores my override). Obviously this would need to be well documented/logged so the end user understands what is happening. The option to override that label would still be available, but would require using resourcePatch. Or instead of throwing out conflicts you could have a toggle in the spec that would change the precedence of user values over operator defaults.

Definitely just throwing things out there...I might have to do some research into what other operators have done to handle these scenarios, since I know we make use of several where labels/annotations values are accessible.

mjnagel avatar Jul 05 '22 21:07 mjnagel

This will be a long comment...I've been poking around through each of the areas I identified and seeing what the most feasible path forward would be (in my mind). I realize some of this deviates from the initial issue (or at least the title) so I'm happy to open up other issues to tackle individual pieces of this. Here's some of my thoughts/conclusions...

Labels

It seems like adding labels is already available via spec.resourceLabels. Given this, it seems like we're already running the risk of an end user overriding one of the default labels.

I did a quick test adding:

spec:
  resourceLabels:
    app: mm-test

The result appears to be that the operator fails to reconcile the change due to mismatch with selector labels, throwing a log error:

time="2022-07-20T15:27:22Z" level=info msg="[opr.controllers.Mattermost] Updating resource" Reconcile=mattermost Request.Name=mattermost Request.Namespace=mattermost kind="&TypeMeta{Kind:,APIVersion:,}" name=mattermost namespace=mattermost patch="{\"metadata\":{\"labels\":{\"app\":\"mm-test\"}},\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"app\":\"mm-test\"}}}}}"
time="2022-07-20T15:27:22Z" level=error msg="[opr.controller.mattermost] Reconciler error" error="failed to update mattermost deployment: Deployment.apps \"mattermost\" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{\"app\":\"mm-test\", \"installation.mattermost.com/installation\":\"mattermost\", \"installation.mattermost.com/resource\":\"mattermost\"}: `selector` does not match template `labels`" name=mattermost namespace=mattermost reconciler group=installation.mattermost.com reconciler kind=Mattermost

While maybe unintentional, the result is effectively that the user's override is thrown out/ignored due to the way the operator handles the labels vs selector labels:

  • selectorLabels will give precedence to the default operator provided app label (see here)
  • labels will always give precedence to the user's labels (see here)

My suggestion on the way forward would be to pursue the path of giving the operator defaults precedence (in all cases) and logging a warning message in the operator logs when the user attempts to override one of the operator defaults. If I'm not mistaken the only operator default labels are:

  • ClusterLabel: installation.mattermost.com/installation
  • ClusterResourceLabel: installation.mattermost.com/resource
  • "app label": app

In my opinion preventing a user from overriding these labels would be within the normal role of an operator, and still would provide flexibility to the end user to add extra labels as needed.

Annotations

To my knowledge the only annotations currently set on MM pods are the prometheus annotations when enterprise is enabled. The three annotations are:

  • prometheus.io/scrape=true: I could potentially see a use case for an end user who wants to disable scraping, but I don't know if that's a valuable use case to support.
  • prometheus.io/path=/metrics: This one an end user should never change since MM only exposes metrics on this path
  • prometheus.io/port=8067: Same thing here, the operator does not allow overriding the port so this should never change

I think I'm coming down to the same conclusion as with labels on this one, accept the end user's overrides and give precedence to the 3 defaults for prometheus. There may be room on this one for a special case with the scrape true/false annotation, but that could be done via a special flag for metrics if we want to support it?

Security Context

From a review of the code I couldn't find any places where a default securityContext is set. I think this is absolutely one that should be exposed to the end user - in more restrictive environments users might be running tools like OPA Gatekeeper or Kyverno to enforce security policies, some of which may require (for example) runAsNonRoot or similar security context settings. It would be beneficial to provide accessibility to both the container and pod security context I think, since some settings can only be set on one or the other.

Update Check Spec

Currently the spec.template.spec portion of the job is a copy from the base Mattermost deployment that is then modified for use as an update job. For the four areas I identified in my original issue:

  • Labels: In my opinion these could be taken directly from an end user's overrides, with the exception of the app label which is set by default. You could handle this the same way as the standard labels, throwing a log warning when an end user attempts to override it.
  • Annotations: Since none are set by default it should be simple to accept the end user's overrides.
  • Security context: Same case here.
  • Command: Again, very niche usecase and providing annotation support would be sufficient to make things work for my intended use (istio sidecar injection). Alternatively perhaps there's room for adding support for something like a postUpdateCommands which would be appended to the default command, providing room for the end user to add something to run after (in my case this would be a sidecar termination script)

Summary

The path forward I'm proposing here would look like this in terms of changes to the spec:

spec:
  # Note that this already exists, I'm just suggesting better visibility/handling of these labels
  resourceLabels:
    # This override would be thrown out, result in a logged warning
    app: mm-test
    # This label would be added to all resources
    owner: mjnagel
  # Applies to MM pods in the deployment
  podAnnotations:
    # This override would be thrown out, result in a logged warning
    prometheus.io/port: 8067
    # This annotation would be added to MM pods in the deployment
    owner: mjnagel
  # Applies to MM pods in the deployment
  podSecurityContext:
    runAsNonRoot: true
  # Applies to all containers in the MM deployment
  containerSecurityContext:
    capabilities:
      drop:
        - ALL
  updateCheck:
    # The below operate in the same way as their corresponding `spec.*` options
    podLabels: ...
    podAnnotations: ...
    podSecurityContext: ...
    containerSecurityContext: ...
    # These commands would be appended to the default command run in the job
    postUpdateCommands:
      - echo "Stopping the istio proxy..."
      - curl -X POST http://localhost:15020/quitquitquit

@Szymongib This is definitely a lot (both to read and to consider adding) and I'm sure the naming/organization could be simplified in some ways. Definitely let me know your thoughts and I'm happy to spin off new issues that tackle individual portions of the changes here if that is more appropriate for discussion/working.

mjnagel avatar Jul 20 '22 16:07 mjnagel

Thanks a lot for diving into this, @mjnagel, and for such detailed write up.

I definitely agree that we can improve with labels and annotations handling as a lot of this things were implemented very early on on never revisited, hence a weird behaviour with labels etc.

We wanted to add support for securityContext for some time now as it was requested a bunch of times already. There was even an issue, tho it was closed by the creator #286, I have reopened it as I think it is still something we want to add natively, without the need of using resourcePatch. This would be a good starting point to opening customization of Mattermost deployment/pods.

Another thing you mentioned, that I strongly agree with are annotations, I like the idea of not overriding those that should not be overriden, perhaps we could also explore disabling them with something like monitoring: disabled, tho that is probably something for the future.

Regarding the labels, resourceLabels currently apply to all resources created by Operator. Perhaps we could use more granularity there on top of that. I think what you propose makes total sense, however changing how the labels priority is handled could be considered a breaking change, therefore we would need to postpone releasing it to the next major version.

With the update check job, the spec is copied as this is a simple way to ensure that:

  • It avoids a bunch of edge cases of some specific Mattermost configuration.
  • It runs with the same config as Mattermost pods, therefore should not do anything unexpected.

I think we should allow some customization there as now it is pretty highly opinionated thing that users should have a way to customize / disable. I would be really cautious with changing the logic of copying the spec, so perhaps we could start with applying some overrides on top of that, and annotations seems like a good start point, given that securityContext could be copied from the pod spec.

Regarding the shape of the config, we are trying to move more towards grouping things by resource as it makes it easier for user to find and understand particular config options, handle things from code perspective, as well as make it more extendable for the future. I could see therefore something like:

spec:
  pod/deployment: # depending how granular we want to go we could do deployment.podTemplate etc...
    podSecurityContext: ...
    annotations: ... # if we want something pod/deployment 
    ...
  updateCheck:
    ...

Szymongib avatar Jul 21 '22 18:07 Szymongib

To narrow the focus of this issue to just the update-check I opened #298 to handle the labels/annotations support for the main app server pods.

mjnagel avatar Aug 11 '22 20:08 mjnagel