helm-charts icon indicating copy to clipboard operation
helm-charts copied to clipboard

[tempo-distributed] add /var/tempo emptyDir mount for querier, queryfrontend and compactor

Open rlex opened this issue 1 year ago • 9 comments

When running with hardened securitycontext, querier, queryfrontend and compactor will fail to start with following error:

level=error ts=2022-08-07T13:47:47.444953646Z caller=main.go:109 msg="error running Tempo" err="failed to init module services error initialising module: store: failed to create store mkdir /var/tempo: permission denied"

Or read-only filesystem if you have

readOnlyRootFilesystem: true

This PR adds emptydir to that components with mounted /var/tempo, which allows them to start with non-root user.

However, there is still issue with following settings:

readOnlyRootFilesystem: true

In tempo-query-frontend pod, in tempo-query container, following error is present:

2022-08-07T14:09:06.263Z [ERROR] tempo-query: plugin init error: error="open /tmp/plugin164587256: read-only file system" timestamp=2022-08-07T14:09:06.263Z

rlex avatar Aug 07 '22 14:08 rlex

Open to suggestions. /var/tempo is ephemeral anyway since it was writing in-pod, so switching it to emptyDir should pose no issues?

rlex avatar Aug 07 '22 14:08 rlex

I like this improvement, but I think we need to mention the change in the readme. It may impact other deploys.

Also, as discussed in the issue linked above, how do you feel about defaulting to false for tempo-query? (would also require a note in the readme)

joe-elliott avatar Aug 08 '22 14:08 joe-elliott

Should we also set default securitycontext, like it's done in mimir and loki charts? I run mine with

      - tempo:
          securityContext:
            capabilities:
              drop:
                 - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            runAsGroup: 1000

The only component with PV is ingester, and with changing user UIDs it might require manual intervention to change files ownership. However, ingester persistence is disabled by default. All other components should be fine.

rlex avatar Aug 08 '22 14:08 rlex

I like the idea of adding the default security context. Should we do that in a different PR? or does it make sense along with these changes?

joe-elliott avatar Aug 08 '22 19:08 joe-elliott

In tempo-query-frontend pod, in tempo-query container, following error is present:

2022-08-07T14:09:06.263Z [ERROR] tempo-query: plugin init error: error="open /tmp/plugin164587256: read-only file system" timestamp=2022-08-07T14:09:06.263Z

Wouldn't that be solved by adding an emptyDir mount for /tmp in that case?

BitProcessor avatar Aug 08 '22 21:08 BitProcessor

Should we also set default securitycontext, like it's done in mimir and loki charts? I run mine with

      - tempo:
          securityContext:
            capabilities:
              drop:
                 - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            runAsGroup: 1000

The only component with PV is ingester, and with changing user UIDs it might require manual intervention to change files ownership. However, ingester persistence is disabled by default. All other components should be fine.

You might also want to set the fsGroup to avoid issues with the PV

BitProcessor avatar Aug 08 '22 21:08 BitProcessor

In tempo-query-frontend pod, in tempo-query container, following error is present:

2022-08-07T14:09:06.263Z [ERROR] tempo-query: plugin init error: error="open /tmp/plugin164587256: read-only file system" timestamp=2022-08-07T14:09:06.263Z

Wouldn't that be solved by adding an emptyDir mount for /tmp in that case?

No, because grpc plugin also resides in /tmp and due to how it works: https://github.com/grafana/tempo/issues/1621 https://github.com/grafana/tempo/blob/main/cmd/tempo-query/Dockerfile Plus grafana 7 is seemingly EOL now?

I like the idea of adding the default security context. Should we do that in a different PR? or does it make sense along with these changes?

Probably makes sense adding to that PR too, since now it's compatible with full set of rules.

You might also want to set the fsGroup to avoid issues with the PV

Yeah, probably should do the trick in majority of cases, but i vaguely recall that not every CSI supports fsGroup, so probably still worth mentioning in changelog.

rlex avatar Aug 08 '22 22:08 rlex

seems like i need to split podSecurityContext and containerSecurityContext

rlex avatar Aug 09 '22 14:08 rlex

Since fsGroup can't be used at container level, i added podSecurityContext for fsGroup. But since we now have pod-wide securityContext, maybe it will be better to move other settings to top level?

rlex avatar Aug 11 '22 07:08 rlex

Rebased. @joe-elliott can you talk a look please?

rlex avatar Aug 16 '22 13:08 rlex

I am good on this change. @rlex I appreciate your passion for security. This is not the first PR/issue you've filed where you've pushed for best practices in Tempo.

@zanhsieh currently has a block on merging b/c he's requesting changes. They will need to lift the block. @BitProcessor did you have any additional thoughts?

joe-elliott avatar Aug 16 '22 19:08 joe-elliott

Might be nice idea to kick pipelines again. I'm not sure if resolving conflicts via github UI triggers builds.

rlex avatar Aug 16 '22 20:08 rlex

Added /var/tempo emptydir to distirbutor as well - noticed it's required for 1.5. Shouldn't hurt.

rlex avatar Aug 16 '22 21:08 rlex

@joe-elliott i reverted adding store to distributor as discussed here https://github.com/grafana/tempo/issues/1648

Should be ready to merge now, i think.

rlex avatar Aug 17 '22 17:08 rlex

While at it, we might as well also add:

      automountServiceAccountToken: false
      enableServiceLinks: false

But as @joe-elliott mentioned earlier, it's a lot of changes that have nothing to do with the initial goal of the PR: "add /var/tempo emptyDir mount for querier, queryfrontend and compactor" (They are all related changes, but wondering if we should split it into multiple PR's)

BitProcessor avatar Aug 17 '22 18:08 BitProcessor

Well we can just rename it to "security hardening of tempo-distributed", maybe. Initial adding of /var/tempo emptyDir was done to make it compatible with security best practices, so, IMO, makes sense to add best practices to same PR. (especially since other charts, like loki and mimir, already comes with preconfigured securityContext and etc)

rlex avatar Aug 17 '22 18:08 rlex

Was just discussing it with Elliot ; he said the same thing :D How about my 2 suggestions ?

BitProcessor avatar Aug 17 '22 18:08 BitProcessor

Sure, why not. Should it be configurable, or static will be enough?

rlex avatar Aug 17 '22 18:08 rlex

Sure, why not. Should it be configurable, or static will be enough?

personally I would go for static - I don't see the added value of making it configurable.

BitProcessor avatar Aug 17 '22 18:08 BitProcessor

Heads up that I'm about to merge this and you will need to update your chart version. Thanks! https://github.com/grafana/helm-charts/pull/1699

joe-elliott avatar Aug 17 '22 18:08 joe-elliott

Interesting, now it fails after bump to 1.5.0.

Ingester fails to assemble ring?

rlex avatar Aug 17 '22 18:08 rlex

@joe-elliott in distributor:

level=error ts=2022-08-17T18:57:58.684074925Z caller=main.go:109 msg="error running Tempo" err="failed to init module services error initialising module: usage-report: failed to initialize usage report: mkdir /var/tempo: read-only file system"

https://github.com/grafana/helm-charts/runs/7885580755?check_suite_focus=true#step:9:284

That commit about fixing calling store for usage-report went into final 1.5.0 release?

UPD: yeah looks like it's here https://github.com/grafana/tempo/blob/v1.5.0/cmd/tempo/app/modules.go but still getting called?

rlex avatar Aug 17 '22 19:08 rlex

Usage reporting uses the "backend" to store a seed file that all the components use to know what cluster they belong to. So in a normal distributed setup the distributor will not need access to its local disk b/c it will grab the seed file from the object storage backend.

However, this chart defaults to using local storage as the backend. So on startup the distributor is going to its "backend" to find or create the seed file. Unfortunately that is local disk.

joe-elliott avatar Aug 17 '22 19:08 joe-elliott

I can add emptyDir back for distributor. Kludge for sure, but will work.

Another way will be to test it against something like minio, but that's completely different story.

rlex avatar Aug 17 '22 19:08 rlex

I can add emptyDir back for distributor. Kludge for sure, but will work.

Yeah, I think that's the best choice for now. A minio testing setup would be a much bigger lift.

joe-elliott avatar Aug 17 '22 20:08 joe-elliott

It looks like github is acting finicky now, throwing random 503s and failing pipelines

rlex avatar Aug 17 '22 20:08 rlex

let's try close and reopen this PR to see if CI got fixed.

zanhsieh avatar Aug 17 '22 21:08 zanhsieh

Yep, looks like that did the trick.

rlex avatar Aug 17 '22 21:08 rlex

Merging time! Thanks for your contribution @rlex 🚀

BitProcessor avatar Aug 18 '22 06:08 BitProcessor