prometheus-engine
OperatorConfig management
Is it possible to have a better way to manage the OperatorConfig, perhaps with a ConfigMap or a custom resource similar to a PodMonitoring configuration? Manually editing the YAML as suggested by the documentation isn't ideal.
Currently I apply changes to the OperatorConfig across my environments by running kubectl patch from terraform, but this method isn't great either. I recently had an issue with too many metrics being scraped because I had enabled kubeletScraping for testing; after I removed the kubeletScraping block from the local YAML file I apply with kubectl patch, the config was not removed from kubernetes.
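As far as I can tell a merge patch only adds or replaces fields, so dropping a key from the patch file leaves the old value in place; the field apparently has to be set to null explicitly for the API server to delete it. A rough, untested sketch of the removal patch I think would be needed (assuming kubeletScraping sits under collection, as in my patch file):
# operator_config_removal_patch.yaml - untested sketch: with --type merge,
# setting a field to null asks the API server to delete it, while simply
# omitting the field leaves the existing value untouched.
collection:
  kubeletScraping: null
Applied with: kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch-file operator_config_removal_patch.yaml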
Yeah, it can't be managed with TF (without hacks to patch) and can't be managed with helm (it does not want to modify resources it does not own). I'm trying an (also hackish) solution that uses the kubernetes_labels/kubernetes_annotations TF resources to make helm "own" the object and then uses a helm chart to manage it, but it is also not great...
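Roughly what I mean, as an untested sketch (the release name/namespace are placeholders for whatever chart would take ownership): helm 3 will adopt an existing object if it carries the managed-by label and the release-name/namespace annotations, so the idea is to stamp those onto the GKE-created object with the kubernetes provider:
# Untested sketch: stamp Helm ownership metadata onto the GKE-created
# OperatorConfig so a chart can adopt and manage it afterwards.
resource "kubernetes_labels" "operatorconfig_helm_adopt" {
  api_version = "monitoring.googleapis.com/v1"
  kind        = "OperatorConfig"
  metadata {
    name      = "config"
    namespace = "gmp-public"
  }
  labels = {
    "app.kubernetes.io/managed-by" = "Helm"
  }
}

resource "kubernetes_annotations" "operatorconfig_helm_adopt" {
  api_version = "monitoring.googleapis.com/v1"
  kind        = "OperatorConfig"
  metadata {
    name      = "config"
    namespace = "gmp-public"
  }
  annotations = {
    # placeholder release coordinates - whatever chart should own the object
    "meta.helm.sh/release-name"      = "gmp-config"
    "meta.helm.sh/release-namespace" = "gmp-public"
  }
}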
Hi @shpml and @bjakubski,
I recently had an issue with too many metrics being scraped because I had enabled kubeletScraping for testing; after I removed the kubeletScraping block from the local YAML file I apply with kubectl patch, the config was not removed from kubernetes.
Are you trying to remove the entire OperatorConfig using patch? I'm not sure I follow.
With regards to managing the OperatorConfig, you should be able to update it and source-control it the same way as you would a PodMonitoring, no? It's just another custom resource watched by the operator. The main difference is that it's a singleton in a fixed namespace in the cluster. So, in theory, there should be fewer of them to manage than PodMonitorings.
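For example, just as a sketch (I'm assuming kubeletScraping under collection is the knob you toggled, and the interval value is a placeholder), the singleton could live in git next to your PodMonitorings and be re-applied the same way:
# operatorconfig.yaml - minimal sketch of a source-controlled OperatorConfig;
# re-applied on change with: kubectl apply -f operatorconfig.yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  name: config
  namespace: gmp-public
collection:
  kubeletScraping:
    interval: 30s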
But maybe I'm misunderstanding your question 🙂
Hi @pintohutch
I patch the OperatorConfig to limit the metrics I want to send to Cloud Monitoring. I do this using kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch-file ${path.module}/templates/operator_config_patch.yaml. This command is run by terraform using local-exec. The patch file looks like this:
collection:
  filter:
    matchOneOf:
    - '{__name__=~"puma_.+"}'
    - '{__name__=~"action_cable_.+"}'
    - '{__name__=~"sidekiq_.+"}'
    - '{__name__=~"k8s_app:.+"}'
I can't really manage this in terraform any other way because the resource is created automatically when managed prometheus is enabled on the cluster. Managed prometheus is enabled with gcloud container clusters update --enable-managed-prometheus for now; I will migrate to the terraform config now that this issue is closed, but that doesn't fix my problem.
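For reference, once cluster creation is in terraform I expect enabling it would look roughly like the snippet below (a sketch only, assuming a reasonably recent google provider; the cluster name, location, and node count are placeholders):
# Sketch: enabling managed collection at the cluster level, the terraform
# equivalent of `gcloud container clusters update --enable-managed-prometheus`.
resource "google_container_cluster" "primary" {
  name               = "example-cluster" # placeholder
  location           = "europe-west1"    # placeholder
  initial_node_count = 1                 # placeholder

  monitoring_config {
    managed_prometheus {
      enabled = true
    }
  }
}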
If I try to manage the OperatorConfig with terraform using the kubernetes_manifest below, I get an error because the OperatorConfig resource has already been created. I assume managed prometheus only works with an OperatorConfig named "config". If not, I can just deploy an OperatorConfig with a different name.
resource "kubernetes_manifest" "operatorconfig_gmp_public_config" {
manifest = {
"apiVersion" = "monitoring.googleapis.com/v1"
"collection" = {
"filter" = {
"matchOneOf" = [
"{__name__=~\"puma_.+\"}",
"{__name__=~\"action_cable_.+\"}",
"{__name__=~\"sidekiq_.+\"}",
"{__name__=~\"k8s_app:.+\"}",
]
}
}
"kind" = "OperatorConfig"
"metadata" = {
"labels" = {
"addonmanager.kubernetes.io/mode" = "Reconcile"
"deployed-by" = "terraform"
}
"name" = "config"
"namespace" = "gmp-public"
}
}
}
With the PodMonitoring config I can deploy as many configurations as I like; the kubernetes_manifest below is an example.
resource "kubernetes_manifest" "prom_app_scaping" {
manifest = {
apiVersion = "monitoring.googleapis.com/v1alpha1"
kind = "PodMonitoring"
metadata = {
name = "${var.common_labels.env}-prom-scaper"
namespace = "default"
labels = {
"app.kubernetes.io/name" = "${var.common_labels.env}-prom-scaper"
}
}
spec = {
endpoints = [
{
interval = var.scrape_interval
path = "/metrics"
port = 3000
scheme = "http"
}
]
selector = {
matchLabels = {
prometheus-scrape = "true"
}
}
}
}
}
I hope this makes sense. I could wait for the OperatorConfig to be created and then import it into terraform, but that workaround is also not ideal and doesn't scale well across multiple projects.
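If I did go the import route, I believe the kubernetes_manifest resource is imported with an ID built from the apiVersion/kind/namespace/name, so it would be something along these lines (untested):
# Untested sketch: adopt the GKE-created singleton into the kubernetes_manifest
# resource defined above instead of letting terraform try to create it.
terraform import kubernetes_manifest.operatorconfig_gmp_public_config \
  "apiVersion=monitoring.googleapis.com/v1,kind=OperatorConfig,namespace=gmp-public,name=config"
And that would have to be repeated per cluster/project, which is exactly what doesn't scale.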
Hi @pintohutch
@shpml described the issue nicely. I'll only add that it is not easily possible to manage the OperatorConfig with helm either, due to the same issue: the object is created by GKE/the operator, so it is "foreign" to helm and helm will refuse to update it.
Hi @shpml and @bjakubski,
Apologies for the delayed response. I'm just returning back from leave over the holidays.
Thanks for making this use case clearer; I think I see the issue.
I do see an open issue for the Kubernetes Terraform provider to support kubectl patch that has quite a lot of activity. It's unclear whether the patch functionality would be supported for all resource types (i.e. custom resources), though.
I assume managed prometheus only works with an OperatorConfig named "config". If not I can just deploy an OperatorConfig with a different name.
Yes that is correct. The "config" OperatorConfig is created by the GKE control plane as a singleton resource in the cluster and is referred to by name in the source code.
The workarounds you've mentioned are probably the best options for now. The only other solution would be to deploy managed collection yourself through the install manifests (i.e. not through the GKE API), using kubernetes_manifest resources for those.
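In its simplest form that just means applying the install manifests from this repo yourself instead of flipping the GKE flag, roughly like the commands below (sketch only; the release tag is a placeholder for whichever version you pin, and the kubernetes_manifest route would express the same documents as terraform resources):
# Rough sketch (release tag is a placeholder): self-deploy managed collection
# from the install manifests instead of enabling it via the GKE API.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/<RELEASE_TAG>/manifests/setup.yaml
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/<RELEASE_TAG>/manifests/operator.yaml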
There may be a path forward in a future release, but we'd need to evaluate. Do you have a proposal for how you think this would look using a ConfigMap?
Either way, we can keep this issue open for consideration.
A proposal for how it would work with a ConfigMap: the operator could look for a ConfigMap named config, or something more specific like gmp-operator-config, or select it via a predefined label on the ConfigMap such as app.kubernetes.io/gmp-operator-config=True. The user can add configuration options there that the operator merges into its OperatorConfig:
apiVersion: v1
kind: ConfigMap
metadata:
  name: gmp-operator-config
  namespace: gmp-public
data:
  config.yaml: |
    collection:
      filter:
        matchOneOf:
        - '{__name__=~"puma_.+"}'
        - '{__name__=~"action_cable_.+"}'
        - '{__name__=~"sidekiq_.+"}'
        - '{__name__=~"k8s_app:.+"}'
Alternatively, the user could deploy additional OperatorConfig resources to configure the operator, and the operator could merge these together to form its configuration:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  name: gmp-operator-config
  namespace: gmp-public
collection:
  filter:
    matchOneOf:
    - '{__name__=~"puma_.+"}'
    - '{__name__=~"action_cable_.+"}'
    - '{__name__=~"sidekiq_.+"}'
    - '{__name__=~"k8s_app:.+"}'
Gotcha - thanks for the suggestions! We'll consider this and maybe some other options to better support this use case.
Btw, in the meantime, I wonder if using GKE Config Sync could be an approach to keeping your OperatorConfigs in sync across clusters.
We haven't experimented much with it, but may be worth trying.
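A minimal sketch of what that could look like, assuming Config Sync is enabled on each cluster and the OperatorConfig YAML is committed to a git repo (the repo URL, branch, and directory below are placeholders):
# Untested sketch: a RootSync that pulls a shared OperatorConfig (and any
# PodMonitorings) from git into every cluster running Config Sync.
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://example.com/your-org/gmp-config.git  # placeholder
    branch: main
    dir: clusters/gmp
    auth: none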
@shpml would you be able to share the example of managing operatorconfig with terraform?
Operator config patch file:
# filename="operator_config_patch.yaml"
collection:
filter:
matchOneOf:
- '{__name__=~"puma_.+"}'
- '{__name__=~"action_cable_.+"}'
- '{__name__=~"sidekiq_.+"}'
- '{__name__=~"k8s_app:.+"}'
Terraform null resource.
# Patch the operator config deployed by GCP to specify the metrics to collect.
# We patch instead of managing the resource because it's deployed by GCP and likely to be updated.
# A patch only adds, it does not remove.
resource "null_resource" "patch_operator_config" {
  triggers = {
    yaml_update = filesha512("${path.module}/templates/operator_config_patch.yaml")
    # To force null_resource recreation, un-comment below
    # always_run = timestamp()
  }

  provisioner "local-exec" {
    command = <<EOF
# Authenticate to the cluster
gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION --project $PROJECT
# Patch operatorconfig/config
kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch-file ${path.module}/templates/operator_config_patch.yaml
EOF

    environment = {
      # used by gcloud to authenticate with the correct cluster
      CLUSTER_NAME = var.cluster_name
      PROJECT      = var.project_id
      REGION       = var.region
    }
  }
}
Have you faced any side effects or issues with this? One of the things I am noticing is that if I remove the entire patch config, it does not remove it (this is just me trying locally). Have you thought about deleting the operator config entirely and managing it yourself? @shpml