
KubeControllerManagerDown & KubeSchedulerDown firing on kubeadm 1.18 cluster

Open jeanluclariviere opened this issue 4 years ago • 41 comments

What happened? Deploying kube-prometheus release-0.6 to a kubeadm-bootstrapped bare-metal cluster causes the KubeControllerManagerDown and KubeSchedulerDown alerts to fire.

Did you expect to see something different? Alerts should not fire, as everything is up.

How to reproduce it (as minimally and precisely as possible): Deploy release-0.6 with the below config to a kubeadm-bootstrapped cluster running 1.18.x.

  • Prometheus Operator version: prometheus-operator:v0.42.1

  • Manifests:

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-all-namespaces.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-anti-affinity.libsonnet')
  {
    _config+:: {
      namespace: 'monitoring',
    },
  };

{ ['setup/0namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor is separated so that it can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }

Anything else we need to know?: This issue is related to Kubernetes 1.18.x; it appears a few changes were made to the kube-controller-manager and kube-scheduler.

Firstly, version 1.18+ now serves metrics only on the secure HTTPS ports (10257 for kube-controller-manager, 10259 for kube-scheduler) and disables HTTP by default. Unfortunately, kubeadm binds the secure port of both kube-controller-manager and kube-scheduler to 127.0.0.1 and not 0.0.0.0.

As a result, metrics cannot be collected until the bound address for both of these is updated.

The workaround: Updating the manifests in /etc/kubernetes/manifests/ to use --bind-address=0.0.0.0 for both the scheduler and the controller manager will relaunch the pods with the correct bind address, but these settings will not survive a kubeadm upgrade.

In order to persist the settings, the kubeadm-config configmap in the kube-system namespace should also be edited to include the following:

    controllerManager:
      extraArgs:
        bind-address: 0.0.0.0
    scheduler:
      extraArgs:
        bind-address: 0.0.0.0
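
For context, here is a minimal sketch of where those keys sit in the full ClusterConfiguration held by that ConfigMap, assuming the v1beta2 kubeadm API (keep whatever apiVersion and other fields are already present in your cluster):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# ... existing fields such as kubernetesVersion, networking, etc. stay as they are ...
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0

Since kubeadm re-renders the static Pod manifests from this ClusterConfiguration during upgrades, the flags should then survive a kubeadm upgrade; nodes that are already running still need the one-time manual edit of /etc/kubernetes/manifests/ described above.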

I understand this isn't a bug directly related to kube-prometheus, but I didn't find this documented anywhere and had been scratching my head for a day looking at this. Hoping this will help someone else in the future.

jeanluclariviere avatar Oct 08 '20 19:10 jeanluclariviere

We do have a patch to disable those alerts and ServiceMonitors in clusters which don't expose access to controller manager and scheduler metrics. But yes, our documentation is lacking in this field.

paulfantom avatar Oct 13 '20 07:10 paulfantom

It would be amazing if you could propose a PR to add what you described in "the workaround" where you expected to find this documentation when you were looking for it! :)

brancz avatar Oct 16 '20 08:10 brancz

yes

dnsmap avatar Nov 17 '20 08:11 dnsmap

@brancz sorry, I've been meaning to get back to this but I keep getting sidetracked with other things. Do you think the Troubleshooting section on the main page would be a suitable place for this? If yes, I can put something together linking to either the patch for disabling those checks if on a managed cluster, or for updating kubeadm to use 0.0.0.0 instead of the loopback address for the secure port (at a user's discretion, of course).

jeanluclariviere avatar Nov 17 '20 19:11 jeanluclariviere

Yes! I think the troubleshooting guide is a great place because that's most likely what people look at when they encounter this.

brancz avatar Nov 26 '20 09:11 brancz

I have changed the bind address, but it did not work. It is strange.

KeithTt avatar Dec 09 '20 10:12 KeithTt

@KeithTt I have encountered the same trouble recently; it took me 5 days to figure it out. My K8s cluster was created via kubeadm and the version is 1.19.2.

Check these points:

  1. Edit /etc/kubernetes/manifests/kube-scheduler.yaml, change --bind-address=127.0.0.1 to --bind-address=0.0.0.0
  2. Create a Service and make sure it has a label (for me it's k8s-app: kube-scheduler) matching the ServiceMonitor's spec.selector.matchLabels
  3. Make sure the Service matches the right Pod via the right label (for me, it's component: kube-scheduler)
  4. It took me a long time to find the last one. :triumph: The Service's port name (for me, it's https-metrics) must match the ServiceMonitor's spec.endpoints.port
  5. Do the same check for kube-controller-manager

FYI, my YAML files:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler

johntostring avatar Dec 28 '20 10:12 johntostring

Note that both the kube-scheduler and kube-controller-manager certificates contain only localhost and 127.0.0.1. This is unlike the etcd ones, which contain the actual hostname and node IP. So at this moment, both scheduler and controller-manager require insecureSkipVerify: true.

Binding to 0.0.0.0 may have some security implications, like exposing previously hidden endpoints to the cluster and maybe even to the public. insecureSkipVerify is also more of a temporary workaround.

I see two approaches:

  • Design component endpoints with Prometheus scraper in mind: i.e. they provide a certificate that includes master nodes' IPs (similar to etcd). Bind to the main network interface instead of 127.0.0.1. This likely requires support by kubeadm.
  • Delegate scraping to some collector, run as a DaemonSet. E.g. node-exporter can act as such. It may proxy the request to the node and respond to Prometheus with the result. (Suboptimal, but controllable on the Prometheus side.)

ksa-real avatar Jan 19 '21 14:01 ksa-real

Binding to 0.0.0.0 may have some security implications, like exposing previously hidden endpoints to the cluster and maybe even to the public.

I totally agree, which is partially why I haven't made time to submit the PR for updating the troubleshooting docs - that and I'm swamped.

So at this moment, both scheduler and controller-manager require insecureSkipVerify: true

Yea, if what you're saying is true then that would be necessary - I don't recall having to set this value though. Does Prometheus even check for valid certificates on the endpoints it scrapes?

Design component endpoints with Prometheus scraper in mind: i.e. they provide a certificate that includes master nodes' IPs (similar to etcd). Bind to the main network interface instead of 127.0.0.1. This likely requires support by kubeadm.

I think this solution is more realistic - I set the bind address to 0.0.0.0 out of laziness, and I suspect most folks are like me (not that this is a good thing!)

jeanluclariviere avatar Jan 19 '21 14:01 jeanluclariviere

So at this moment, both scheduler and controller-manager require insecureSkipVerify: true

Yea, if what you're saying is true then that would be necessary - I don't recall having to set this value though. Does Prometheus even check for valid certificates on the endpoints it scrapes?

It does. Scraping fails otherwise.

Design component endpoints with Prometheus scraper in mind: i.e. they provide a certificate that includes master nodes' IPs (similar to etcd). Bind to the main network interface instead of 127.0.0.1. This likely requires support by kubeadm.

I think this solution is more realistic - I set the bind address to 0.0.0.0 out of laziness, and I suspect most folks are like me (not that this is a good thing!)

At the moment I have no idea how to make it work. bind-address is passed to the scheduler/controller-manager manifests from extraArgs: bind-address: 0.0.0.0. This parameter is static across all nodes. The only "universal" addresses are 127.0.0.1 and 0.0.0.0, while the wanted value is the node IP address. The certificate is generated for localhost/127.0.0.1 regardless of the bind-address. So the right config does not seem possible with kubeadm init phase control-plane controller-manager --config kubeadm.yml.

Also, I'm not sure if anything relies on scheduler/controller-manager being bound to 127.0.0.1. Unlike etcd, these can be bound to only a single address AFAIK.

ksa-real avatar Jan 19 '21 15:01 ksa-real

https://github.com/kubernetes/kubeadm/issues/2244 - related issue about kube-scheduler and kube-controller-manager certificates.

ksa-real avatar Jan 19 '21 22:01 ksa-real

It seems that in k8s v1.20.2 (probably even before, I didn't check) the 0.0.0.0 workaround is no longer working, since the default insecure address is already 0.0.0.0 (both for the scheduler and the controller manager) but the insecure port is disabled by default. It looks like the only solution is to use the deprecated port parameter to re-enable insecure listening. In short, the workaround should be configured like so:

controllerManager:
  extraArgs:
    port: '10252'
scheduler:
  extraArgs:
    port: '10251'

Even though the documentation says the default port for the scheduler is 10251 (Ref: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/) and the default port for the controller-manager is not specified (Ref: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/), both /etc/kubernetes/manifests/kube-scheduler.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml come with --port=0 out of the box, which disables insecure listening.

If someone can confirm or deny this, I would appreciate it, because I don't like this solution, even as a workaround.

omerozery avatar Feb 02 '21 13:02 omerozery

@omerozery I think the default value for --bind-address for both kube-controller-manager and kube-scheduler v1.20 is 0.0.0.0, in which case the issue reported by the OP wouldn't occur. But the point is that kubeadm by default sets these values to 127.0.0.1. So, to revert these settings made by kubeadm, the workaround is

controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0

as posted by the OP.

Regarding your workaround, kube-controller-manager v1.20 doesn't have a --port flag anymore, so your configuration likely wouldn't work (kube-scheduler v1.20 still has the --port flag). Unless you report that you could indeed enable port 10252 for kube-controller-manager.

weibeld avatar Feb 05 '21 23:02 weibeld

I wrote the comment above after I tried it myself. I was using the "bind-address" workaround on kubeadm-deployed v1.18.2 clusters because, like you said, it was needed, and everything worked fine (putting aside the security issue). The last couple of days I was working on deploying v1.20.2 clusters (using kubeadm), and this configuration didn't make any difference (the netstat command below returned nothing and Prometheus couldn't scrape the metrics). The only thing that opened the ports and listened on 0.0.0.0 is the "ports" workaround. This is the output from my k8s master when using the "ports" workaround:

[root@my-k8s-master ~]# cat /etc/kubernetes/manifests/kube-scheduler.yaml  | egrep -i '\--bind|\--port'
    - --bind-address=127.0.0.1
    - --port=10251
[root@my-k8s-master ~]# netstat -tunap | grep -i 10251
tcp6       0      0 :::10251                :::*                    LISTEN      12058/kube-schedule 
[root@my-k8s-master ~]# cat /etc/kubernetes/manifests/kube-controller-manager.yaml  | egrep -i '\--bind|\--port'
    - --bind-address=127.0.0.1
    - --port=10252
[root@my-k8s-master ~]# netstat -tunap | grep -i 10252
tcp6       0      0 :::10252                :::*                    LISTEN      12065/kube-controll 

Clearly the 127.0.0.1 is ignored. Am I missing something?

omerozery avatar Feb 07 '21 11:02 omerozery

@omerozery Per https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/, --port no longer exists; there is only --secure-port now. The default is 10257 for the controller-manager and 10259 for the scheduler.

@weibeld Note that --bind-address=0.0.0.0 may expose your metrics to the internet in a production environment. I see two solutions at the moment:

  • Apply firewall rules to drop all connections besides going to the node IP address
  • Patching kubeadm.yml with extraArgs: bind-address: <NODE IP ADDRESS> on every control plane node before doing kubeadm init phase control-plane controller-manager --config kubeadm.yml (same for the scheduler); see the sketch after this list.
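
A rough sketch of the second option's per-node kubeadm.yml fragment; the IP is an example and has to be each control-plane node's own address, which is exactly why the shared kubeadm-config ConfigMap cannot express it:

# kubeadm.yml on this particular control-plane node; 10.0.0.11 is an example node IP
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    bind-address: 10.0.0.11
scheduler:
  extraArgs:
    bind-address: 10.0.0.11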

I made comments in the linked kubeadm issue. Generally, IMO the kubeadm authors haven't thought this through. The best option would be using node IP addresses instead of 127.0.0.1 by default for --bind-address and the probes.

https://github.com/kubernetes/kubeadm/issues/2388

ksa-real avatar Feb 07 '21 19:02 ksa-real

see my comment here: https://github.com/kubernetes/kubeadm/issues/2388#issuecomment-774794312

I made comments in linked kubeadm issue. Generally, IMO kubeadm authors haven't thought it through. The best would be using node IP addresses instead of 127.0.0.1 by default for --bind-address and probes.

we've seen the requests about it, but the response has been that we don't want to expose the components outside of localhost due to metrics.

the current best workaround:

  • on each control-plane node, create a bash script that writes the files kube-scheduler.json and kube-controller-manager.json into a folder, each with the following contents:
[
	{ "op": "add", "path": "/spec/containers/0/command/-", "value": "--bind-address=SOME_IP" },
	{ "op": "replace", "path": "/spec/containers/0/livenessProbe/httpGet/host", "value": "SOME_IP" }
	{ "op": "replace", "path": "/spec/containers/0/startupProbe/httpGet/host", "value": "SOME_IP" }
]
  • the bash script should replace SOME_IP with the IP you want to bind to.
  • call the bash script, then call kubeadm init/join/upgrade with --experimental-patches=thepatchfolder.

of course, if there are sufficient votes about this change request let's comment on https://github.com/kubernetes/kubeadm/issues/2388

neolit123 avatar Feb 08 '21 00:02 neolit123

@ksa-real @weibeld The documentation and your comments above (which are basically documentation references) do not reflect what is actually happening on kubeadm-deployed v1.20.2 clusters. Repeating it won't make it true; please try it before you comment.

omerozery avatar Feb 08 '21 11:02 omerozery

@ksa-real @weibeld The documentation and your comments above (which are basically documentation references) do not reflect what is actually happening on kubeadm-deployed v1.20.2 clusters. Repeating it won't make it true; please try it before you comment.

I meant: don't use the deprecated --port at all. Bind to 0.0.0.0 or the node IP and scrape the HTTPS ports 10257/10259, ignoring the certificate.

netstat -tunap | grep -i 1025[79]

ksa-real avatar Feb 08 '21 21:02 ksa-real

The proxy workaround can be relatively easily implemented by running an HAProxy container with the following configuration as a DaemonSet on each master node:

defaults
  mode http
  timeout connect 5000ms
  timeout client 5000ms
  timeout server 5000ms
  default-server maxconn 10

frontend kube-controller-manager
  bind ${NODE_IP}:10257
  http-request deny if !{ path /metrics }
  default_backend kube-controller-manager
backend kube-controller-manager
  server kube-controller-manager 127.0.0.1:10257 ssl verify none

frontend kube-scheduler
  bind ${NODE_IP}:10259
  http-request deny if !{ path /metrics }
  default_backend kube-scheduler
backend kube-scheduler
  server kube-scheduler 127.0.0.1:10259 ssl verify none

Note the following:

  • The $NODE_IP environment variable (which is the desired IP address that the proxy should listen on) can be passed into the HAProxy Pod with a fieldRef:
    env:
    - name: NODE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    
  • The proxy skips the validation of the TLS server certificate of kube-controller-manager and kube-scheduler (verify none). This is due to the kubeadm defaults setting up kube-controller-manager and kube-scheduler with an unsigned TLS server certificate for serving HTTPS which is saved at an unknown location (see https://github.com/kubernetes/kubeadm/issues/2244 and https://github.com/kubernetes/kubernetes/issues/80063). However, this could be changed by using either --tls-private-key-file and --tls-cert-file or --cert-dir on kube-controller-manager and kube-scheduler, in which case it should be possible to validate the TLS server certificate.
  • The proxy serves only HTTP; however, if HTTPS is really necessary, this could be adapted in the HAProxy configuration (in which case the serving certificate could be freely chosen).
  • The proxy accepts only requests to the /metrics endpoint to not expose any other functionality of the backing services. If necessary, this could be further restricted in the HAProxy configuration by, e.g. only allowing requests from a certain IP address range.
  • The proxy Pods must run in the hostNetwork so that they can access the loopback interfaces of the corresponding kube-controller-manager and kube-scheduler Pods.

After deploying the DaemonSet, you can scrape the metrics of kube-controller-manager and kube-scheduler on http://<NODE_IP>:10257/metrics and http://<NODE_IP>:10259/metrics.
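
A minimal sketch of such a DaemonSet follows; the name, labels, image tag, node selector, and ConfigMap name are illustrative, and the ConfigMap is assumed to contain the haproxy.cfg shown above:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: control-plane-metrics-proxy          # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: control-plane-metrics-proxy
  template:
    metadata:
      labels:
        app: control-plane-metrics-proxy
    spec:
      hostNetwork: true                       # required to reach 127.0.0.1:10257/10259 on the node
      nodeSelector:
        node-role.kubernetes.io/master: ""    # control-plane nodes only; label name varies by Kubernetes version
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: haproxy
        image: haproxy:2.3                    # any recent HAProxy image should do
        env:
        - name: NODE_IP                       # expanded as ${NODE_IP} in the HAProxy configuration above
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        volumeMounts:
        - name: config
          mountPath: /usr/local/etc/haproxy   # default config location of the official image
      volumes:
      - name: config
        configMap:
          name: control-plane-metrics-proxy   # assumed ConfigMap holding haproxy.cfg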

The main disadvantage of this workaround is that the names of the kube-controller-manager and kube-scheduler Pods are lost. Prometheus discovers only the names of the proxy Pods, which is not really useful. Prometheus does discover the names of the nodes the Pods run on (in the __meta_kubernetes_pod_node_name label), so it's at least possible to tell that a given metric belongs to the kube-controller-manager or kube-scheduler of that node; however, there seems to be no easy way to deduce the exact name of that kube-controller-manager or kube-scheduler Pod.

weibeld avatar Feb 10 '21 01:02 weibeld

Thanks! But I think this is not a good idea.

VanLiuZhi avatar Feb 24 '21 07:02 VanLiuZhi

We do have a patch to disable those alerts and ServiceMonitors in clusters which don't expose access to controller manager and scheduler metrics. But yes, our documentation is lacking in this field.

Does anyone know where this patch is? On any 1.20 or above cluster, basically none of these workarounds, including HAProxy, exposes those metrics. It's probably best to offer a quick way to disable this entirely until folks make a decision some day about this.

jgerry2002 avatar Jul 07 '21 08:07 jgerry2002

Try setting platform: 'kubeadm' as described in https://github.com/prometheus-operator/kube-prometheus#cluster-creation-tools. Patch will be applied automatically along with optimizations for that platform.

Patch is in https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/managed-cluster.libsonnet

paulfantom avatar Jul 07 '21 08:07 paulfantom

(Quoting johntostring's earlier reply in full: the five-point checklist and the example ServiceMonitor/Service YAML.)

@jgerry2002 I have this working, but you need to match the service label to app.kubernetes.io/name: kube-scheduler. But I have another problem now. I have this alert:

name: TargetDown
expr: 100 * (count by(job, namespace, service) (up == 0) / count by(job, namespace, service) (up)) > 10
for: 10m

firing now instead; it is complaining that the kube-scheduler and kube-controller-manager Pods in kube-system are down :laughing:

budimanjojo avatar Jul 07 '21 08:07 budimanjojo

Try setting platform: 'kubeadm' as described in https://github.com/prometheus-operator/kube-prometheus#cluster-creation-tools. Patch will be applied automatically along with optimizations for that platform.

Patch is in https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/managed-cluster.libsonnet

So it looks like the solution from the patch is simply not creating any rule for scheduler and controller?

budimanjojo avatar Jul 07 '21 08:07 budimanjojo

So it looks like the solution from the patch is simply not creating any rule for scheduler and controller?

Actually, no. Selecting kubeadm as platform will add 2 Service objects which should allow scraping data from scheduler and controller-manager (both components need to be configured to expose metrics). You can see this in kubeadm.libsonnet which is applied when selecting correct platform: https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/platforms/kubeadm.libsonnet.

Disabling and using managed-cluster.libsonnet addon is the last resort for cases when you cannot have access to kube-scheduler nor kube-controller-manager (for example in EKS).
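
For reference, those discovery Services look roughly like the sketch below; the exact names and labels are defined in the linked kubeadm.libsonnet, so treat the values here as illustrative:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager-prometheus-discovery  # illustrative; check kubeadm.libsonnet for the real name
  labels:
    app.kubernetes.io/name: kube-controller-manager   # must match the ServiceMonitor's selector
spec:
  clusterIP: None                                      # headless Service
  selector:
    component: kube-controller-manager                 # label kubeadm puts on the static Pod
  ports:
  - name: https-metrics                                # must match spec.endpoints.port in the ServiceMonitor
    port: 10257
    targetPort: 10257

An analogous Service targets component: kube-scheduler on port 10259.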

paulfantom avatar Jul 07 '21 09:07 paulfantom

So it looks like the solution from the patch is simply not creating any rule for scheduler and controller?

Actually, no. Selecting kubeadm as platform will add 2 Service objects which should allow scraping data from scheduler and controller-manager (both components need to be configured to expose metrics). You can see this in kubeadm.libsonnet which is applied when selecting correct platform: https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/platforms/kubeadm.libsonnet.

Disabling and using managed-cluster.libsonnet addon is the last resort for cases when you cannot have access to kube-scheduler nor kube-controller-manager (for example in EKS).

Thank you for the explanation. I got it now 😊

budimanjojo avatar Jul 07 '21 10:07 budimanjojo

My K8S cluster was created via kubeadm and the version is 1.19.2

Have you tested on 1.20+? I'm also using Tanzu, which is sort of EKS-ish, with some of the security and other (attempted) abstraction that is in there. Which is why I figured I'd ask for an option to disable these checks cleanly and then try to figure out a way to add that piece of monitoring back in later on.

jgerry2002 avatar Jul 07 '21 11:07 jgerry2002

I just tried to generate the manifests for the kubeadm platform and I can't find the Service manifests anywhere. Am I doing something wrong? This is the modified example.jsonnet which I called using build.sh:

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  // Uncomment the following imports to enable its patches
  // (import 'kube-prometheus/addons/anti-affinity.libsonnet') +
  // (import 'kube-prometheus/addons/managed-cluster.libsonnet') +
  // (import 'kube-prometheus/addons/node-ports.libsonnet') +
  // (import 'kube-prometheus/addons/static-etcd.libsonnet') +
  // (import 'kube-prometheus/addons/custom-metrics.libsonnet') +
  // (import 'kube-prometheus/addons/external-metrics.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring-system',
        platform: 'kubeadm',
      },
    },
  };

{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) }

I modified the namespace intentionally to see whether the namespace does change in the generated manifests, but no Service manifest for kube-scheduler or kube-controller-manager was found anywhere.

budimanjojo avatar Jul 07 '21 14:07 budimanjojo

Those should be created as files with kubernetes- prefix, possibly kubernetes-kubeControllerManagerPrometheusDiscoveryService.yaml

paulfantom avatar Jul 07 '21 15:07 paulfantom

Those should be created as files with kubernetes- prefix, possibly kubernetes-kubeControllerManagerPrometheusDiscoveryService.yaml

There's no such file created.

╰ ls -la manifests | grep kubernetes
-rw-r--r-- 1 budiman disk   64268 Jul  7 22:08 kubernetes-prometheusRule.yaml
-rw-r--r-- 1 budiman disk    6905 Jul  7 22:08 kubernetes-serviceMonitorApiserver.yaml
-rw-r--r-- 1 budiman disk     447 Jul  7 22:08 kubernetes-serviceMonitorCoreDNS.yaml
-rw-r--r-- 1 budiman disk    6424 Jul  7 22:08 kubernetes-serviceMonitorKubeControllerManager.yaml
-rw-r--r-- 1 budiman disk    7240 Jul  7 22:08 kubernetes-serviceMonitorKubelet.yaml
-rw-r--r-- 1 budiman disk     537 Jul  7 22:08 kubernetes-serviceMonitorKubeScheduler.yaml

I also tried changing it to platforms: 'kubeadm', platform: 'kubespray', and platforms: 'kubespray'. Nothing works. I don't know why it doesn't work T.T

budimanjojo avatar Jul 07 '21 15:07 budimanjojo