Container names or labels

Open usego opened this issue 5 years ago • 14 comments

I have a number of Windows servers running different types of containers. Is there any chance of getting container names and/or labels to make containers easier to identify? At the moment the metrics only contain container_id, which is hard to consume in Prometheus / Grafana.

usego avatar Jul 11 '19 19:07 usego

Hi @usego, It looks like this should be possible, it appears we have all these fields available: ContainerProperties spec. I currently don't have any system with Windows containers running, so I don't have a good way to test this, though. If you'd like to contribute a PR, we'd gladly review it!

carlpett avatar Jul 11 '19 19:07 carlpett

Unfortunately, I'm not seeing anything useful in the ContainerProperties spec: [screenshot]

Drewster727 avatar Jul 21 '19 01:07 Drewster727

@carlpett welp, per: https://github.com/microsoft/hcsshim/issues/652

Looks like hcsshim won't provide what is needed. Is there another common library we can utilize to integrate? -Drew

Drewster727 avatar Aug 02 '19 22:08 Drewster727

Yep, we need to go about it differently for sure. I'm somewhat concerned about how to support this well, given that Windows aims to support multiple "container platforms" (that is, not just Docker). CRI would probably be pretty similar to Docker support, but the more Hyper-V-like containers might not have the same concepts to map to.

Perhaps it would be easiest to get right if we do not put the labels on every metric, but rather have a container_<type>_info metric which can be used for lookups? Something like wmi_container_docker_info{container_id="...", name="...", ...}. This way we wouldn't run into issues if the different types don't have the same set of information.

The queries will be a bit more complicated, of course, you'd need to do something like wmi_container_cpu_usage_seconds_total * on(container_id) wmi_container_docker_info{name="my-container"} etc. What do you think?
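The lookup pattern above can be sketched as a Prometheus recording rule. This is only an illustration under stated assumptions: wmi_container_docker_info is the metric proposed in this comment and does not exist in the exporter, it is assumed to have the constant value 1, and the group and recorded series names are made up. group_left(name) is added so that the name label is copied onto the result; without it, the joined series would carry only container_id.

```yaml
groups:
  - name: container_info_lookup.example   # hypothetical group name
    rules:
      # Copy the `name` label from the proposed (not yet existing) info metric
      # onto the CPU metric. The info metric is assumed to always be 1, so the
      # multiplication leaves the CPU value unchanged.
      - record: wmi_container_cpu_usage_seconds_total:with_name   # hypothetical series name
        expr: |
          wmi_container_cpu_usage_seconds_total
            * on(container_id) group_left(name)
          wmi_container_docker_info
```

Keeping the descriptive labels on a separate info series means each container platform could expose its own label set without changing the labels on the core metrics.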

carlpett avatar Aug 03 '19 09:08 carlpett

Can I use github.com/docker/docker/client to get the container name and store it as Container_name, instead of relying on microsoft/hcsshim?

wyaopeng avatar May 26 '20 15:05 wyaopeng

It is a fairly heavy package to import from a dependency point of view, so I'd rather not. How about generating the info metric I suggested above using a shell script and the textfile collector?

carlpett avatar May 28 '20 19:05 carlpett

We ended up using Telegraf to get the Container Metrics from the Docker Daemon, maybe we can look at leveraging it the same way in this project? Can also map container_id with some regex...

https://github.com/influxdata/telegraf/blob/master/plugins/inputs/docker/docker.go

Smuggla avatar Jun 15 '20 21:06 Smuggla

Hey folks, not to kick a dead horse here, but isn't this exporter useless without container names/labels? Is there something I don't understand that can magically translate container_id="containerd://032934897e303963da4402cc5510da0736f12ff6a56f1f77dd83845cd22d62be" values into something meaningful so we can leverage these metrics? Alertmanager and Grafana are both useless without context about what the container actually is.

I understand the problem may be upstream, if I'm reading the above correctly, since hcsshim doesn't provide this data. What's the fix here? We'd like to use Windows Containers in production, but as far as I can tell, without this issue resolved the metrics are (largely) useless: looking up container IDs for running services would take a lot of manual work, and for past services it would be impossible.

To me, it seems like a simple fix, when running this in a container via a DaemonSet, would be to give this service an RBAC role that allows it to query the Kubernetes API and look up the pod details (namespace, name, etc.) for each containerd ID.
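For illustration, the read-only permissions such a lookup would need might look roughly like the manifest below. This is a sketch only: nothing in this thread indicates that the exporter performs this lookup, and every name here (windows-exporter, monitoring, windows-exporter-pod-reader) is a placeholder rather than something taken from the project.

```yaml
# All names below are placeholders, not taken from the project.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: windows-exporter
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: windows-exporter-pod-reader
rules:
  # Read-only access to pods. Container IDs are reported in each pod's
  # status.containerStatuses[].containerID field.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: windows-exporter-pod-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: windows-exporter-pod-reader
subjects:
  - kind: ServiceAccount
    name: windows-exporter
    namespace: monitoring
```

The DaemonSet's pod spec would then reference the service account through serviceAccountName.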

AndrewFarley avatar Sep 17 '23 23:09 AndrewFarley

@AndrewFarley the "barely making it useful" workaround I'm using is this set of Prometheus rules: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/rules/windows.libsonnet#L178 which adds some metrics with proper pod/container labels. I don't know how the standard node-exporter does this, but the lack of these labels is really painful.

PatTheSilent avatar Sep 18 '23 08:09 PatTheSilent

Hey folks, we're running standalone Windows Containers on Windows Servers without Kubernetes. For us, too, enabling windows_exporter on the server to monitor Windows containers is useless without a container name. How could we get the container name as a label on the metrics in that case? It seems the only solution is to use the github.com/docker/docker/client API, correct? But as pointed out earlier in this thread, that package is considered too heavy, right?

godocean avatar Sep 18 '23 14:09 godocean

To me, the concept of a library being "heavy" is irrelevant if this package is useless (for the purpose of monitoring Windows nodes which exclusively host Windows Containers) without it.

To me, the container collector is currently feature-incomplete and should either be marked as "beta" or gain Docker name support. I see no other option, as this exporter is effectively useless for containers and Kubernetes at the moment, which (I would argue) is probably one of (if not THE) main use cases for this exporter.

If it is really heavy, to the point where it absolutely will NEVER be allowed to be in this executable by default, is it possible to make this pluggable? So that a dynamic library for "container" support can be added only if needed? It'll really make it (more) painful to use, but will satisfy the requirement of being able to be functional in a containerized environment. Or alternatively, make a "secondary/alternate" build of this with the "heavy" dependency so that it can function.

Like, does anyone in the world have Windows Kubernetes nodes in production that are being monitored, or do we all just pray things don't break? Am I missing something? This seems like something really simple and silly to be overlooked.

AndrewFarley avatar Sep 19 '23 05:09 AndrewFarley

Following @PatTheSilent's recommendation and work, we were able to get something working in a way that is backwards compatible and based on the metrics exposed by windows-exporter-daemonset.yaml. We are running AWS EKS 1.26 and have used the Helm Chart to run Prometheus.

Our Helm Chart values file for Prometheus itself (related to this topic) has the following...


serverFiles:
  recording_rules.yml:
    groups:
      # This adds labels for the windows container name and pod name into our windows metrics which don't normally have it
      # Rules below are (loosely) based on https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/rules/windows.libsonnet#L178
      # Issue: https://github.com/prometheus-community/windows_exporter/issues/357
      - name: windows.pod.rules
        rules:
          # CPU USAGE MAP OVER EXISTING LINUX METRIC NAMES SO EXISTING DASHBOARDS WORK
          - record: container_cpu_usage_seconds_total
            expr: windows_container_cpu_usage_seconds_total{service="windows-exporter", container_id != ""} * on(container_id) group_left(container, pod, namespace, image) max(kube_pod_container_info{job="kubernetes-pods", container_id != ""}) by(container, container_id, pod, namespace, image)
          # MEMORY USAGE MAP OVER EXISTING LINUX METRIC NAMES
          - record: container_memory_working_set_bytes
            expr: windows_container_memory_usage_commit_bytes{service="windows-exporter", container_id != ""} * on(container_id) group_left(container, pod, namespace, image) max(kube_pod_container_info{job="kubernetes-pods", container_id != ""}) by(container, container_id, pod, namespace, image)
          # NETWORK BANDWIDTH USAGE MAP OVER EXISTING LINUX METRIC NAMES
          # ---> RECEIVE...
          - record: container_network_receive_bytes_total
            expr: windows_container_network_receive_bytes_total{service="windows-exporter", container_id != ""} * on(container_id) group_left(container, pod, namespace, image) max(kube_pod_container_info{job="kubernetes-pods", container_id != ""}) by(container, container_id, pod, namespace, image)
          # ---> TRANSMIT/SEND...
          - record: container_network_transmit_bytes_total
            expr: windows_container_network_transmit_bytes_total{service="windows-exporter", container_id != ""} * on(container_id) group_left(container, pod, namespace, image) max(kube_pod_container_info{job="kubernetes-pods", container_id != ""}) by(container, container_id, pod, namespace, image)

          # Unused-ish? But here anyway, just in case...
          - record: windows_container_available_with_labels
            expr: windows_container_available{service="windows-exporter", container_id != ""} * on(container_id) group_left(container, pod, namespace, image) max(kube_pod_container_info{job="kubernetes-pods", container_id != ""}) by(container, container_id, pod, namespace, image)
          - record: windows_container_memory_usage_private_working_set_bytes_with_labels
            expr: windows_container_memory_usage_private_working_set_bytes{service="windows-exporter", container_id != ""} * on(container_id) group_left(container, pod, namespace, image) max(kube_pod_container_info{job="kubernetes-pods", container_id != ""}) by(container, container_id, pod, namespace, image)


And here's proof/example of it working with the Deployment/Daemonset/Statefulset dashboard. This is a screenshot of a service which runs entirely on Windows, and the dashboard works transparently alongside any Linux-based services. I hope this helps save someone else the hours/days I've spent on making this work!

[Screenshot: Screen Shot 2023-09-25 at 3 14 17 PM]
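As a rough example of how the recorded series can be consumed once they carry pod and container labels, an alerting rule along these lines should be possible. This is a sketch, not something from our setup: the group name, alert name, namespace and the 2 GB threshold are all placeholders.

```yaml
groups:
  - name: windows.pod.alerts.example   # hypothetical group name
    rules:
      - alert: WindowsContainerHighMemory   # hypothetical alert name
        # Uses the series recorded by the rules above, which carries the
        # container, pod, namespace and image labels.
        # Namespace and threshold (~2 GB) are placeholders.
        expr: container_memory_working_set_bytes{namespace="my-namespace"} > 2e9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is using more than 2 GB of memory"
```

Because the recorded series reuse the Linux metric names, the same expression would also match Linux containers in that namespace.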

AndrewFarley avatar Sep 25 '23 02:09 AndrewFarley

@AndrewFarley would you be interested in adding your recording rules and example queries to the container collector documentation? It'd be a big help to others using the collector.

breed808 avatar Nov 17 '23 19:11 breed808

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

github-actions[bot] avatar Feb 16 '24 02:02 github-actions[bot]

Sorry @breed808 yeah you're welcome to add the above to the docs. :) Works great for us

AndrewFarley avatar Mar 17 '24 23:03 AndrewFarley