Kind returns same value for all Pods in a namespace

Open nikimanoledaki opened this issue 2 years ago • 8 comments

Describe the bug All Pods return the same value in a kind cluster.

To Reproduce On macOS, I created a Vagrant VM with VirtualBox as a driver to run Ubuntu 22.04. I ensured that the VM meets all the Kepler requirements (kernel headers). Then, I created a kind cluster and bootstrapped Flux on it.

Below are the metrics for the Pods that I wanted to measure. They are the Pods containing the Flux controllers, all in the flux-system namespace.

The issue is that they all have the same value, which shouldn't be the case:

curl -G http://localhost:9090/api/v1/query --data-urlencode "query=pod_curr_energy_in_pkg_millijoule{pod_namespace='flux-system'}" | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "helm-contr",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "helm-controller-9b6bb4f68-k2vp7",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "kustomize-",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "kustomize-controller-7f4687b878-65q97",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "source-con",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "source-controller-f8d655bdc-4fw6n",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "pod_curr_energy_in_pkg_millijoule",
          "command": "tini",
          "container": "kepler-exporter",
          "endpoint": "http",
          "instance": "kind-control-plane",
          "job": "kepler-exporter",
          "namespace": "kepler",
          "pod": "kepler-exporter-7nbzs",
          "pod_name": "notification-controller-7f5dbddc94-rtdhq",
          "pod_namespace": "flux-system",
          "service": "kepler-exporter"
        },
        "value": [
          1664036406.998,
          "1"
        ]
      }
    ]
  }
}
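The symptom can also be checked programmatically rather than by eye. The following is only a minimal sketch, not part of Kepler: it assumes the response has the Prometheus instant-query shape shown above, the sample is trimmed to the fields the check reads, and `values_by_pod` and `all_pods_report_same_value` are hypothetical helper names.

```python
import json

# Trimmed copy of the response above: only the fields the check reads.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"pod_name": "helm-controller-9b6bb4f68-k2vp7"},
       "value": [1664036406.998, "1"]},
      {"metric": {"pod_name": "kustomize-controller-7f4687b878-65q97"},
       "value": [1664036406.998, "1"]},
      {"metric": {"pod_name": "source-controller-f8d655bdc-4fw6n"},
       "value": [1664036406.998, "1"]},
      {"metric": {"pod_name": "notification-controller-7f5dbddc94-rtdhq"},
       "value": [1664036406.998, "1"]}
    ]
  }
}
"""


def values_by_pod(response: dict) -> dict:
    """Map each pod_name label to its sample value.

    In an instant-query vector, "value" is [timestamp, "value-as-string"].
    """
    return {
        series["metric"]["pod_name"]: float(series["value"][1])
        for series in response["data"]["result"]
    }


def all_pods_report_same_value(response: dict) -> bool:
    """True when several pods are present and they all carry one value."""
    values = values_by_pod(response)
    return len(values) > 1 and len(set(values.values())) == 1


response = json.loads(SAMPLE_RESPONSE)
for pod, value in values_by_pod(response).items():
    print(f"{pod}: {value}")
print("identical across pods:", all_pods_report_same_value(response))
```

In practice the JSON would come from the `curl` query above instead of the embedded sample.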

Expected behavior I expected to have more granular data according to the activity of each Pod.

Desktop (please complete the following information):

  • OS: macOS with a Vagrant VirtualBox VM (Ubuntu 22.04)

Additional context This may very well be an issue due to running a kind cluster inside of a VM!

nikimanoledaki avatar Sep 26 '22 15:09 nikimanoledaki

@nikimanoledaki thank you for reporting this issue! Are you running on mac with ARM or x86 CPUs?

rootfs avatar Sep 26 '22 15:09 rootfs

Would you please share the head of the Kepler pod log?

kubectl logs -n kepler daemonset/kepler-exporter | head -100

rootfs avatar Sep 26 '22 15:09 rootfs

I am guessing you are seeing an issue similar to mine: https://github.com/sustainable-computing-io/kepler/issues/211#issuecomment-1253068685. It looks like kind on a VM might need more work.

jichenjc avatar Sep 27 '22 01:09 jichenjc

Not sure how macOS handles kind. On my setup (KVM guest on RHEL), only one pod is reported:

[root@kind-control-plane /]# curl http://10.96.206.34:9102/metrics |grep pod_total_energy_millijoule 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15976    0 15976    0     0  15.2M      0 --:--:-- --:--:-- --:--:-- 15.2M
# HELP pod_total_energy_millijoule pod_ total energy consumption (millijoule)
# TYPE pod_total_energy_millijoule counter
pod_total_energy_millijoule{command="docker-pro",pod_name="system_processes",pod_namespace="system"} 964
pod_total_energy_millijoule{command="irqbalance",pod_name="kube-scheduler-kind-control-plane",pod_namespace="kube-system"} 4
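To list which pods an exporter actually reports, the metric lines can be parsed directly. This is a rough sketch under stated assumptions: the input is Prometheus text exposition output like the lines above, `parse_samples` is a hypothetical helper, and the regular expressions are deliberately simple (they do not handle escaped quotes in label values or trailing timestamps).

```python
import re

# Metric lines copied from the output above (curl progress meter omitted).
SAMPLE_METRICS = '''\
# HELP pod_total_energy_millijoule pod_ total energy consumption (millijoule)
# TYPE pod_total_energy_millijoule counter
pod_total_energy_millijoule{command="docker-pro",pod_name="system_processes",pod_namespace="system"} 964
pod_total_energy_millijoule{command="irqbalance",pod_name="kube-scheduler-kind-control-plane",pod_namespace="kube-system"} 4
'''

# name{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
# key="value" pairs inside the braces
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str, metric_name: str) -> list:
    """Return (labels, value) tuples for every sample of metric_name."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group(1) != metric_name:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        samples.append((labels, float(match.group(3))))
    return samples


samples = parse_samples(SAMPLE_METRICS, "pod_total_energy_millijoule")
for labels, value in samples:
    print(labels["pod_namespace"], labels["pod_name"], value)
```

Besides the `system_processes` entry, only `kube-scheduler-kind-control-plane` shows up in this sample.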

rootfs avatar Sep 27 '22 01:09 rootfs

On Ubuntu it's the same (kind on a KVM VM), and the command="cinder-sch" is really weird as well (I guess that's the first pod it met):

root@kind-control-plane:/# curl localhost:9102/metrics |grep pod_total_energy_millijoule
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13445    0 13445    0     0  4376k      0 --:--:-- --:--:-- --:--:-- 4376k
# HELP pod_total_energy_millijoule pod_ total energy consumption (millijoule)
# TYPE pod_total_energy_millijoule counter
pod_total_energy_millijoule{command="cinder-sch",pod_name="system_processes",pod_namespace="system"} 238

jichenjc avatar Sep 27 '22 02:09 jichenjc

I suspect BM => VM => kind is not a valid use case (at least for now), as neither k8s nor OpenShift runs a production env on this model, per https://github.com/sustainable-computing-io/kepler/issues/211#issuecomment-1252299478, I think.

If we agree it's not a valid use case (production env), maybe we need to document that somewhere.

jichenjc avatar Sep 28 '22 01:09 jichenjc

So, just to double check, you are running inside a VM right?

marceloamaral avatar Sep 28 '22 06:09 marceloamaral

So, just to double check, you are running inside a VM right?

Yes, BM => VM => kind. Not sure whether @nikimanoledaki is as well.

jichenjc avatar Sep 28 '22 06:09 jichenjc

Hey all, sorry for the very late reply. Yes, I was running kind inside a VM:

macOS -> VirtualBox VM running Ubuntu 22.04 -> kind

I ensured that the VM had kernel headers.

Maybe it was not valid because macOS was used as the host. However, this was an attempt to find a working dev env for people who would like to try Kepler using macOS and don't have access to any bare-metal machine.

This could potentially work if the CPU Arch is not discovered and instead is overridden with an env var as introduced by this PR by @rootfs: https://github.com/sustainable-computing-io/kepler/pull/278

WDYT? I haven't tested it. Does anyone else have macOS and would like to try? I could make time to try in the next few weeks.

nikimanoledaki avatar Nov 11 '22 15:11 nikimanoledaki

@nikimanoledaki this might be related to issue #388, which is about enabling the model server.

What is happening is that, since it's a VM, it normally doesn't have power metrics, so Kepler is using power estimation for the node energy consumption. Also, since the VM probably does not have hardware counters, Kepler is not calculating the resource usage rate to determine power consumption per container. So it's probably dividing the host power consumption evenly across all containers. This is why we are seeing the same power consumption for all containers.
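The difference between the two attribution strategies can be illustrated with a small sketch. This is not Kepler's actual code: the function names, container names, node energy, and usage numbers are all made up for illustration, and "usage" stands in for whatever per-container resource signal (for example hardware counters) is available.

```python
def split_evenly(node_energy_mj: float, containers: list) -> dict:
    """Fallback: every container gets the same share of the node energy."""
    if not containers:
        return {}
    share = node_energy_mj / len(containers)
    return {name: share for name in containers}


def split_by_usage(node_energy_mj: float, usage: dict) -> dict:
    """Attribute node energy in proportion to each container's usage.

    Falls back to an even split when no usage signal is available
    (all values zero), which mirrors the behaviour described above.
    """
    total = sum(usage.values())
    if total == 0:
        return split_evenly(node_energy_mj, list(usage))
    return {name: node_energy_mj * u / total for name, u in usage.items()}


# Hypothetical numbers, chosen only to make the contrast visible.
NODE_ENERGY_MJ = 4.0
usage_with_counters = {
    "helm-controller": 10,
    "kustomize-controller": 30,
    "source-controller": 50,
    "notification-controller": 10,
}
usage_without_counters = {name: 0 for name in usage_with_counters}

print(split_by_usage(NODE_ENERGY_MJ, usage_with_counters))
print(split_by_usage(NODE_ENERGY_MJ, usage_without_counters))
```

With a usage signal the shares differ per container; without one, every container ends up with the same number, which is the pattern reported in this issue.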

To get better estimates on VMs, you need to use the estimator and/or model server. See #388.

marceloamaral avatar Nov 16 '22 09:11 marceloamaral

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 17 '23 13:05 stale[bot]

@nikimanoledaki did you manage to collect metrics?

marceloamaral avatar May 18 '23 05:05 marceloamaral

I recently tried creating a similar setup on macOS + VirtualBox, but unfortunately VirtualBox is no longer supported on macOS Ventura, which is the latest version: https://github.com/kubernetes/minikube/issues/15274. Since the setup environment is not supported at the moment or in the foreseeable future, it should be OK to close this issue. I have not managed to make Kepler work in a VM on macOS in any other way, so I use a Linux machine instead when I have to run Kepler, which means I am not blocked either.

nikimanoledaki avatar May 18 '23 12:05 nikimanoledaki