[dcgm-exporter] Support exposing metrics on hostNetwork
Add support for toggling the hostNetwork field of the dcgm-exporter daemonset pod spec.
This allows dcgm-exporter pods to be scraped by, say, a prometheus-server that runs outside
the bounds of the k8s cluster overlay network and can still reach the dcgm-exporter pods,
i.e. by scraping each daemonset pod's port directly on the node. This is easier to reason
about and deal with than, say, scraping a NodePort service's ports on each node.
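As a rough sketch of what the toggle amounts to in the rendered daemonset, assuming the operator exposes a boolean `hostNetwork` option (the field placement follows the standard Kubernetes PodSpec; the port number and `dnsPolicy` pairing are illustrative, not taken from this PR):

```yaml
# Illustrative daemonset pod spec fragment. With hostNetwork enabled,
# the exporter binds its port directly on each node's network namespace,
# so an external Prometheus can scrape <node-ip>:9400 without reaching
# into the cluster overlay network.
spec:
  template:
    spec:
      hostNetwork: true
      # Commonly paired with hostNetwork so in-cluster DNS still resolves:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: dcgm-exporter
          ports:
            - name: metrics
              containerPort: 9400  # dcgm-exporter's default metrics port
```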
Fixes #1086
/ok-to-test d858d245bf511e48b936ed04d3626e6bc4e8f858
@nikhaild, thanks for your contribution. In general, we like to ensure that our generated YAML manifests are aligned with the upstream helm chart of dcgm-exporter. It would be good if you can open a PR/issue in NVIDIA/dcgm-exporter and get the maintainers of dcgm-exporter to weigh in on this first.
Thanks @tariq1890 for taking a look!
The upstream dcgm-exporter Helm chart already supports "templatizing" this hostNetwork field (this was done a while ago, tracker PR#64), code ref:

```yaml
{{- if .Values.hostNetwork }}
hostNetwork: {{ .Values.hostNetwork }}
{{- end }}
```
It's just not set explicitly to a default value in dcgm-exporter helm chart values.yaml, but looks like some folks use it already (ref issue#495).
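Given that templating, opting in through the upstream chart would presumably be just a values override (a sketch; the value name is taken from the template snippet above, and no default is assumed since the chart's values.yaml doesn't set one):

```yaml
# values.yaml override for the upstream dcgm-exporter chart;
# enables host networking for the daemonset pods.
hostNetwork: true
```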
Mind clarifying what exactly I should ask of the dcgm-exporter maintainers? I'd appreciate it if you could elaborate a bit.