[dcgm-exporter] Support exposing metrics on hostNetwork
Add support for toggling the hostNetwork field of the dcgm-exporter daemonset pod spec.
This allows dcgm-exporter pods to be scraped by, say, a prometheus-server that runs outside
the bounds of the k8s cluster overlay network and can still reach the dcgm-exporter pods,
i.e. by scraping each daemonset pod's port directly on the node. This is easier to reason
about and deal with than, say, scraping a NodePort service's ports on each node.
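As a rough sketch of what the toggle amounts to in the rendered daemonset, assuming the operator exposes a boolean `hostNetwork` option (the field placement follows the standard Kubernetes PodSpec; the port number and `dnsPolicy` pairing are illustrative, not taken from this PR):

```yaml
# Illustrative daemonset pod spec fragment. With hostNetwork enabled,
# the exporter binds its port directly on each node's network namespace,
# so an external Prometheus can scrape <node-ip>:9400 without reaching
# into the cluster overlay network.
spec:
  template:
    spec:
      hostNetwork: true
      # Commonly paired with hostNetwork so in-cluster DNS still resolves:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: dcgm-exporter
          ports:
            - name: metrics
              containerPort: 9400  # dcgm-exporter's default metrics port
```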
Fixes #1086
/ok-to-test d858d245bf511e48b936ed04d3626e6bc4e8f858
@nikhaild, thanks for your contribution. In general, we like to ensure that our generated YAML manifests are aligned with the upstream helm chart of dcgm-exporter. It would be good if you can open a PR/issue in NVIDIA/dcgm-exporter and get the maintainers of dcgm-exporter to weigh in on this first.
Thanks @tariq1890 for taking a look!
The upstream dcgm-exporter Helm chart already supports "templatizing" this hostNetwork field (this was done a while ago, tracker PR#64), code ref:

```yaml
{{- if .Values.hostNetwork }}
hostNetwork: {{ .Values.hostNetwork }}
{{- end }}
```
It's just not set explicitly to a default value in dcgm-exporter helm chart values.yaml, but looks like some folks use it already (ref issue#495).
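Given that templating, opting in through the upstream chart would presumably be just a values override (a sketch; the value name is taken from the template snippet above, and no default is assumed since the chart's values.yaml doesn't set one):

```yaml
# values.yaml override for the upstream dcgm-exporter chart;
# enables host networking for the daemonset pods.
hostNetwork: true
```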
Mind clarifying what exactly I should ask of the dcgm-exporter maintainers? I'd appreciate it if you could elaborate a bit.