glowkey
Gathering additional DCP metrics for all the MIG devices requires more queries.
You should be able to remove the DCP metrics from the watched metrics list to reduce the load. See https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html#profiling-metrics and https://github.com/NVIDIA/dcgm-exporter/blob/main/etc/default-counters.csv#L81
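As a rough sketch of what removing them could look like: the snippet below drops the active DCP rows from a counters CSV. It assumes that DCP (profiling) fields are the ones named `DCGM_FI_PROF_*` and that each row starts with the field name followed by a comma. `drop_dcp_metrics` and the sample rows are hypothetical and only there for illustration, so check them against your actual counters file.

```python
# Assumption: DCP (profiling) fields are the ones whose name starts with this prefix.
DCP_PREFIX = "DCGM_FI_PROF_"


def drop_dcp_metrics(csv_text: str) -> str:
    """Return the counters CSV text with active DCP metric rows removed.

    Comment lines (starting with '#') and blank lines are kept untouched.
    This is a hypothetical helper, not part of dcgm-exporter.
    """
    kept = []
    for line in csv_text.splitlines():
        # The field name is assumed to be the first comma-separated column.
        field = line.split(",", 1)[0].strip()
        if field.startswith(DCP_PREFIX):
            continue  # skip profiling metric rows
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Illustrative sample only; real files will contain many more rows.
    sample = (
        "# Format: DCGM field, Prometheus type, help text\n"
        "DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).\n"
        "DCGM_FI_PROF_GR_ENGINE_ACTIVE, gauge, Ratio of time the graphics engine is active.\n"
        "# DCGM_FI_PROF_SM_ACTIVE, gauge, The ratio of cycles an SM has at least 1 warp assigned.\n"
    )
    print(drop_dcp_metrics(sample), end="")
```

Rows that are already commented out are left alone, so the filtered file can still be diffed against the original.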
I am inclined towards integrating this PR over #350. This PR seems to allow for the greatest flexibility going forward while not changing the default behavior.
I liked the ability to create and deploy a custom list of metrics all at once and from within a single values file. How would you envision users doing that...
Thanks for the additional clarification. That makes sense and seems like a valid approach to solving this problem.
> But what about simplifying this by externalizing the whole default metrics config file

+1, I think that's a good extension to this approach.
@chipzoller would you be willing to combine this PR with #350 and include the full `dcp-metrics-included.csv` contents in the `customMetrics` section (commented out) in a new PR that we'll look...
Absolutely, thank you for the contribution!
Yes, thanks for following up. We plan on running it through tests and merging this week.
Feel free, though their deployment is slightly different. Also, can you [sign your commit](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits)?