telegraf
telegraf copied to clipboard
feat: Unify graphics SMI implementations
Required for all PRs:
- [ ] Updated associated README.md.
- [X] Wrote appropriate unit tests.
- [X] Pull request title or commits are in conventional commit format (e.g. feat: or fix:)
Both SMI implementations, nvidia_smi
as well as amd_rocm_smi
, share a common code structure: Try first execute a binary that produces the data and then parse the data into metrics using a kind of mapping definition. The present PR unifies this common structure and merges both plugins into a new generic ReceiveAndParse
plugin. This new plugin provides a common infrastructure allowing to split the processing into a transport
which receives the raw data and a parser
transforming this raw data to metrics. Additional post-processing capabilities are provided to allow further metric transformation impossible in the parser. The whole change by this PR is transparent to the user.
To generalize more, a common/transport
package is introduced that abstracts the way of receiving the data. For now only an Exec
receiver is provided sufficient to unify the two graphics plugins. However, in the future we might want to extend this to e.g. HTTP transports or similar.
For the parsing part, we uses telegraf's plugin power to instantiate a suitable parser. We configure this parser with a config file embedded into the telegraf binary.
Please note that more such common structures exist in telegraf e.g. for web-plugins that do a HTTP request and then parse the response (usually JSON) to metrics. Those plugins might also be unified in a similar way.
Hi Sven, I like how this PR lets the two plugins become configuration instead of code. (in NewAMDSMI()
and NewNvidiaSMI()
)
What do you think about making the graphics_smi class more generic so it can be used by plugins that aren't gpu related, and then putting it in plugins/common? Then adm_rocm_smi and nvidia_smi would stay in their separate directories but use the common code.
I would prefer to keep telegraf's convention of each input plugin having its own directory in plugins/inputs, where users can find the readme.md, the plugin implementation and unit tests, and shared code in plugins/common.
Looks like new artifacts were built from this PR. Get them here!
Artifact URLs
List of plugins that might also benefit from such makeover.
:package: Looks like new artifacts were built from this PR.
Expand this list to get them here! :tiger:
Artifact URLs
Can we finalise this one?
Can we finalise this one?
I'm happy to merge once we revert the toml.Unmarshal changes and switch back to static struct initialization.
@srebhan Do you know it that has been done already?
next steps: @srebhan to look into reverting toml.Unmarshal changes and switch back to static struct initialization
This only pays off if we convert more plugins to the infrastructure introduced in this PR. As time currently does not permit this, I will close the PR for now. If anyone is willing to work on this (and convert more plugins) please feel free to reopen or ping me.
I would be able to migrate some if the framework is already there.. Now there is nothing..
@Hipska as I wrote myself does not have time to convert them, but I can finish up this PR if you commit to convert others. ;-)