
docs(specs): Labels and Selectors for Plugin Enablement

Open neelayu opened this issue 8 months ago • 5 comments

Summary

Provides a spec for enabling and disabling plugins using labels and selectors.

Checklist

  • [x] No AI generated code was used in this PR

Related issues

resolves NA; related #16704

neelayu avatar Apr 26 '25 13:04 neelayu

Hi @srebhan, thanks for reviewing. I have addressed some of the comments. I think you got labels and selectors mixed up: labels will simply be a list, and selectors will follow logical operations depending on the input. I have made the changes accordingly. Let me know.

neelayu avatar May 30 '25 11:05 neelayu

@srebhan

Furthermore, I do have a question: With the current description how do you select (app=web AND region=us-east) OR (app=backend AND region=eu-central) with the --label flags? I think you need to support lists also on the command-line flag to do this.

This would mean we pass

selectors = ["app=web,region=us-east", "app=backend,region=eu-central"]

To match it against the labels, we require a list of labels and it is documented in this spec. We would pass one or more --label flags

--label="app=web" --label="region=us-east" --label="another=value"

This would match for the first selector. This case is present in the 5th example of the table.
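
To make the rule above concrete, here is a minimal Python sketch of how the matching could work. It is not Telegraf code: the function names are made up for illustration, and it assumes that the entries of `selectors` are ORed while the comma-separated pairs inside one entry are ANDed.

```python
# Hypothetical sketch of the matching rule described above; not Telegraf code.

def parse_pairs(expr):
    """Split "app=web,region=us-east" into {"app": "web", "region": "us-east"}."""
    pairs = {}
    for item in expr.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def plugin_enabled(selectors, labels):
    """Return True if at least one selector has all its pairs among the labels.

    selectors: comma-separated strings from the plugin config
               (assumed: entries are ORed, pairs inside one entry are ANDed).
    labels:    "key=value" strings from repeated --label flags.
    """
    instance = parse_pairs(",".join(labels))
    return any(
        all(instance.get(key) == value for key, value in parse_pairs(sel).items())
        for sel in selectors
    )


selectors = ["app=web,region=us-east", "app=backend,region=eu-central"]
labels = ["app=web", "region=us-east", "another=value"]
print(plugin_enabled(selectors, labels))  # True: the first selector matches
```

Extra labels such as `another=value` do not prevent a match in this sketch; only the pairs named in a selector are checked.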

neelayu avatar Jun 06 '25 09:06 neelayu

@neelayu I think

This would match for the first selector. This case is present in the 5th example of the table.

is not what I mean. All of your examples show an OR combination, i.e. as soon as at least one label provided on the command line matches, the plugin is selected.

What I mean is that if you allow the selectors to be individual entries like

[[inputs.foo]]
  selectors = ["app=web", "region=us-east"]

[[inputs.foo]]
  selectors = ["app=web", "region=us-west"]

[[inputs.foo]]
  selectors = ["app=web", "region=eu-central"]

[[inputs.foo]]
  selectors = ["app=backend", "region=us-east"]

[[inputs.foo]]
  selectors = ["app=backend", "region=us-west"]

[[inputs.foo]]
  selectors = ["app=backend", "region=eu-central"]

This gives you richer ways of selecting groups, like --label "app=web" or --label="region=us-*" on the command line, and if you allow for --label="app=web,region=us-east" you can enforce matching a subset. That's much easier than doing it the other way around, isn't it?
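
A rough Python sketch of this idea, under stated assumptions: the six plugin entries are the ones listed above, a command-line expression is a comma-separated list of key=pattern pairs that must all match, and `*` is treated as a shell-style wildcard via `fnmatch`. None of this is Telegraf's actual behaviour; the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# The six per-plugin entries from the example above.
PLUGINS = [
    ["app=web", "region=us-east"],
    ["app=web", "region=us-west"],
    ["app=web", "region=eu-central"],
    ["app=backend", "region=us-east"],
    ["app=backend", "region=us-west"],
    ["app=backend", "region=eu-central"],
]


def matches(expr, entries):
    """True if every key=pattern pair in expr matches an entry of the plugin."""
    have = dict(entry.split("=", 1) for entry in entries)
    for pair in expr.split(","):
        key, pattern = pair.split("=", 1)
        if key not in have or not fnmatchcase(have[key], pattern):
            return False
    return True


def count(expr):
    """Number of plugin instances selected by one command-line expression."""
    return sum(matches(expr, entries) for entries in PLUGINS)


print(count("app=web"))                 # 3
print(count("region=us-*"))             # 4
print(count("app=web,region=us-east"))  # 1
```

A single key selects a whole group, while a combined expression narrows the selection to a subset.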

srebhan avatar Jun 10 '25 16:06 srebhan

@srebhan Labels are supposed to be simple, atomic key-value pairs without wildcards. You can have several of them, but ultimately it is the selectors that are matched against them.

neelayu avatar Jun 16 '25 11:06 neelayu

Generally I find the terms selectors and labels a bit confusing. For me a user adds certain labels to plugins telling the system what those plugins are about (kind of metadata). Then the operator wants to select a subset of those plugins by providing which labels to select. Does that make sense?

You kind of invert the logic by "labeling" the Telegraf instance and the plugins are "selecting" if they want to be part of that. I think that's weird... Furthermore, your system requires that the person writing the config foresees all possible combinations which is difficult if the person writing the config and the one operating Telegraf are not the same...

srebhan avatar Jun 17 '25 12:06 srebhan

@srebhan I see your point.

Can we get @Hipska's thoughts as well? @Hipska seemed okay with this spec. Just curious if the labels and selector terms are making sense or if something’s unclear.

neelayu avatar Jun 18 '25 07:06 neelayu

I'm just another guy from the community, just like You..

I was okay with the original spec indeed. However, I also agree with what @srebhan said. For me, both ways would work.

Note: I don't actually need this functionality as I generate the config for each system. Hint: You can use a generic telegraf.conf and have system specific configs in telegraf.d/.

Hipska avatar Jun 18 '25 07:06 Hipska

Thanks @Hipska I just wanted to clarify whether the terminology might be causing any confusion. In the Kubernetes world, the typical phrasing is "Select all objects that match the labels." This is because labels are considered atomic key-value pairs, meaning they don't support wildcards.

@srebhan Let's consider the following case: there are two operators responsible for running Telegraf instances. They will run these instances with, say, the following commands

Telegraf-01
---
telegraf --config-directory configs/ --label="env=prod" --label="cpu=true" --label="instance=us-west"
Telegraf-02
---
telegraf --config-directory configs/ --label="env=prod" --label="mem=true"

Now there could be many individuals who write their configs accordingly

[[inputs.cpu]]
# The first selector does not match with any label, but second one mandates env=prod AND cpu=true
# Hence Telegraf-01 will "select" this plugin
selector = ["instance=eu-*", "env=prod,cpu=true"]
[[inputs.mem]]
# The selector matches with Telegraf-02 and hence it will be selected on that instance
selector = ["env=prod,mem=true"]

Now the downside could be if there is the following config

[[inputs.mem]]
selector = ["instance=us-*"]

This will match Telegraf-01 and hence will be selected. Essentially, if there is a disconnect between operators and config writers, such cases can arise even when we invert the terminology. But I feel operators will run these instances with appropriate labels precisely so that config writers know where a plugin will be selected.
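
The scenario can be replayed with a small Python sketch. It assumes the semantics used in this comment (selector entries are ORed, pairs within an entry are ANDed, and `*` in a selector is a shell-style wildcard); the helper names and the "(broad)" plugin name are only for illustration and nothing here is Telegraf code.

```python
from fnmatch import fnmatchcase

# Labels each operator passes via --label (taken from the commands above).
INSTANCES = {
    "Telegraf-01": ["env=prod", "cpu=true", "instance=us-west"],
    "Telegraf-02": ["env=prod", "mem=true"],
}

# Plugin name -> selector list from the configs above.
PLUGINS = [
    ("inputs.cpu", ["instance=eu-*", "env=prod,cpu=true"]),
    ("inputs.mem", ["env=prod,mem=true"]),
    ("inputs.mem (broad)", ["instance=us-*"]),
]


def selector_matches(selector, labels):
    """All key=pattern pairs of one selector must match the instance labels."""
    for pair in selector.split(","):
        key, pattern = pair.split("=", 1)
        if key not in labels or not fnmatchcase(labels[key], pattern):
            return False
    return True


def enabled_plugins(plugins, label_flags):
    """Names of plugins for which at least one selector matches."""
    labels = dict(flag.split("=", 1) for flag in label_flags)
    return [
        name
        for name, selectors in plugins
        if any(selector_matches(sel, labels) for sel in selectors)
    ]


for instance, flags in INSTANCES.items():
    print(instance, enabled_plugins(PLUGINS, flags))
# Telegraf-01 ['inputs.cpu', 'inputs.mem (broad)']
# Telegraf-02 ['inputs.mem']
```

The broad `instance=us-*` selector is what pulls the extra plugin onto Telegraf-01.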

If the confusion is only about the labels and selector terminology, then I can definitely rework the definitions.

The solution would be to ensure we thoroughly document the behaviour:

Operator Responsibility: Operators must ensure labels are specific and accurate to avoid potential overlaps or unintended plugin activation.

Config Writer Responsibility: Config writers need to be mindful of selectors and avoid overly broad patterns that could match unintended instances.

neelayu avatar Jun 18 '25 08:06 neelayu

The more I read your use cases, the more I feel like this is already possible with Telegraf (just with different terminology).

According to your latest scenario, you could have this:

Telegraf-01
---
telegraf --config-directory configs/ --config prod/cpu.conf --config us/mem.conf
---
Telegraf-02
---
telegraf --config-directory configs/ --config prod/mem.conf

Where the configs/ directory holds only the generic configs.

The situation is exactly the same:

  • Config Writer Responsibility:
    • Config writers need to be mindful of config filenames.
  • Operator Responsibility:
    • Operators must ensure to use the correct configs.

Hipska avatar Jun 18 '25 08:06 Hipska

That is true. But this scenario only works if you want those plugins to be enabled. To disable one, you either comment the block out or remove it completely. Of course, one would need watch enabled to ensure this happens without restarting Telegraf.

As mentioned in the original issue, I am looking for a more robust way to handle it.

neelayu avatar Jun 18 '25 09:06 neelayu

For disabling it, it is just a matter of not including that specific config file, which indeed has the same effect as commenting out or removing the block. That seems as robust as it can get, don't you think?

Hipska avatar Jun 18 '25 09:06 Hipska

@neelayu let me take your example above and convert it to my terminology also being consistent with

"Select all objects that match the labels."

Telegraf-01
---
telegraf --config-directory configs/ --select="env=prod,cpu=true,instance=eu-central"
Telegraf-02
---
telegraf --config-directory configs/ --select="env=prod,mem=true"

Now there could be many individuals who write their configs accordingly

[[inputs.cpu]]
# Telegraf-01 will "select" this plugin but it would also be selected with
# --select "env=prod", "cpu=true" i.e. without specifying an instance...
labels = ["instance=eu-central", "env=prod", "cpu=true"]
[[inputs.mem]]
# The selector matches with Telegraf-02 and hence it will be selected on that instance
labels = ["env=prod", "mem=true"]

The benefit is that you could select both plugins with

telegraf --config-directory configs/ --select="env=prod"

or if you imagine a third plugin you want to select with

[[inputs.disk]]
labels = ["env=staging", "disk=true"]

you can select all three with

telegraf --config-directory configs/ --select="env=prod" --select "env=staging,disk=true"

meaning "select everything with (env being prod) OR (env being staging AND disk being true)"

while excluding others that do not match. It allows for a richer selection syntax and, given there is a mutually agreed-on label set, the writer of the configs just needs to label the plugins the right way without needing to know how they might later be combined.
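
As an illustration only, the selection logic of this variant could be sketched in Python as follows. It assumes that each plugin carries a list of key=value labels, that the pairs inside one --select flag are ANDed, and that several --select flags are ORed. The helper names are hypothetical and this is not Telegraf code.

```python
# Plugin name -> labels, taken from the example configs above.
PLUGINS = {
    "inputs.cpu": ["instance=eu-central", "env=prod", "cpu=true"],
    "inputs.mem": ["env=prod", "mem=true"],
    "inputs.disk": ["env=staging", "disk=true"],
}


def to_map(pairs):
    """Turn ["env=prod", "cpu=true"] into {"env": "prod", "cpu": "true"}."""
    return dict(pair.split("=", 1) for pair in pairs)


def is_selected(plugin_labels, select_flags):
    """OR across --select flags, AND across the pairs inside one flag."""
    labels = to_map(plugin_labels)
    return any(
        all(labels.get(k) == v for k, v in to_map(flag.split(",")).items())
        for flag in select_flags
    )


def pick(select_flags):
    """Names of all plugins selected by the given --select values."""
    return [name for name, labels in PLUGINS.items()
            if is_selected(labels, select_flags)]


print(pick(["env=prod"]))
# ['inputs.cpu', 'inputs.mem']
print(pick(["env=prod", "env=staging,disk=true"]))
# ['inputs.cpu', 'inputs.mem', 'inputs.disk']
print(pick(["env=prod,cpu=true,instance=eu-central"]))
# ['inputs.cpu']
```

Here the config writer only labels plugins; every logical combination lives in the `--select` values chosen by the operator.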

srebhan avatar Jun 18 '25 09:06 srebhan

@Hipska There are three options that I see for disabling

  • Remove the config file. There could be one plugin per file or multiple plugins. If it is the latter, we might end up removing something that we need.
  • Remove the plugin definition.
  • Comment the plugin definition.

In all cases, if Telegraf is running with watch enabled, it will detect the change and reload automatically. Otherwise a manual restart is required. My proposal is for the cases where Telegraf is running with watch enabled, so we consider the 2nd and 3rd options. Both seem cumbersome, and we may end up with invalid syntax or something equivalent.

However, I understand the point and we have been doing it in our use-case. We end up commenting out plugins. This proposal is just to make it more extensible and flexible.

neelayu avatar Jun 18 '25 09:06 neelayu

Did you actually see my response? It is about adding/removing the --config option from the command line. Like you would also add or remove --label from it. It doesn't require/need touching the files at all..

Hipska avatar Jun 18 '25 11:06 Hipska

Not really. I would change the selector in the plugin so that it no longer matches the label, and the plugin is then dropped by Telegraf without having to stop the running instance.

neelayu avatar Jun 18 '25 11:06 neelayu

@neelayu I've done some selection stuff by having one directory with all the config files, one per plugin or "gather-group". Additionally, I had a config file above that directory with the global settings like agent etc. The selection then happened by creating another directory with symlinks to the configs I want, and Telegraf would be given the directory with the symlinks. Now I can select/deselect by creating or removing symlinks.
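
A small Python sketch of that symlink workflow, using a throwaway temp directory. The directory names (`available`, `enabled`), the file names, and the `enable`/`disable` helpers are made up for illustration; only the idea of pointing Telegraf at a directory of symlinks comes from the comment above.

```python
import os
import tempfile


def enable(available_dir, enabled_dir, name):
    """Select a config by symlinking it into the directory Telegraf reads."""
    link = os.path.join(enabled_dir, name)
    if not os.path.islink(link):
        os.symlink(os.path.join(available_dir, name), link)


def disable(enabled_dir, name):
    """Deselect a config by removing its symlink; the real file is untouched."""
    link = os.path.join(enabled_dir, name)
    if os.path.islink(link):
        os.unlink(link)


# Build a throwaway layout: all configs in one place, symlinks in another.
root = tempfile.mkdtemp()
available = os.path.join(root, "available")
enabled = os.path.join(root, "enabled")
os.makedirs(available)
os.makedirs(enabled)
for name in ("cpu.conf", "mem.conf", "disk.conf"):
    with open(os.path.join(available, name), "w") as fh:
        fh.write("# placeholder plugin config\n")

enable(available, enabled, "cpu.conf")
enable(available, enabled, "mem.conf")
disable(enabled, "mem.conf")

print(sorted(os.listdir(enabled)))    # ['cpu.conf']
print(sorted(os.listdir(available)))  # ['cpu.conf', 'disk.conf', 'mem.conf']
```

Telegraf would then be started with `--config-directory` pointing at the `enabled` directory, plus a `--config` for the global settings file.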

In any case I see value in this approach as it adds some flexibility by being able to select "configuration sets". So I'm not opposed to the idea; I just want to make it as generic as possible, as experience shows that people will ask for logical combinations. :-)

srebhan avatar Jun 27 '25 09:06 srebhan

@neelayu how do we want to proceed?

srebhan avatar Jul 01 '25 12:07 srebhan

I am not sure where this is going honestly. 😅

But I think all we need to do is invert the definitions of labels and selectors. Logical operators will continue to work as defined.

neelayu avatar Jul 01 '25 13:07 neelayu

I think the difference between my suggestion and the current version is that the logical operations happen on the command-line in my suggestion whereas in your version they are done in the config. I really would prefer to move the actual selection to the command-line as this provides more flexibility IMO...

srebhan avatar Jul 04 '25 09:07 srebhan

Yes that is correct. Let me work through it.

neelayu avatar Jul 04 '25 10:07 neelayu

@srebhan I have made the changes, can you review it again? thanks!

neelayu avatar Jul 12 '25 09:07 neelayu

Wouldn't it be better to have the TOML markup be like this?

[inputs.cpu.labels]
  app = "payments"
  region = "us-east"
  env = "prod"

Just as you have for tags..

Hipska avatar Jul 14 '25 13:07 Hipska

@Hipska My initial thinking was always a map, but because I felt labels should generally sit at the top of the plugin definition, I hesitated and went with an array of strings. However, a map ensures that we have unique keys, so I have made the change.
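
To illustrate the uniqueness point: with an array of "key=value" strings, duplicate keys have to be detected by hand, whereas a map rules them out by construction. A minimal Python sketch follows; the function name is hypothetical and this is not how Telegraf parses its config.

```python
def labels_from_list(entries):
    """Convert ["env=prod", ...] into a dict, rejecting duplicate keys."""
    labels = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {entry!r}")
        if key in labels:
            raise ValueError(f"duplicate label key: {key!r}")
        labels[key] = value
    return labels


print(labels_from_list(["app=payments", "region=us-east", "env=prod"]))
# {'app': 'payments', 'region': 'us-east', 'env': 'prod'}

try:
    labels_from_list(["env=prod", "env=staging"])
except ValueError as err:
    print(err)  # duplicate label key: 'env'
```

With the table form, the result is already a mapping, so this extra validation step is not needed.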

@srebhan

neelayu avatar Jul 14 '25 17:07 neelayu

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. Downloads for additional architectures and packages are available below.

:thumbsup: This pull request doesn't change the Telegraf binary size

:package: Click here to get additional PR build artifacts

Artifact URLs

DEB          RPM          TAR.GZ                  ZIP
amd64.deb    aarch64.rpm  darwin_amd64.tar.gz     windows_amd64.zip
arm64.deb    armel.rpm    darwin_arm64.tar.gz     windows_arm64.zip
armel.deb    armv6hl.rpm  freebsd_amd64.tar.gz    windows_i386.zip
armhf.deb    i386.rpm     freebsd_armv7.tar.gz
i386.deb     ppc64le.rpm  freebsd_i386.tar.gz
mips.deb     riscv64.rpm  linux_amd64.tar.gz
mipsel.deb   s390x.rpm    linux_arm64.tar.gz
ppc64el.deb  x86_64.rpm   linux_armel.tar.gz
riscv64.deb               linux_armhf.tar.gz
s390x.deb                 linux_i386.tar.gz
                          linux_mips.tar.gz
                          linux_mipsel.tar.gz
                          linux_ppc64le.tar.gz
                          linux_riscv64.tar.gz
                          linux_s390x.tar.gz

telegraf-tiger[bot] avatar Jul 17 '25 10:07 telegraf-tiger[bot]

Thanks @srebhan and @Hipska for your valuable suggestions!

neelayu avatar Jul 17 '25 17:07 neelayu