
Enable / Disable plugin instances

Open neelayu opened this issue 8 months ago • 10 comments

Use Case

In one of our deployments, we plan to run multiple instances of Telegraf, all of them fetching their config from an HTTP endpoint. Each running instance is only supposed to "activate" the set of plugins for which it is configured. Although this can be achieved with some tedious configuration, I think a more elegant solution would be a mechanism similar to what k8s does with its labels and selectors.

There have been requests for this feature dating back to 2016! https://github.com/influxdata/telegraf/issues/9304 https://github.com/influxdata/telegraf/issues/1317 https://github.com/influxdata/telegraf/issues/10543

Expected behavior

Telegraf runs with only "enabled" plugins

Actual behavior

No such provision directly.

Additional info

A few of the solutions offered previously:

Comment out the plugin to "disable" it. This works, but it is not feasible for a centralized config.

Rename the file so that it is ignored. Since Telegraf only reads .conf files, we can effectively disable a set of plugins by renaming the file to something like .conf.ignore. This approach is complex to manage, but good for debugging.

The proposed solution using selectors and labels is extensible. To ensure backward compatibility, we need to define certain rules.

Compatibility Matrix: Telegraf Plugin Selector System

| Telegraf Run State | Selector | Behavior |
|---|---|---|
| Running with --label | Present | Plugin is selected if the selector matches the label; otherwise it is not selected. |
| Running with --label | Not present | Plugin is selected (backward compatibility). |
| Running without --label | Present | Plugin is selected (no label to compare against). |
| Running without --label | Not present | Plugin is selected (current behavior). |

In essence, if we want to disable a plugin, all we need to do is run Telegraf with a label and give the plugin a selector that does not match it. The non-matching selector then acts as a negative selector.

Users who wish to take full advantage of this feature will eventually provide matching selectors for their running instances of Telegraf.
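
The four rows of the compatibility matrix can be sketched as a single decision function. This is only an illustration in Python, not the code from the PR; the name `is_selected` and the use of plain mappings for labels and selectors are assumptions made for this sketch.

```python
from typing import Mapping, Optional


def is_selected(labels: Optional[Mapping[str, str]],
                selector: Optional[Mapping[str, str]]) -> bool:
    """Apply the compatibility matrix (hypothetical helper, not Telegraf code)."""
    if not labels:
        # Running without --label: there is nothing to compare against,
        # so every plugin is selected.
        return True
    if not selector:
        # Plugin has no selector: selected for backward compatibility.
        return True
    # Both present: every selector pair must appear among the labels.
    return all(labels.get(key) == value for key, value in selector.items())
```

Only the "label present, selector present, no match" case returns False, which is what makes a non-matching selector work as a way to disable a plugin.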

As an example, we have the following

telegraf --config-directory selectors/ --label="app:frontend" --label="drop:news-1" --watch-config=notify --print-plugin-config-source=true

Let's say we have

selectors/
  one.conf
  two.conf

one.conf

[[outputs.http]]
    selector = {app = "frontend"}

[[inputs.mem]]
    selector = {app = "frontend2"}

[[inputs.cpu]]
    selector = {app = "backend"}

two.conf

[[inputs.disk]]
    selector = {app = "frontend", drop = "news-2"}

[[inputs.netstat]]
    [inputs.netstat.selector]
        app = "frontend"
        drop = "news-1"

[[inputs.powerdns]]
    selector = {app = "frontend"}

The following would be selected

outputs.http
inputs.netstat
inputs.powerdns

The following would be dropped

inputs.mem
inputs.cpu
inputs.disk
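
The result above can be checked with a small sketch that treats each selector as a set of key/value pairs that must all be present in the labels. The helper names `parse_labels` and `matches` are made up for this illustration, and the `key:value` label format is taken from the example command line; none of this is the actual implementation.

```python
def parse_labels(flags):
    """Turn ["app:frontend", ...] into {"app": "frontend", ...} (assumed format)."""
    labels = {}
    for flag in flags:
        key, _, value = flag.partition(":")
        labels[key.strip()] = value.strip()
    return labels


def matches(labels, selector):
    """True when every selector pair is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Plugins and selectors copied from one.conf and two.conf in the example.
PLUGINS = {
    "outputs.http": {"app": "frontend"},
    "inputs.mem": {"app": "frontend2"},
    "inputs.cpu": {"app": "backend"},
    "inputs.disk": {"app": "frontend", "drop": "news-2"},
    "inputs.netstat": {"app": "frontend", "drop": "news-1"},
    "inputs.powerdns": {"app": "frontend"},
}

labels = parse_labels(["app:frontend", "drop:news-1"])
selected = [name for name, sel in PLUGINS.items() if matches(labels, sel)]
dropped = [name for name in PLUGINS if name not in selected]
print("selected:", selected)
print("dropped:", dropped)
```

Note that inputs.disk is dropped even though its app selector matches, because its drop selector asks for news-2 while the instance is labeled news-1.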

Suggestions to improve this are welcome. I have created a PR for it: #16705

neelayu avatar Mar 29 '25 13:03 neelayu

I would propose to use array syntax for selectors, that makes it similar to the existing metric filtering config.

Ex:


[[inputs.netstat]]
    [inputs.netstat.selector]
        app = ["frontend*", "backend"]
        drop = ["news-1"]

So it would be selected for any frontend or the backend app.
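
A rough sketch of how such array selectors could be evaluated: within one key, any pattern may match; across keys, all must match. It uses Python's `fnmatch` as a stand-in for Telegraf's filter, and the rule that a missing label key counts as a mismatch is an assumption of this sketch, not something settled in this thread.

```python
from fnmatch import fnmatchcase


def matches(labels, selector):
    """Each selector key must match; any pattern in its list may match."""
    for key, patterns in selector.items():
        value = labels.get(key)
        if value is None:
            # Assumption: a label the instance does not carry never matches.
            return False
        if not any(fnmatchcase(value, pattern) for pattern in patterns):
            return False
    return True


# The selector from the example above.
selector = {"app": ["frontend*", "backend"], "drop": ["news-1"]}
```

With this selector, an instance labeled app=frontend-eu, drop=news-1 would be selected, while app=database, drop=news-1 would not.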

Hipska avatar Mar 31 '25 09:03 Hipska

@neelayu hmmm, why wouldn't a slice, something like

[[outputs.http]]
    selectors = ["frontend", "backend"]

[[inputs.mem]]
    selectors = ["backend"]

[[inputs.cpu]]
    selectors = ["backend"]

[[inputs.modbus]]
    selectors = ["factory"]

be enough? This way the checking is pretty easy I think.

My biggest concern is the complexity this might grow into. What if you specify

# telegraf --label frontend --label factory

Does it mean "frontend" and "factory" or does it mean "frontend" or "factory"? In any case I would like to ask you to write a spec for this so the behavior is clear for everyone... See https://github.com/influxdata/telegraf/tree/master/docs/specs for other specs we already have.
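
To make the ambiguity concrete, here is a small sketch of both readings applied to the config above. The function names are invented for this illustration, and treating a plugin without selectors as always selected is an assumption.

```python
def selected_any(labels, selectors):
    """OR reading: the plugin runs if it lists at least one of the labels."""
    return not selectors or bool(set(labels) & set(selectors))


def selected_all(labels, selectors):
    """AND reading: the plugin runs only if it lists every label."""
    return not selectors or set(labels) <= set(selectors)


# Plugins and selectors from the slice example above.
PLUGINS = {
    "outputs.http": ["frontend", "backend"],
    "inputs.mem": ["backend"],
    "inputs.cpu": ["backend"],
    "inputs.modbus": ["factory"],
}

labels = ["frontend", "factory"]
with_or = [name for name, sel in PLUGINS.items() if selected_any(labels, sel)]
with_and = [name for name, sel in PLUGINS.items() if selected_all(labels, sel)]
print("OR reading: ", with_or)
print("AND reading:", with_and)
```

Under the OR reading outputs.http and inputs.modbus run; under the AND reading nothing runs, because no plugin lists both frontend and factory. The two readings diverge on even this small config, which is why a spec is needed.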

srebhan avatar Mar 31 '25 18:03 srebhan

I see the confusion-

The --label flag is essentially what k8s has in metadata.labels. It is a way of attaching metadata to objects. In a way, it's just a list.

The selectors, on the other hand, are similar to k8s matchLabels, i.e. all selectors must match labels (the selectors are supposed to be a subset of the labels).

I have kept it similar to k8s since it provides a way to expand our capabilities in the future to incorporate selections based on matchExpressions and operator.

Let me know your thoughts. As @Hipska also mentions, wildcard matching is also an option. But this gives us strict matching.

neelayu avatar Apr 01 '25 08:04 neelayu

@neelayu I'm not familiar with the k8s selection mechanism, but I don't want to make things more complicated in Telegraf than they need to be. Using maps has the drawback that, if you use tables for declarations, the table must be at the end of the plugin declaration. Furthermore, we can still have label pairs by doing

[[inputs.foo]]
  selectors = ["apps.backend", "source.backend", ...]

or

[[inputs.foo]]
  selectors = ["apps=backend", "source=backend", ...]

If you then use a filter expression and specify the dot (.) or equal-sign (=) as separator you can easily do expressions like *.backend or apps.*.
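
As a rough illustration of matching flat label strings against such expressions, the sketch below uses Python's `fnmatch`. Two things here are assumptions: the rule that one matching pattern is enough, and the glob behavior itself. `fnmatch` knows nothing about separators, so its `*` also matches across dots, whereas a separator-aware filter would stop at the separator.

```python
from fnmatch import fnmatchcase


def matches(labels, selectors):
    """True if any selector pattern matches any label string (assumed rule)."""
    return any(fnmatchcase(label, pattern)
               for pattern in selectors
               for label in labels)


# Labels as flat strings using the dot as separator, as proposed above.
print(matches(["apps.backend"], ["*.backend"]))    # pattern on the value part
print(matches(["apps.frontend"], ["apps.*"]))      # pattern on the key part
print(matches(["source.backend"], ["apps.*"]))     # different key, no match
```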

srebhan avatar Apr 14 '25 08:04 srebhan

Yes. I am aware of the TOML limitations for tables or maps. Your proposal for syntax selectors=["apps=backend"] seems elegant and captures the necessary information. Just to double check, can you also confirm if we want to support wildcards like *? I am not inclined towards it as it will make things complicated.

If that is the case, I can draft up a spec for the same. Thanks!

neelayu avatar Apr 14 '25 09:04 neelayu

Well, the idea for the wildcards comes from the fact that we see this often. So what you can do is use a filter, which should not make things complicated with wildcards. However, if you do have doubts, we can start off with exact matches, but believe me, the code will not be much simpler. ;-)

Please also think about AND and OR operations. For example, specifying --select multiple times could mean an OR, and a comma-separated list of statements could mean an AND: --select 'apps=backend, source=factory*' --select 'source=frontend' would select everything that has (apps == backend AND source 'starts with' factory) OR source == frontend. We don't need this in the first implementation, but I know this will be a feature request sooner or later...
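
A minimal sketch of these semantics, assuming each plugin carries key/value labels and the --select statements are matched against them. The names `parse_select` and `is_selected` are hypothetical, and Python's `fnmatch` stands in for Telegraf's filter.

```python
from fnmatch import fnmatchcase


def parse_select(statement):
    """Split 'apps=backend, source=factory*' into [(key, pattern), ...]."""
    terms = []
    for term in statement.split(","):
        key, _, pattern = term.partition("=")
        terms.append((key.strip(), pattern.strip()))
    return terms


def is_selected(labels, select_flags):
    """OR across --select flags, AND across the terms inside one flag."""
    return any(
        all(key in labels and fnmatchcase(labels[key], pattern)
            for key, pattern in parse_select(statement))
        for statement in select_flags
    )


# The two --select statements from the example above.
flags = ["apps=backend, source=factory*", "source=frontend"]
```

For instance, labels apps=backend, source=factory-7 satisfy the first statement, and source=frontend alone satisfies the second, while apps=backend, source=office satisfies neither.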

A spec (see https://github.com/influxdata/telegraf/tree/master/docs/specs) would be good though to have things documented.

srebhan avatar Apr 17 '25 09:04 srebhan

Hi @srebhan and @Hipska I managed to draft up the spec. https://github.com/influxdata/telegraf/pull/16884

I believe we can go with wildcard matching using filter. The document specifies all the necessary information. Thanks!

neelayu avatar Apr 26 '25 13:04 neelayu

Hi @srebhan and @Hipska, sorry for pinging you, but did you get a chance to look at the spec? thanks!

neelayu avatar May 21 '25 13:05 neelayu

I did and approved 😉

Hipska avatar May 26 '25 10:05 Hipska

Thanks @Hipska

neelayu avatar May 26 '25 10:05 neelayu