Telegraf Configuration - Recommended approach for configuring multiple independent silos of input-output plugins

Opened by keshav613.

Requirement: I have to read data from Kafka and send it to Datadog. The catch is that I have one Kafka topic and one Datadog endpoint for each customer. There are 5000 customers, so in total we have 5000 Kafka topics and 5000 Datadog output plugins.

When the scale was low, I was creating one Telegraf pod per customer to read from its Kafka topic and send the data to Datadog. But now that the scale has grown to 5000 customers, Ops is worried about the resource constraints and about monitoring all those 5000 Telegraf pods. The Kafka topics will receive 5000 × 1 KB of data every 10 seconds, and this volume may also increase in the future.
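For sizing, the load quoted above works out as follows. This is a back-of-the-envelope sketch that assumes 1 KB means 1,000 bytes and that each customer's 1 KB arrives once per 10-second interval; it says nothing about how many metrics that 1 KB decodes into.

```python
# Back-of-the-envelope load estimate for the figures in the question.
# Assumption: 1 KB = 1,000 bytes, sent once per customer per 10 s interval.
CUSTOMERS = 5000
BYTES_PER_CUSTOMER = 1_000
INTERVAL_S = 10
SECONDS_PER_DAY = 86_400

rate = CUSTOMERS * BYTES_PER_CUSTOMER / INTERVAL_S  # bytes per second, all topics
per_day = rate * SECONDS_PER_DAY                    # bytes per day, all topics

print(f"{rate / 1_000:.0f} KB/s")      # 500 KB/s
print(f"{per_day / 1e9:.1f} GB/day")   # 43.2 GB/day
```

If "1KB" means 1,024 bytes, both figures grow by about 2.4%.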

Is there an optimized way to handle this? After some research I came across two approaches:

  1. Put the input plugins for the 5000 Kafka topics and the 5000 output plugins in the same telegraf.conf file. By default Telegraf sends data from every input plugin to every output plugin, but with tagpass (a unique tag for each customer) we can restrict the metrics from one topic so they are routed only to the corresponding Datadog output plugin. However, I doubt a single Telegraf node can handle this at the scale of 5000 customers: every metric would be checked against every output's tagpass filter, so the routing work grows roughly as O(N^2) in the number of customers, and I'm not sure how much CPU and memory that single Telegraf pod would need.

  2. Run individual Telegraf services in the same pod ... as discussed in https://github.com/influxdata/telegraf/issues/6334#issuecomment-1386698970. But this won't be feasible at the scale of 5000 customers.
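The routing described in option 1 can be modelled in a few lines. This is a simplified sketch, not Telegraf's implementation: it assumes each output carries a single-key tagpass filter on a hypothetical `topic` tag, uses exact matching where Telegraf matches glob patterns, and uses hypothetical `customer-N` names. It counts how many filter checks one interval costs when every metric is offered to every output.

```python
from dataclasses import dataclass, field


@dataclass
class Output:
    """One per-customer output with a single-key tagpass filter (simplified model)."""
    tag_key: str
    accepted: list[str]  # exact values; Telegraf itself matches glob patterns
    delivered: list[dict[str, str]] = field(default_factory=list)

    def accepts(self, tags: dict[str, str]) -> bool:
        # The metric passes only if its tag value is in the accepted list.
        return tags.get(self.tag_key) in self.accepted


def route(metrics: list[dict[str, str]], outputs: list[Output]) -> int:
    """Offer every metric to every output; return how many filter checks ran."""
    checks = 0
    for tags in metrics:
        for out in outputs:
            checks += 1
            if out.accepts(tags):
                out.delivered.append(tags)
    return checks


# 100 hypothetical customers, one metric each, tagged with its Kafka topic name.
customers = [f"customer-{i}" for i in range(100)]
outputs = [Output(tag_key="topic", accepted=[c]) for c in customers]
metrics = [{"topic": c} for c in customers]

checks = route(metrics, outputs)
print(checks)  # 10000 = 100 metrics x 100 outputs
```

With 100 customers that is 10,000 checks; with 5000 customers and one metric each it would be 25,000,000 checks per interval. Each check is a cheap lookup, so whether this matters in practice depends on how many metrics each customer produces and on Telegraf's real per-metric overhead, which this model does not measure.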

I understand that Telegraf might not be built for this use case and that I should probably write a dedicated microservice for it, but I would love to know whether it's possible to achieve this with Telegraf.

keshav613, Dec 18 '24

I would go for option 1 and, if needed for the resources (memory, CPU) used, split it into multiple instances (maybe one per 1000 customers?).
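Splitting as suggested could look like the sketch below. It assumes contiguous groups of 1000 hypothetical customer names, one metric per customer per interval, and that each metric is checked only against the output filters of its own instance; each instance would then list only its own group's topics in its Kafka input.

```python
def shard(customers: list[str], per_instance: int = 1000) -> list[list[str]]:
    """Split customers into consecutive groups, one group per Telegraf instance."""
    if per_instance < 1:
        raise ValueError("per_instance must be >= 1")
    return [customers[i:i + per_instance] for i in range(0, len(customers), per_instance)]


# 5000 hypothetical customer names, zero-padded so they sort in order.
customers = [f"customer-{i:04d}" for i in range(5000)]
instances = shard(customers)

# Assumption: one metric per customer per interval, and every metric is
# checked against every output filter in its own instance only.
checks_per_interval = sum(len(group) ** 2 for group in instances)

print(len(instances))        # 5
print(instances[1][0])       # customer-1000
print(checks_per_interval)   # 5000000, versus 5000 ** 2 = 25000000 in one instance
```

Contiguous groups keep each customer on the same instance as long as new customers are appended to the end of the list; only the last group changes when the list grows.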

This is not really an issue but rather a support question; it would be better placed in the Discourse or Slack channels.

Hipska, Jan 15 '25

@keshav613 as @Hipska said, what I suggest is having one input plugin for the 5k Kafka topics and then one output plugin per customer with tagpass. You may want to have one config file for the Kafka input and then 5k files for the individual output plugins, as this allows you to use a template-based generator to produce those files and keep track of what is there.
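A template-based generator along these lines could produce those files. It is a sketch under stated assumptions: the broker address, topic names and API keys are placeholders, the messages are assumed to be InfluxDB line protocol (`data_format = "influx"`), and the `topic_tag` option is used to put the Kafka topic into a `topic` tag that each Datadog output's `tagpass` filters on. Check the option names against the `kafka_consumer` and `datadog` plugin READMEs for your Telegraf version.

```python
import json
from pathlib import Path

# Placeholder broker list; replace with your own.
BROKERS = ["kafka-1:9092"]

# One input for all topics; the topic name is stored in a "topic" tag.
# Assumption: messages are InfluxDB line protocol.
KAFKA_INPUT_TEMPLATE = """\
[[inputs.kafka_consumer]]
  brokers = {brokers}
  topics = {topics}
  topic_tag = "topic"
  data_format = "influx"
"""

# One output per customer; tagpass keeps only that customer's topic.
DATADOG_OUTPUT_TEMPLATE = """\
[[outputs.datadog]]
  apikey = {apikey}

  [outputs.datadog.tagpass]
    topic = {topic}
"""


def render(customers: dict[str, str]) -> dict[str, str]:
    """Return file name -> config text; `customers` maps Kafka topic -> Datadog API key."""
    # json.dumps yields valid TOML strings and arrays for these simple values.
    files = {
        "00-kafka-input.conf": KAFKA_INPUT_TEMPLATE.format(
            brokers=json.dumps(BROKERS), topics=json.dumps(sorted(customers))
        )
    }
    for topic, apikey in sorted(customers.items()):
        files[f"output-{topic}.conf"] = DATADOG_OUTPUT_TEMPLATE.format(
            apikey=json.dumps(apikey), topic=json.dumps([topic])
        )
    return files


def write(files: dict[str, str], directory: Path) -> None:
    """Write each rendered file into the config directory."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, text in files.items():
        (directory / name).write_text(text)


files = render({"customer-a": "KEY_A", "customer-b": "KEY_B"})
print(sorted(files))
# ['00-kafka-input.conf', 'output-customer-a.conf', 'output-customer-b.conf']
# write(files, Path("/etc/telegraf/telegraf.d"))  # then point Telegraf at this directory
```

The generated directory can be loaded with `telegraf --config telegraf.conf --config-directory <dir>`, keeping the `[agent]` settings in the main file. If each customer has a different Datadog endpoint rather than just a different API key, the output's `url` option can be templated the same way.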

Does this answer your question and can we close this issue?

srebhan, Oct 22 '25

Hello! I recommend posting this question in our Community Slack or Community Forums, we have a lot of talented community members there who could help answer your question more quickly. You can also learn more about Telegraf by enrolling at InfluxDB University for free!

Heads up, this issue will be automatically closed after 7 days of inactivity. Thank you!

telegraf-tiger[bot], Oct 22 '25

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be reopened. Thank you!

telegraf-tiger[bot], Oct 29 '25