statsd-elasticsearch-backend

Add new config so that you can specify which data to index

Open bobbyrenwick opened this issue 8 years ago • 3 comments

We've decided that we don't need the raw timer data when we have the timerData data too. This change keeps the default of indexing everything, but lets users of the backend specify which data types they want to index.

The one thing users need to be careful of: if they override the setting, they need to use any custom names they've given each data type.
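
For illustration only, here is a minimal sketch of how such a setting could behave. The option name `indexTypes`, the helper `selectTypesToIndex`, and the type names are all hypothetical and are not taken from this change; the real option and type names may differ.

```javascript
// Hypothetical sketch: `indexTypes` and the type names below are
// assumptions for illustration, not the option added by this change.
const DEFAULT_TYPES = ["counters", "timers", "timerData", "gauges"];

// Return only the buckets whose type the user asked to index.
// With no setting, every type is indexed (the default behaviour).
function selectTypesToIndex(buckets, config = {}) {
  const wanted = config.indexTypes || DEFAULT_TYPES;
  const selected = {};
  for (const type of wanted) {
    if (Object.prototype.hasOwnProperty.call(buckets, type)) {
      selected[type] = buckets[type];
    }
  }
  return selected;
}

const buckets = { counters: [1], timers: [2, 3], timerData: [4], gauges: [5] };

// Default: everything is indexed.
const all = selectTypesToIndex(buckets);

// Override: skip the raw timers, keep the aggregated timerData.
const noRawTimers = selectTypesToIndex(buckets, {
  indexTypes: ["counters", "timerData", "gauges"],
});
```

If a data type had been renamed in the config, the override list would need to contain the custom name rather than the default one, which is the caveat mentioned above.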

bobbyrenwick avatar Apr 14 '16 13:04 bobbyrenwick

Can you describe the scenario in which you're sending data to statsd that you don't want flushed to ES? I'm a bit confused by this pull request.

markkimsal avatar May 25 '16 16:05 markkimsal

With these changes we save a huge amount of disk space. Without them we were indexing around 60GB a day but with the changes we now index around 4GB a day.

When you use Elasticsearch as a backend for statsd, you lose the ability that Graphite and Whisper give you to set lower-resolution time intervals for older data. With a statsd flush interval of 10 seconds and no indexing of the raw timer data, we effectively get a lower-resolution time interval, and we like the trade-off between disk space, resolution, and query performance.
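
As a rough back-of-the-envelope sketch of why this matters, the snippet below compares document counts per day for one timer metric. The rate of 50 timings per second is a made-up figure, and it assumes one aggregated document per metric per flush; neither number comes from our setup.

```javascript
// Illustrative arithmetic only: the event rate is hypothetical, and
// one aggregated document per metric per flush is an assumption.
const SECONDS_PER_DAY = 24 * 60 * 60;   // 86400
const flushIntervalSeconds = 10;        // statsd flush interval from the comment above
const timingsPerSecond = 50;            // hypothetical load for a single timer metric

// One document per raw timing value.
const rawDocsPerDay = timingsPerSecond * SECONDS_PER_DAY;

// One aggregated document per flush.
const aggregatedDocsPerDay = SECONDS_PER_DAY / flushIntervalSeconds;

console.log({ rawDocsPerDay, aggregatedDocsPerDay });
```

Under these assumptions the aggregated count depends only on the flush interval, while the raw count grows with traffic.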

bobbyrenwick avatar May 25 '16 17:05 bobbyrenwick

+1. We have the same problem. The aggregated data already has everything we need, so there is no need to write every individual document into ES. Percentile queries in Grafana in particular can get very slow when they run over the individual docs instead of the ones with aggregated stats.

kaibra avatar May 10 '17 13:05 kaibra