statsd-elasticsearch-backend
Add new config so that you can specify which data to index
We've decided that we don't need the raw timer data when we also have the timerData. This change keeps the default of indexing everything, but allows users of the backend to specify which types they want to index.
The only thing users need to be careful about: if they override the setting and have given any data type a custom name, they must list that custom name rather than the default one.
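As a rough illustration of the behaviour described above, here is a minimal sketch of the filtering idea. It is not the code from this pull request: the `typesToIndex` key, the helper names, and the default type names (`counter`, `timer`, `timer_stats`, `gauge`) are all assumptions made for the example.

```javascript
// Hypothetical sketch only: `typesToIndex`, the helper names, and the
// default type names below are illustrative assumptions, not the
// backend's confirmed configuration keys.
const DEFAULT_TYPES = ["counter", "timer", "timer_stats", "gauge"];

// Return the configured list of types, or every default type when the
// setting is absent, so existing installs keep indexing everything.
function selectTypesToIndex(config) {
  const configured = config && config.typesToIndex;
  return Array.isArray(configured) ? configured : DEFAULT_TYPES;
}

// Keep only the document groups whose type name is in the allowed list.
function filterDocs(docsByType, config) {
  const allowed = new Set(selectTypesToIndex(config));
  const kept = {};
  for (const [type, docs] of Object.entries(docsByType)) {
    if (allowed.has(type)) {
      kept[type] = docs;
    }
  }
  return kept;
}

// Example: skip the raw timer documents, keep the aggregated ones.
const flushed = {
  counter: [{ ns: "web", val: 3 }],
  timer: [{ ns: "web", val: 120 }, { ns: "web", val: 95 }],
  timer_stats: [{ ns: "web", mean: 107.5 }],
  gauge: [{ ns: "web", val: 42 }],
};
const toIndex = filterDocs(flushed, {
  typesToIndex: ["counter", "timer_stats", "gauge"],
});
console.log(Object.keys(toIndex));
```

In this sketch, a user who had renamed the aggregated timer type to something like `my_timer_stats` would have to put that name in the list. Listing `timer_stats` would then match nothing, and those documents would be dropped without warning.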
Can you describe the scenario in which you're sending data to statsd that you don't want flushed to ES? I'm a bit confused by this pull request.
With these changes we save a huge amount of disk space. Without them we were indexing around 60GB a day but with the changes we now index around 4GB a day.
When you're using Elasticsearch as a backend for statsd, you lose the ability that Graphite and Whisper give you to set lower-resolution time intervals for older data. By having a flush interval of 10 seconds on statsd, and not indexing the raw timer data, we effectively get a lower-resolution time interval, and we like the trade-off between disk space, resolution, and query performance.
+1 We have the same problem. The aggregated data already has everything we need, so there is no need to write every single document into ES. Percentile queries in Grafana in particular can get very slow if they are based on the single docs instead of the ones with aggregated stats.