alternatives to redis?
Hi people, I'm currently working on a quite similar problem (trying to tame a gazillion metrics and to get some kind of anomaly detection into the mess).
So I wanted to ask: why is Redis the backend? It seems a strange choice to me. Being in memory, it is limited by memory rather than disk (and you usually have a hell of a lot more disk). It also seems to require storing the timestamp with each metric, effectively doubling (or more, given the msgpack overhead) the storage consumption. And last but not least, I can't see how the append operation is considered O(1) when it needs to relocate the whole data every time the size doubles; it sounds more like O(sqrt(n)) to me, given that the size of each datapoint is always the same.
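As a rough sanity check of the overhead (assuming the datapoints are msgpack-packed [timestamp, value] pairs, which is how I read the code):

```python
import time
import msgpack

# Rough size check: a packed [timestamp, value] pair vs. the bare value.
pair = msgpack.packb([time.time(), 42.0])  # array header + two float64s
bare = msgpack.packb(42.0)                 # a single float64

print(len(pair), len(bare))  # ~19 vs ~9 bytes, i.e. roughly 2x per datapoint
```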
What I could not find is how long historical data is preserved. Given that I'm producing about 8 GB a day, I can see myself running out of memory in about 16 days, not taking overhead into account; with overhead, probably 8 days or less.
So I was wondering if you would be up for a discussion of alternative backend(s). I currently ended up using Cassandra behind KairosDB (easier to write to and nice for aggregation), which so far works quite well and has a very sound storage mechanism thanks to Cassandra's column-based storage.
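For reference, writing to KairosDB is just an HTTP POST to its REST API; the metric name, tag, and endpoint below are placeholders for whatever your setup uses:

```python
import time
import requests

# Minimal KairosDB write via its REST API; names and endpoint are placeholders.
payload = [{
    "name": "example.metric",
    "datapoints": [[int(time.time() * 1000), 42.0]],  # [timestamp_ms, value]
    "tags": {"host": "example-host"},  # KairosDB requires at least one tag
}]
requests.post("http://localhost:8080/api/v1/datapoints", json=payload)
```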
Cheers, Heinz
> Being in memory, it is limited by memory rather than disk

Yes, but memory is way faster than disk. We want real time, right?
> It also seems to require storing the timestamp

Storing the timestamp was a decision that we made because we had different levels of resolution in our data. Some algorithms might need to make use of the actual time in order to work. None have so far, though :) I'd be open to adding a setting to not store any timestamps - it'd be a big memory boost.
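Something like this is what I have in mind - to be clear, `STORE_TIMESTAMPS` is not an existing setting, just a hypothetical sketch:

```python
import time
import msgpack

# Hypothetical only: STORE_TIMESTAMPS is not a real skyline setting.
STORE_TIMESTAMPS = False

def pack_datapoint(value, timestamp=None):
    """Pack one datapoint for APPEND-ing to Redis."""
    if STORE_TIMESTAMPS:
        return msgpack.packb([timestamp if timestamp is not None else time.time(), value])
    return msgpack.packb(value)  # drops the timestamp, roughly halving the payload
```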
> I can't see how the append operation is considered O(1)

http://redis.io/commands/append
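Per the Redis docs, APPEND is amortized O(1): the dynamic string doubles its free space on every reallocation, so the total copy cost over n appends stays linear, i.e. constant per append. The write path is roughly this (key prefix and connection details are illustrative):

```python
import time
import msgpack
import redis

# Append-only pattern: each datapoint is a packed [timestamp, value] blob
# APPEND-ed onto a per-metric string key. Redis doubles the string's free
# space on reallocation, so the copy cost is amortized O(1) per append.
r = redis.StrictRedis(host="localhost", port=6379)  # illustrative connection
packer = msgpack.Packer()

def add(metric, value, timestamp=None):
    ts = timestamp if timestamp is not None else time.time()
    r.append("metrics." + metric, packer.pack([ts, value]))
```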
> how long historical data is preserved

`settings.FULL_DURATION`
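Rough back-of-the-envelope for sizing - every number here is an assumption you'd swap for your own:

```python
# Back-of-the-envelope memory estimate -- all numbers are assumptions.
FULL_DURATION = 86400          # seconds of history kept per metric
RESOLUTION = 10                # seconds between datapoints
BYTES_PER_DATAPOINT = 19       # msgpack-packed [timestamp, value] pair
NUM_METRICS = 100000

datapoints_per_metric = FULL_DURATION // RESOLUTION
total_bytes = datapoints_per_metric * BYTES_PER_DATAPOINT * NUM_METRICS
print(total_bytes / 1e9, "GB (before Redis key/structure overhead)")
```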
> a different backend

I'll need some more convincing. At scale, if you want very quick detection, you really need to use an in-memory datastore. That becomes less true as you have a smaller number of metrics, though. However, if we can think of a modular and easy way to support different backends, I'd be open to supporting that in the project.
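If we went that route, I'd imagine something like a small storage interface - purely hypothetical, nothing like this exists in the codebase today:

```python
# Purely hypothetical interface sketch -- names and signatures are made up
# for the sake of discussion.
from abc import ABC, abstractmethod

class TimeseriesStore(ABC):
    @abstractmethod
    def add(self, metric, timestamp, value):
        """Append a single datapoint for a metric."""

    @abstractmethod
    def get_series(self, metric):
        """Return the full [(timestamp, value), ...] series for the analyzer."""

# Concrete implementations (a Redis store, a KairosDB store, ...) would
# subclass this, and the analyzer would only talk to the interface.
```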
Redis makes it difficult to run conditional queries.
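For example, with the append-only packed strings there is no server-side filtering; you have to pull the whole series and filter client-side (the key name below is illustrative):

```python
import msgpack
import redis

# No server-side filtering with packed strings: fetch the whole series,
# unpack it, and filter in Python.
r = redis.StrictRedis()
unpacker = msgpack.Unpacker()
unpacker.feed(r.get("metrics.example") or b"")
spikes = [(ts, value) for ts, value in unpacker if value > 100]
```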
What about TempoDB as an extra backend?
How about SSDB?