alternatives to redis?

Open Licenser opened this issue 12 years ago • 4 comments

Hi people, I'm currently working on a quite similar problem (trying to tame a gazillion metrics) and trying to get some kind of anomaly detection into the mess.

So I wanted to ask why Redis is the backend. It seems a strange choice to me: being in memory, it is limited by memory rather than disk (and you usually have a hell of a lot more disk). It also seems to require storing the timestamp with each metric, effectively doubling the storage consumption (or more than doubling, given msgpack overhead). And last but not least, I can't see how the append operation is considered O(1) when it needs to relocate the whole data every time the size doubles; it sounds like O(sqrt(n)), given the size of the data is always the same.

What I could not find is how long historical data is preserved. Given that I'm producing about 8 GB a day, I can see myself running out of memory in about 16 days, not taking overhead into account; with overhead, probably 8 days or less.

So I was wondering if you would be up for a discussion of alternative backend(s). I currently ended up using Cassandra behind KairosDB (easier to write to and nice for aggregation), which so far works quite well and has a very sound storage mechanism with Cassandra's column-based storage.

Cheers, Heinz

Licenser avatar Sep 22 '13 03:09 Licenser

being in memory it makes it limited by memory rather than disk

Yes, but memory is way faster than disk. We want real time, right?

It also seems to require storing the timestamp

Storing the timestamp was a decision that we made because we had different levels of resolution in our data. Some algorithms might need to make use of the actual time in order to work. None have so far, though :) I'd be open to adding a setting to not store any timestamps - it'd be a big memory boost.
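
To put a rough number on the timestamp overhead, here is a minimal sketch that hand-encodes one datapoint in the msgpack wire format using only the standard library. It assumes the timestamp is an integer number of seconds that fits in a uint 32 and that the value is a float 64; the function names are made up for illustration and are not Skyline's code. A float timestamp would cost 9 bytes instead of 5, which pushes the pair past twice the size of the bare value.

```python
import struct


def pack_value(value):
    """Encode a float as msgpack float 64: marker 0xcb + 8 big-endian bytes."""
    return b"\xcb" + struct.pack(">d", value)


def pack_datapoint(timestamp, value):
    """Encode (timestamp, value) as a msgpack 2-element array.

    Assumption: timestamp is an integer that fits in a msgpack uint 32
    (marker 0xce + 4 big-endian bytes). 0x92 is the fixarray marker for
    an array of length 2.
    """
    return b"\x92" + b"\xce" + struct.pack(">I", timestamp) + pack_value(value)


value_only = pack_value(0.75)
with_timestamp = pack_datapoint(1379819400, 0.75)

print(len(value_only), len(with_timestamp))
```

Under these assumptions the bare value takes 9 bytes and the timestamped pair takes 15, so dropping timestamps would save roughly 40% per datapoint rather than a full half.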

I can't see how the append operation is considered O(1)

http://redis.io/commands/append
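
The O(1) here is an amortized bound: a reallocation copies the whole string, but with capacity doubling those copies become rare enough that the average cost per append stays constant. The sketch below is a toy model of a doubling buffer, not Redis's actual allocator, and `simulate_appends` is a name invented for this example. It counts how many bytes get copied by reallocations over a run of appends.

```python
def simulate_appends(n_appends, chunk=1):
    """Return total bytes copied by reallocations in a doubling buffer.

    Toy model: capacity starts at 1 byte and doubles whenever the next
    chunk does not fit; each reallocation copies the bytes stored so far.
    """
    capacity = 1
    length = 0
    copied = 0
    for _ in range(n_appends):
        if length + chunk > capacity:
            # Grow by doubling until the new chunk fits, then copy once.
            while capacity < length + chunk:
                capacity *= 2
            copied += length
        length += chunk
    return copied


for n in (1_000, 100_000, 1_000_000):
    copied = simulate_appends(n)
    print(n, copied, copied / n)
```

In this model the total number of bytes copied stays below 2n for n one-byte appends, so the per-append cost is bounded by a constant rather than growing like sqrt(n).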

how long historical data is preserved

settings.FULL_DURATION

a different backend

I'll need some more convincing. At scale, if you want very quick detection, you really need to use an in-memory datastore. That becomes less true when you have fewer metrics, though. However, if we can think of a modular and easy way to support different backends, I'd be open to supporting that in the project.
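
One possible shape for such a modular backend is sketched below. This is purely hypothetical: `MetricBackend`, `InMemoryBackend`, and their methods are invented for illustration and do not exist in Skyline. The idea is that the analyzer would only talk to a small interface, and Redis, Cassandra, or anything else would sit behind it.

```python
from abc import ABC, abstractmethod


class MetricBackend(ABC):
    """Hypothetical storage interface; not part of Skyline."""

    @abstractmethod
    def append(self, metric, timestamp, value):
        """Store one datapoint for the named metric."""

    @abstractmethod
    def fetch(self, metric, since):
        """Return [(timestamp, value), ...] with timestamp >= since."""


class InMemoryBackend(MetricBackend):
    """Stand-in implementation that keeps every series in a dict."""

    def __init__(self):
        self._series = {}

    def append(self, metric, timestamp, value):
        self._series.setdefault(metric, []).append((timestamp, value))

    def fetch(self, metric, since):
        # .get avoids creating empty entries for unknown metrics.
        return [(t, v) for t, v in self._series.get(metric, []) if t >= since]


backend = InMemoryBackend()
backend.append("cpu.load", 10, 0.5)
backend.append("cpu.load", 20, 0.7)
print(backend.fetch("cpu.load", since=15))
```

A disk-backed implementation would subclass the same interface, which keeps the speed versus capacity trade-off a deployment choice instead of a code change.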

astanway avatar Sep 22 '13 12:09 astanway

Redis makes it difficult to run conditional queries.

hit9 avatar Mar 24 '14 06:03 hit9

What about TempoDB as an extra backend?

thedrow avatar Apr 20 '14 11:04 thedrow

How about SSDB?

hit9 avatar Jul 24 '14 03:07 hit9