cyanite
Table names should be configurable by rollup
This is an enhancement request, and something that I may work on myself. To maximize Cassandra compaction (and just for general non-hardcoding reasons), we should be able to specify a Cassandra table per rollup. This way, I can set different compaction methods, as well as chart the growth of the various rollups. This method may also be useful later on when changing over from the current set-based rollups to a single-value rollup as we had in the carbon format.
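For illustration, a per-rollup layout might look like the CQL sketch below. The table names and compaction options are hypothetical, and the column layout only approximates cyanite's single "metric" table (check the schema file shipped with cyanite before relying on it):

```sql
-- Hypothetical per-rollup tables; columns approximate cyanite's
-- "metric" table, names and options are illustrative only.
CREATE TABLE metric_10s_1w (
    period int,
    rollup int,
    path   text,
    time   bigint,
    data   list<double>,
    PRIMARY KEY ((period, rollup, path), time)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

-- A long-retention rollup table can use a different strategy.
CREATE TABLE metric_1h_2y (
    period int,
    rollup int,
    path   text,
    time   bigint,
    data   list<double>,
    PRIMARY KEY ((period, rollup, path), time)
) WITH compaction = {'class': 'LeveledCompactionStrategy'};
```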
Thoughts?
Hi @mwmanley,
This seems sensible indeed, and should be an easy change.
I wrote some code that does this and seems to work based off the master branch. You're free to have it and laugh at my sad attempts at clojure. It takes table names as part of the configuration YAML and writes to those in lieu of the monolithic "metric" table.
It could be useful later in making statements that do rollups, which I may work on since this project is kinda pressing for me. My company is heavily using graphite to receive hundreds of thousands of metrics every 10 seconds, and the carbon backends are literally melting from the load writing to SSD.
BTW, this change quartered the load on all my Cassandra servers immediately so that they aren't trying to keep up with compaction given the high volume. So, yay!
Just a quick question - wouldn't it be smart to use Date Tiered instead of Size Tiered compaction for these tables?
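For reference, switching an existing table to date-tiered compaction (available from Cassandra 2.0.11 / 2.1.1 onwards) is a one-statement ALTER; the option values here are illustrative, not recommendations:

```sql
ALTER TABLE metric WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'base_time_seconds': '3600',
  'max_sstable_age_days': '30'
};
```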
@vanjos definitely would be :-) it's mostly a compatibility decision for now
@pyr so then... on > 2.1 - go for it?
also is @mwmanley PR a worthwhile enhancement? (I notice pkg/deb/cyanite looks incorrect though) https://github.com/pyr/cyanite/compare/master...mwmanley:tapad
seems like what may be missing is some more README documentation for it...
I would not say mine is currently in a state to be used unless I gave you a specific checkout, since I was silly about branching. While it does segregate the data into separate tables for compaction, which is improving things quite a bit, I'm not a Clojure expert and it needs a lot of cleanup. I was also trying to tackle the rollup issues, because I need those working, but that part is in a rough state performance-wise.
My code works now, but 1) it is not pretty and 2) it intermittently throws an NPE during rollups. I'm not sure yet why that is. But I do have rollups implemented, though the code needs a major overhaul since the rollups are more or less inlined. I'm sure anyone who knows Clojure well (I freely admit that I don't; I was a C/Perl guy) will probably go "oh, why did you do that???", but the code does work.
@pyr, feel free to look at my pull and gasp in horror at how I turned your code into a monster. It has a ton of spurious extra commits because I was developing on a machine whose existence was somewhat threatened, so I had to keep pushing changes to avoid losing a ton of work. I had ideas, and some were probably actually decent. There are some limitations in what I did, though: the rollups base their data on the lowest rollup table, so the larger rollup periods have to be even multiples of the lowest one.
What I should have done, and may revisit, is turning the rollups into a separate store engine instead of just glomming onto the data streaming into the current store, which I'm rearranging eight ways to Sunday to suit my needs. Again, I claim no elegance here; I am new to Clojure and its ways. Now that I know a bit more, I will probably revisit this soon, but it's working well enough for my needs as they stand: an influx of 300,000+ metrics every 10 seconds from our servers that is turning my SSD-backed graphite whisper store into pudding before my very eyes, hence the urgency.
@mwmanley thanks for this, I'll dive into it soon.
I fixed the NPE. I was eval'ing the atom that keeps the rollup state incorrectly on the larger rollups, so it seems to be working now. There is still a lot of streamlining that needs to happen in the code, as well as fixing my clumsy clojure.
@mwmanley Is your branch in a state to try in production? What configuration changes would I need to make compared to the normal YAML config to get the best use out of it?
Also, am I reading the code incorrectly (I'm terrible at clojure), or does this address the rollup size issue by using average/min/max rather than just putting all the previous stat points into a rolled up row?
@jeffpierce @mwmanley I'll try to adapt the change for #80. I think I will first release a new cyanite without this feature and make it a priority for the follow-up release. In the meantime, the aggregation feature that the new cyanite brings should get you to a good place in terms of compaction pressure, I would think.
@mwmanley Would you say that this is still needed now that #80 has been merged? When leveraging DateTieredCompactionStrategy, and with the much less frequent hits to Cassandra, should we still pursue configurable column family names?
I would think so. I think it's still the case where it's suboptimal to mix rows with different TTLs.
Also, I have my long-term rollups compressed with deflate rather than lz4, since those data are infrequently used. Different tables allow for different compression as well as compaction. :-)
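With separate tables, that might look like the following (table names hypothetical; this is Cassandra 2.x syntax, where the compression option is `sstable_compression`):

```sql
-- Infrequently read long-term rollups: better ratio, slower codec.
ALTER TABLE metric_1h_2y
  WITH compression = {'sstable_compression': 'DeflateCompressor'};

-- Hot short-term rollups: fast codec.
ALTER TABLE metric_10s_1w
  WITH compression = {'sstable_compression': 'LZ4Compressor'};
```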
This enhancement seems even more valuable now that #80 has been implemented. While performance-tuning our system, this is starting to look like the most valuable change besides adding hardware.
Hi,
Can you provide a sample configuration? Also, what code have you changed?
Thanks, Sunil
I have implemented a fix for this by simply appending the TTL value to the table name (e.g. metrics are stored in "metric_ttl86400" rather than "metric"). I branched from the latest revision of cyanite 0.1.3 and changed a few lines in store.clj. You can see the revision here: https://github.com/tjamesturner/cyanite/commit/31bcbea05b300dbbe3475760914e23144ded7440
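The naming scheme described above is straightforward; here is a minimal sketch of it (a hypothetical helper for illustration, not code from the linked commit):

```python
def table_for_ttl(base: str, ttl_seconds: int) -> str:
    """Append the rollup's TTL (in seconds) to the base table name."""
    return f"{base}_ttl{ttl_seconds}"

# A one-day (86400s) rollup lands in its own table:
print(table_for_ttl("metric", 86400))  # metric_ttl86400
```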
If you wish to use the feature, simply add "use_ttl_table_suffix: true" to the "store" config inside cyanite.yaml, and create a table for the TTL of each rollup under your "carbon" config. (Note: TTL is measured in seconds.)
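Putting it together, the configuration might look roughly like this. Only `use_ttl_table_suffix` comes from the commit above; the other keys follow cyanite 0.1.3's standard YAML, and the TTL is assumed to be rollup × period:

```yaml
store:
  cluster: "localhost"
  keyspace: "metric"
  use_ttl_table_suffix: true   # write to metric_ttl<N> instead of metric
carbon:
  host: "127.0.0.1"
  port: 2003
  rollups:
    # 10s resolution kept for 10 * 8640 = 86400s -> table metric_ttl86400
    - period: 8640
      rollup: 10
```

You would then create each suffixed table (e.g. `metric_ttl86400`, with the same schema as `metric`) by hand before starting cyanite.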