go-carbon
Support Cache-Only Queries?
In the traditional Graphite implementation, every query has to hit disk. In read-heavy situations (short-time-range queries, like a query for the last hour), this generates tons of IO.
Any chance we can buffer datapoints for a short window, so that short-time-range queries, like a query for the last hour, can be served straight from cache without hitting disk? We might need to build something like a trie index to support wildcard queries.
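To make the trie-index idea concrete, here is a minimal sketch of a segment trie over metric paths, with `*` expanded one path segment at a time. This is hand-rolled for illustration only; the `Index`, `Insert`, and `Match` names are made up and are not go-carbon's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one level of the metric tree, keyed by path segment
// (metrics like "carbon.agents.host1.cpu" split on ".").
type node struct {
	children map[string]*node
	leaf     bool
}

func newNode() *node { return &node{children: map[string]*node{}} }

// Index is a toy segment trie; names are illustrative only.
type Index struct{ root *node }

func NewIndex() *Index { return &Index{root: newNode()} }

// Insert adds one metric name, creating a trie node per segment.
func (ix *Index) Insert(metric string) {
	n := ix.root
	for _, seg := range strings.Split(metric, ".") {
		child, ok := n.children[seg]
		if !ok {
			child = newNode()
			n.children[seg] = child
		}
		n = child
	}
	n.leaf = true
}

// Match expands a query like "carbon.agents.*.cpu" against the trie.
// Only the whole-segment "*" wildcard is handled in this sketch.
func (ix *Index) Match(query string) []string {
	segs := strings.Split(query, ".")
	var out []string
	var walk func(n *node, depth int, prefix []string)
	walk = func(n *node, depth int, prefix []string) {
		if depth == len(segs) {
			if n.leaf {
				out = append(out, strings.Join(prefix, "."))
			}
			return
		}
		if segs[depth] == "*" {
			// wildcard: descend into every child at this level
			for seg, child := range n.children {
				walk(child, depth+1, append(prefix, seg))
			}
			return
		}
		if child, ok := n.children[segs[depth]]; ok {
			walk(child, depth+1, append(prefix, segs[depth]))
		}
	}
	walk(ix.root, 0, nil)
	return out
}

func main() {
	ix := NewIndex()
	ix.Insert("carbon.agents.host1.cache.size")
	ix.Insert("carbon.agents.host2.cache.size")
	ix.Insert("carbon.agents.host1.cpu")
	fmt.Println(len(ix.Match("carbon.agents.*.cache.size"))) // → 2
}
```

The per-segment layout keeps wildcard expansion from scanning every metric name: a `*` only fans out across the children at its own level.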
Thoughts?
I like the trigram-index idea. How long does one round of walking all the whisper files take?
Curious how it works with millions of metrics.
Also, a new metric will not be queryable until it has been flushed to disk/whisper.
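For anyone unfamiliar with the trigram-index idea mentioned above, the core of it is: index every 3-byte substring of each metric name, then answer a glob by intersecting the posting lists of the trigrams found in the query's literal parts and verifying the few survivors. This is a toy sketch with made-up names, not go-carbon's actual implementation; note also that `filepath.Match` lets `*` cross dots, unlike Graphite's per-segment wildcard.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// trigramIndex maps each 3-byte substring of a metric name to the
// set of metric ids containing it. Names are illustrative only.
type trigramIndex struct {
	metrics  []string
	postings map[string][]int // trigram -> metric ids
}

func build(metrics []string) *trigramIndex {
	ix := &trigramIndex{metrics: metrics, postings: map[string][]int{}}
	for id, m := range metrics {
		seen := map[string]bool{} // dedupe trigrams within one name
		for i := 0; i+3 <= len(m); i++ {
			t := m[i : i+3]
			if !seen[t] {
				seen[t] = true
				ix.postings[t] = append(ix.postings[t], id)
			}
		}
	}
	return ix
}

// query splits the glob on wildcard characters, collects trigrams
// from the literal pieces, counts posting-list hits per metric, and
// confirms full candidates with filepath.Match.
func (ix *trigramIndex) query(glob string) []string {
	lits := strings.FieldsFunc(glob, func(r rune) bool {
		return r == '*' || r == '?' || r == '[' || r == ']'
	})
	cand := map[int]int{} // id -> number of trigram hits
	need := 0
	for _, lit := range lits {
		for i := 0; i+3 <= len(lit); i++ {
			need++
			for _, id := range ix.postings[lit[i:i+3]] {
				cand[id]++
			}
		}
	}
	if need == 0 {
		// no usable trigrams: fall back to scanning everything
		for id := range ix.metrics {
			cand[id] = 0
		}
	}
	var out []string
	for id, n := range cand {
		if n == need {
			if ok, _ := filepath.Match(glob, ix.metrics[id]); ok {
				out = append(out, ix.metrics[id])
			}
		}
	}
	return out
}

func main() {
	ix := build([]string{
		"carbon.agents.host1.cpu",
		"carbon.agents.host1.mem",
		"stats.timers.api.p99",
	})
	fmt.Println(ix.query("carbon.agents.*.cpu")) // → [carbon.agents.host1.cpu]
}
```

The trigram filter only has to be conservative (no false negatives); the final `filepath.Match` pass removes false positives, so build time dominates and is a one-pass walk over the names.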
For installations with millions of metrics, I recommend a clickhouse-based setup. It minimizes IO usage and doesn't need an in-memory cache: all metrics and points are available 1-2 s after being received.
Any benchmarks on read performance with clickhouse as the backend storage? Curious whether it scales in the read-heavy case.
I no longer have whisper-based installations under heavy load, so I have nothing to compare against. You can compare yourself. The table configuration I consider optimal can be viewed here: https://github.com/lomik/graphite-clickhouse-tldr
Thanks!
Hmm... I tried that locally, and it takes several seconds even for a single-metric query...
A little weird: it takes 10 s to finish a query in most cases, but in rare cases it responds quickly.
It looks like the very first request takes a long time and subsequent requests are fine. But if I wait for a while and send the query again, it takes a long time again, like a warm-up. Does it maintain a connection to clickhouse?
OK, I got it. Is there any reason we keep `<keep_alive_timeout>3</keep_alive_timeout>` at 3?
first request doesn't use disk cache in contrast to subsequent. do you use ssd or hdd?
It still takes around 10 s if we make the second request after 3 seconds, so it looks like connecting to clickhouse is what takes time. After increasing the keep-alive timeout, subsequent queries look pretty fine. Not quite sure how it works internally; will take a look. Thanks!
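For context, the `keep_alive_timeout` being discussed is a ClickHouse server setting in `config.xml` that controls how long an idle HTTP connection stays open before the server closes it. A sketch of the change being discussed (30 is an illustrative value, not a recommendation):

```xml
<!-- fragment of the ClickHouse server config.xml; 30 s is an
     illustrative value chosen to outlive gaps between queries -->
<keep_alive_timeout>30</keep_alive_timeout>
```

With the default of 3 s, any pause longer than that forces the client to reconnect, which would explain the warm-up pattern described above.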
I'm using SSD; even if it hits disk, it shouldn't take 10 s though.
of course it shouldn't.
You can enable detailed logs in https://github.com/lomik/graphite-clickhouse-tldr/blob/master/graphite-clickhouse.conf#L6:
```toml
[clickhouse]
url = "http://clickhouse:8123/?max_query_size=2097152&readonly=2&log_queries=1"

[logging]
level = "debug"
```
After this, graphite-clickhouse will log all access requests and clickhouse queries, and you can select detailed info about queries in clickhouse. Run client.sh and query:
```sql
select * from system.query_log where type = 2 \G
```
Thanks for the info. My benchmark shows it can serve around ~40 rq/s, somewhat lower than I expected, and with relatively higher latency than regular graphite. I see great write performance with clickhouse as the backend, but it looks like clickhouse is not designed for processing a large volume of concurrent queries. I'll probably go with go-carbon + in-house optimizations.
My current customized Graphite cluster with 2 shards and 2 replicas (pure Python + Cython + some optimizations for short queries) serves around 25k queries/min with p50 < 40 ms and p95 < 230 ms.
Thanks a lot for your prompt replies, much appreciated.
Hello @yunstanford, could you please share your optimizations? Please PM me at [email protected]. Thanks!