[Performance] How to minimize/restrict memory utilization?
Problem description
We just moved our large system from graphite-web to carbonapi -> go-carbon which results in much faster performance and also allows us to scale horizontally easily (with multiple backend servers defined in carbonapi).
My team manages the monitoring system, but not the individual dashboards or how they query; that is done by others. This results in very large queries to carbonapi (both in the number of metrics requested at once and in the date range).
The above causes carbonapi to run out of memory every few minutes on a 128 GB server. This never happened with graphite-web (it was slow, but its memory footprint was pretty small). I've tried tuning the following:
- Number of concurrent connections -- no effect on memory usage really
- Setting caches to XX MBs -- only seems to result in much faster rss growth (OOM within seconds)
- Completely disabling cache -- only seems to result in much faster rss growth (OOM within seconds)
- Timeouts -- have no effect on memory usage
- Changing between v1 and v2 backends -- v2 backends consume memory VERY quick (like in 2-3 seconds since startup it is already OOM), v1 memory growth is much slower
It seems to be related to having multiple backend servers and needing to merge responses, which is done in memory. So there is really no setting that controls that, at least none in the documentation.
So my question is -- how to restrict memory usage in carbonapi to avoid OOM?
carbonapi's version
v0.15.4
Did this happen before? N/A; it did not happen before, but that was graphite-web with a single data server, not carbonapi with multiple backends.
carbonapi's config
listen: "0.0.0.0:8081"
prefix: ""
useCachingDNSResolver: false
cachingDNSRefreshTime: "1m"
expvar:
enabled: true
pprofEnabled: false
listen: ""
cpus: 10
concurency: 1000
maxBatchSize: 100
idleConnections: 10
pidFile: ""
cache:
type: "mem"
size_mb: 0
defaultTimeoutSec: 60
graphite:
host: ""
interval: "60s"
prefix: "carbon.api"
pattern: "{prefix}.{fqdn}"
upstreams:
tldCacheDisabled: true
buckets: 10
slowLogThreshold: "10s"
timeouts:
find: "60s"
render: "60s"
connect: "200ms"
concurrencyLimitPerServer: 0
keepAliveInterval: "5s"
maxIdleConnsPerHost: 100
doMultipleRequestsIfSplit: false
backends:
- "http://backend1:8080"
- "http://backend2:8080"
- "http://backend3:8080"
# carbonsearch is not used if empty
carbonsearch:
logger:
- logger: ""
file: "stderr"
level: "error"
encoding: "console"
encodingTime: "iso8601"
encodingDuration: "seconds"
backend software and config
go-carbon(s):
[common]
user = "user"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "local"
metric-interval = "10s"
max-cpu = 20
[whisper]
data-dir = "/data/graphite/whisper/"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 20
max-updates-per-second = 0
max-creates-per-second = 0
hard-max-creates-per-second = false
sparse-create = false
flock = false
enabled = true
hash-filenames = true
[cache]
max-size = 6000000
write-strategy = "sorted"
input-buffer=600000
[udp]
listen = ":2003"
enabled = true
log-incomplete = false
buffer-size = 0
[tcp]
listen = ":2003"
enabled = true
buffer-size = 0
[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = false
buffer-size = 0
[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"
[grpc]
listen = "127.0.0.1:7003"
enabled = true
[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"
[carbonserver]
listen = "0.0.0.0:8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = true
query-cache-size-mb = 0
find-cache-enabled = true
trigram-index = true
scan-frequency = "5m0s"
max-globs = 10000
fail-on-max-globs = false
graphite-web-10-strict-mode = true
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]
[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0
[pprof]
listen = "localhost:7007"
enabled = false
[[logging]]
logger = ""
file = "/var/log/go-carbon/go-carbon.log"
level = "warn"
encoding = "mixed"
encoding-time = "iso8601"
encoding-duration = "seconds"
Query that causes problems
- What's the query? Can't provide the exact query, but an example would be using aliasByNode with multiple wildcards.
- How many metrics does it fetch? 1000s
- What's the resolution of those metrics? We don't have insight into that, as it is defined by individual dev teams, so it is possible that resolutions vary within a single request. However, from what I can see, 90% of all metrics in the system use 5s resolution (that is required for devs to troubleshoot issues; we cannot really lower the resolution to longer intervals). After 7 days those 5-second points are aggregated to 1m (by go-carbon), then after 30 days to 10min, at which point they are kept for a total of 2 years.
- How many datapoints per metric do you have? Not sure I understand the question, see aggregation/retention policy above.
Hi,
Some comments:
Number of concurrent connections -- no effect on memory usage really
It should depend on how many actual connections your users generate.
Setting caches to XX MBs -- only seems to result in much faster rss growth (OOM within seconds)
That is odd. Basically, if inserting a value causes an overflow in terms of size, the cache evicts some random items (https://github.com/dgryski/go-expirecache/blob/master/cache.go#L93-L95). That would cause more load on Go's GC, but apart from that it should slow down the RSS growth.
But let's keep that thought about GC in mind for now
Completely disabling cache -- only seems to result in much faster rss growth (OOM within seconds)
That actually can happen: the more actual requests you do, the more garbage you'll have for Go to collect.
Changing between v1 and v2 backends -- v2 backends consume memory VERY quick (like in 2-3 seconds since startup it is already OOM), v1 memory growth is much slower
That is extremely weird, as backends (v1) is converted internally into a backendsv2 config (there is just an extra step to pre-populate some things). Basically, for each backends section there is a backendsv2 section that behaves the same (https://github.com/go-graphite/carbonapi/blob/main/zipper/config/config.go#L128-L155)
Basically your section:
backends:
- "http://backend1:8080"
- "http://backend2:8080"
- "http://backend3:8080"
Is equivalent to:
doMultipleRequestsIfSplit: true
backendsv2:
  backends:
    -
      groupName: "backends"
      protocol: "carbonapi_v2_pb"
      lbMethod: "broadcast"
      maxBatchSize: 100
      keepAliveInterval: "10s"
      maxIdleConnsPerHost: 1000
      doMultipleRequestsIfSplit: true
      servers:
        - "http://backend1:8080"
        - "http://backend2:8080"
        - "http://backend3:8080"
Also it would override the global doMultipleRequestsIfSplit and set it to true to mimic the old behavior. In that case the difference would be that your requests will be split into batches of maxBatchSize metrics and several of them will be sent in parallel as separate requests to all your backends.
That would be more friendly towards Golang's GC.
It seems to be related to having multiple backend servers and needing to merge responses, which is done in memory. So there is really no setting that controls that, at least none in the documentation.
maxBatchSize controls that to some extent. Also, carbonapi_v3_pb on one hand saves a bit on extra information, but as it supports sending multiple requests in a single HTTP request it might be more taxing on the GC. concurrencyLimitPerServer likewise limits the amount of parallel connections; in your case I would rather change it to some smaller value (0 = unlimited), as that would allow the server to merge more small requests together.
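For example, something along these lines (the values are only an illustration to start experimenting with, not a recommendation):

maxBatchSize: 100
upstreams:
  concurrencyLimitPerServer: 50   # 0 = unlimited; a smaller value lets the server merge more small requests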
So my question is -- how to restrict memory usage in carbonapi to avoid OOM?
With go-carbon the ways to deal with that are rather limited, unfortunately (it doesn't support sending already pre-aggregated replies and will always do its best to send you all the data it can). I'm not sure if there are any ways to limit the size of the reply on the go-carbon side, to be honest, as that would be the better approach here. Otherwise you can limit the amount of concurrent queries and actually enable caches (the less you need to go to the backends, the better it will be for you).
There were some efforts by @msaf1980 to improve how caching is done in general. You might want to look at his work (it's currently in master; I haven't cut a release yet as I want to fix a few issues first).
It might also help if you are able to collect some heap profiles and share the SVG: https://go.dev/doc/diagnostics#profiling
carbonapi provides a way to enable expvar and pprof on a separate port: https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/carbonapi.example.yaml#L27-L30
You can enable it there and collect profiles with curl http://carbonapi:[expvar_listen_port]/debug/pprof/heap > heap.pprof and then use the docs I've linked above. Or you can use this article as a reference: https://medium.com/compass-true-north/memory-profiling-a-go-service-cd62b90619f9 (it seems to have all the steps listed there).
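For example, roughly like this (assuming pprofEnabled: true and the expvar listener set to, say, ":7081" - adjust to whatever port you actually configure):

curl -s http://localhost:7081/debug/pprof/heap > heap.pprof
go tool pprof -svg /path/to/carbonapi heap.pprof > heap.svg   # or interactively: go tool pprof -http=:8085 heap.pprof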
Oh, and I forgot to mention: as there is some evidence that it might actually be GC pressure, it would be great to see what Go version you are using, and maybe you can play a bit with the GOGC value (https://pkg.go.dev/runtime). There are numerous articles on how to do that and what it means:
- https://dave.cheney.net/tag/gogc
- https://archive.fosdem.org/2019/schedule/event/gogc/
And some others. So lowering it might help if garbage collection is actually the issue here.
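For example, if carbonapi is started directly (with systemd you would set this via Environment= instead), something like:

GOGC=50 ./carbonapi -config /etc/carbonapi/carbonapi.yaml   # default is GOGC=100; lower values trade CPU for lower peak heap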
it would be great to see what Go
Currently I am using the image from Docker Hub, but I've also built the tip with Go 1.17, and in both cases the behavior is similar. Though I did not measure the time it takes to OOM in either case.
I have changed the setup from 1 Grafana -> 1 carbonapi -> 2+ go-carbon to 1 Grafana -> 1 Nginx LB -> 2+ carbonapi+go-carbon, which naturally spreads requests round-robin across multiple carbonapi instances. While it won't share caches etc., that has improved the memory situation by a lot (not a single OOM just yet). It still goes up to some high number, but much more slowly, and it eventually releases memory.
When you have backends in broadcast mode, carbonapi will get replies from all your backends and try to merge them, so if you have 2 copies of data you'll need about 3x the amount of memory (for a short period of time) to store it.
About releasing memory: Go, like most GC-based languages, does not really like to do it, so even if memory is not in use it won't be released for quite some time. That is expected. That's why you have some metrics exported by carbonapi itself; for actual memory usage you should refer to them.
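If you just want a quick look at what Go itself thinks it is using (assuming the expvar listener is enabled; the port here is only an example), something like this should work:

curl -s http://localhost:7081/debug/vars | jq '.memstats | {HeapAlloc, HeapInuse, HeapIdle, Sys}'   # HeapAlloc/HeapInuse = live data, HeapIdle = held but unused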
Overall, for heavy requests I would recommend considering migrating the backend to graphite-clickhouse/carbon-clickhouse and enabling backend-side aggregation. Not only would you not need broadcast mode (if you have replication enabled on the ClickHouse side, it will ensure that data is the same across all your replicas), but it can also pre-aggregate responses based on what Grafana requested. That usually gives some reduction in the amount of data you need to fetch and process. However, that obviously has its own drawbacks (you'll need to manage a ClickHouse installation, you'll need to migrate data somehow, and ClickHouse in general is slower for single reads and low amounts of updates, but faster for bulk reads and writes).
so if you have 2 copies of data
No, we don't have copies of data; it is all separate across different servers (using carbon-relay-ng and consistent hashing), as the write throughput of a single server (RAID10 SATA3 SSDs) is not enough anymore (unless we go for oh-so-expensive NVMe storage). But it does merge, since with consistent hashing we can't predict which metrics will go where and a single dashboard may be loading metrics from different storage hosts. But that is expected.
The unexpected part is how much memory it wants to use. The first setup (which had OOMs) has 128 GB RAM, which carbonapi consumed in a matter of seconds in some instances. The current setup is 2 servers, each with 256 GB RAM, and carbonapi periodically gets close to the max.
Comparing it to graphite-web -- the same data requested uses barely 2 MB of RAM. But it is entirely possible that the merging is done on disk there? (That could be the case, given how insanely slow it is to serve data from 2 sources.)
We did not have this issue for a while, and recently started hitting it again. I did another round of config tuning, lowering concurrency, maxBatchSize, upstreams.backendsv2.maxBatchSize and upstreams.carbonsearchv2.maxBatchSize. That has fixed it again for now.
Perhaps it would be good to create performance tuning documentation on how to tune towards different use cases?
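Roughly the kind of knobs involved (the values here are purely illustrative, not what we actually run):

concurency: 500              # global limit on concurrent requests
maxBatchSize: 100
upstreams:
  backendsv2:
    maxBatchSize: 100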
We just went into production with carbonapi 0.16.0~1 and also quickly ended up having memory issues - carbonapi had maybe 100 MiB of data in cache, but process memory consumption was 15 GiB and increasing. We have two go-carbon carbonserver backends in broadcast mode, but the memory issue can be replicated with a single backend as well.
After some testing, we think this is a memory leak related to carbonapi response cache and JSON response format. Here's a small bash test script to run locally on an idle carbonapi server - it keeps requesting the same data in a loop once a second, increasing maxDataPoints by one to force a cache miss every time, and records carbonapi's RSS (resident set size) memory usage and change between requests:
#!/bin/bash
carbonapi_pid=$(pgrep -u carbon carbonapi)
if [ -z "$carbonapi_pid" ]; then
printf "carbonapi is not running\n"
exit 1
fi
render_url="localhost:8080/render"
target="testing.carbonapi.*.runtime.mem_stats.*"
range="from=-48h&until=now"
format="json"
request="${render_url}/render?target=${target}&${range}&format=${format}"
printf "Teasing carbonapi at $render_url\n"
rss_before=$(ps -q $carbonapi_pid --no-headers -o rss)
for points in {1000..2000}; do
curl --silent --show-error "${request}&maxDataPoints=$points" > /dev/null || break
rss_after=$(ps -q $carbonapi_pid --no-headers -o rss)
printf "%s # carbonapi RSS: %9d bytes (delta %6d bytes)\n" \
"maxDataPoints=$points" $rss_after $(($rss_after - $rss_before))
rss_before=$rss_after
sleep 1
done
With carbonapi response cache enabled and backend cache disabled, i.e.:
cache:
  type: "mem"
  size_mb: 0
  defaultTimeoutSec: 60
backendCache:
  type: "null"
...and running the script on a small test VM, carbonapi runs out of memory pretty fast:
# JSON format, response cache on, backend cache off = OOM
$ ./carbonapi_oom.sh
Teasing carbonapi at localhost:8080/render
maxDataPoints=1000 # carbonapi RSS: 38140 KiB (delta 18396 KiB)
maxDataPoints=1001 # carbonapi RSS: 43920 KiB (delta 5780 KiB)
maxDataPoints=1002 # carbonapi RSS: 57980 KiB (delta 14060 KiB)
...
maxDataPoints=1095 # carbonapi RSS: 374736 KiB (delta 12164 KiB)
maxDataPoints=1096 # carbonapi RSS: 386352 KiB (delta 11616 KiB)
curl: (52) Empty reply from server # kernel OOM killer
The selected metrics query and the size of the response affect the memory consumption rate, but the point here is that we hardly ever see the RSS figure going down. Sometimes the delta stays at zero for a few requests, but overall it's an almost linear increase.
However, if we switch from using the response cache to the backend cache, or request the data in CSV format instead of JSON, carbonapi's memory consumption stays perfectly under control:
# JSON format, response cache off, backend cache on = OK
# CSV format, response cache on, backend cache off = OK
# CSV format, response cache off, backend cache on = OK
$ ./carbonapi_oom.sh
Teasing carbonapi at localhost:8080/render
maxDataPoints=1000 # carbonapi RSS: 37192 KiB (delta 18056 KiB)
maxDataPoints=1001 # carbonapi RSS: 46680 KiB (delta 9488 KiB)
maxDataPoints=1002 # carbonapi RSS: 53200 KiB (delta 6520 KiB)
...
maxDataPoints=1048 # carbonapi RSS: 92164 KiB (delta 5680 KiB)
maxDataPoints=1049 # carbonapi RSS: 78564 KiB (delta -13600 KiB)
maxDataPoints=1050 # carbonapi RSS: 78884 KiB (delta 320 KiB)
maxDataPoints=1051 # carbonapi RSS: 91740 KiB (delta 12856 KiB)
...
maxDataPoints=1161 # carbonapi RSS: 125932 KiB (delta 13300 KiB)
maxDataPoints=1162 # carbonapi RSS: 106184 KiB (delta -19748 KiB)
maxDataPoints=1163 # carbonapi RSS: 94868 KiB (delta -11316 KiB)
maxDataPoints=1164 # carbonapi RSS: 94700 KiB (delta -168 KiB)
...
maxDataPoints=2000 # carbonapi RSS: 96560 KiB (delta -24092 KiB)
So a workaround for carbonapi's huge memory usage seems to be disabling response cache and relying on backend cache instead.
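In config terms the workaround looks roughly like this (the sizes are just an example, pick whatever fits your data set):

cache:
  type: "null"               # response cache disabled
backendCache:
  type: "mem"
  size_mb: 1024
  defaultTimeoutSec: 60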
We are not Go experts here, but a colleague of mine tried profiling the issue, and all the excess memory seems to be used by MarshalJSON(). Maybe there's something wrong in the way JSON data is being processed, or in the way response cache eviction works.
Can you also grab and share a memory profile?
If the cache is enabled, that's a problem: different maxDataPoints values produce different data sets (because a different cache key is used). It works as expected, but it can potentially be exploited in an untrusted environment.
For example, here are the key build functions:
https://github.com/go-graphite/carbonapi/blob/743814d9d6aba501b15c7eeed857116af0d34000/cmd/carbonapi/http/render_handler.go#L424
@easterhanu If you need to protect against it, set a cache size limit.
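Something along these lines (the size is just an example):

cache:
  type: "mem"
  size_mb: 256               # cap the response cache instead of leaving it at 0
  defaultTimeoutSec: 60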
@Civil here's a heap profile moments before OOM:
@msaf1980 As far as I can tell, setting the response cache size_mb to, for example, 50 or 100 MiB has no effect on the test result - RSS still just keeps increasing until carbonapi runs out of memory. According to carbonapi's internal cache_size metric, the cache size itself is not the problem; memory is spent somewhere else (e.g. in production we've seen tens of MiBs in cache, but several GiBs of memory in use).
We are not Go experts here, but a colleague of mine tried profiling the issue, and all the excess memory seems to be used by MarshalJSON(). Maybe there's something wrong in the way JSON data is being processed, or in the way response cache eviction works.
A direct write to http.ResponseWriter may be a solution. I did some testing earlier, but didn't make a PR - in our environment we don't have memory overload (16 GB is not too costly for huge installations).
We did some more testing and debugging, and think the root cause of the response cache's huge memory consumption is this: https://github.com/go-graphite/carbonapi/blob/cdf42a3b13ace4335f318e6a9f8373480c42230b/expr/types/types.go#L123
https://github.com/go-graphite/carbonapi/commit/bf0ffdceddd069c2fa1761fb3697002eca72f85c changed the way the byte slice is created, from var b []byte to b := make([]byte, 0, n), but we believe the math for calculating the value of n is wrong. Adding some printf statements shows the cap of the created slice being ~15x more than the len of the actual data requires. When these oversized slice references are then stored in the response cache, they end up taking a lot of memory which is not really used for anything by the application, but which Go's garbage collector cannot clean up either.
There are other memory concerns too. During the past weekend we had two production servers running carbonapi with just the backend cache enabled (as a workaround for the response cache issues). Server A's carbonapi had steady memory consumption around ~50 MiB, whereas server B's carbonapi got killed by the kernel OOM killer after reaching almost 30 GiB.
Carbonapi settings were the same for both servers, but B was serving some really heavy wildcard requests which would often fail with something like:
WARN zipper errors occurred while getting results {"type": "protoV2Group", "name": "http://xxxxx", "type": "fetch", "request": "&MultiFetchRequest{Metrics:[]FetchRequest{FetchRequest{ ... (insert tons of FetchRequests for different metrics) ... "errors": "max tries exceeded", "errorsVerbose": "max tries exceeded\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:25\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: failed to fetch data from server/group\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: error while fetching Response\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:34\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
Sometimes we could also see "could not expand globs - context canceled" errors on the go-carbon side.
Switching doMultipleRequestsIfSplit: true to doMultipleRequestsIfSplit: false seemed to fix the query issues and also memory consumption, although we do not exactly understand why.
This needs some research. Can you test a version from a custom branch?
n := len(results) * (len(results[0].Name) + len(results[0].PathExpression) + 128*len(results[0].Values) + 128)
It's an attempt to get an approximate buffer size. Without it there are too many reallocations. You can see the benchmark in PR #729.
Maybe the logic needs to change, or we should switch to a direct write.
@msaf1980 which branch would you like us to test?
@msaf1980 which branch would you like us to test?
Tomorrow I'll write the branch names. I'll adapt the branch with direct write to the current master, and maybe a version with an updated buffer preallocation policy.
Switching doMultipleRequestsIfSplit: true to doMultipleRequestsIfSplit: false seemed to fix the query issues and also memory consumption, although we do not exactly understand why.
Hm, if this is true, JSON marshaling is not the problem for the high memory usage.
From the documentation:
Only affects cases with maxBatchSize > 0. If set to `false` requests after split will be sent out one by one, otherwise in parallel
@Civil I don't use go-carbon under heavy load. Is doMultipleRequestsIfSplit: false recommended for this?
@Civil I don't use go-carbon under heavy load. Is doMultipleRequestsIfSplit: false recommended for this?
It is a niche option. To get better results you need:
- multiple go-carbon servers; actually, the more the better
- a server I/O system that prefers multiple concurrent requests
- individual requests (after the split) that are small or medium
It mostly allows you to save on network and utilize the potential concurrency of the underlying storage.
If you have a small number of servers, or slow I/O that does not handle concurrent requests well, I wouldn't recommend it, as I would expect worse performance.
Potentially there is room to implement a heuristic to alternate between both, but that would require getting some information from go-carbon and much more performance data than I can gather myself.
And I would strongly advise against splitting globs or anything fancy if your backend is a database that can scale by itself (e.g. ClickHouse).
doMultipleRequestsIfSplit: false did not help in our single go-carbon and single carbonapi setup at all. When doing a massive render request it starts eating memory up to somewhere around 40-50 GB and then crashes. The pod does not get restarted because the memory limit was not reached, but we see main will parse config as yaml {"config_file": "/etc/carbonapi/carbonapi.yaml"} in the logs after the crash, so the process did restart itself.
Any settings to try?
notFoundStatusCode: 404
cache:
  type: "mem"
  size_mb: 1024
  defaultTimeoutSec: 600
backendCache:
  type: "mem"
  size_mb: 4096
  defaultTimeoutSec: 10800
truncateTime:
  "8760h": "1h"
  "2160h": "10m"
  "1h": "1m"
  "0": "10s"
cpus: 6
concurency: 1000
combineMultipleTargetsInOne: true
idleConnections: 200
upstreams:
  graphite09compat: false
  buckets: 10
  keepAliveInterval: "15s"
  timeouts:
    find: "300s"
    render: "300s"
    connect: "500ms"
  backendsv2:
    backends:
      -
        groupName: "go-carbon"
        protocol: "carbonapi_v3_pb"
        lbMethod: "all"
        doMultipleRequestsIfSplit: true
        maxTries: 3
        maxBatchSize: 500
        concurrencyLimit: 0
        servers:
          - "http://carbonserver:8080"
expireDelaySec: 600
unicodeRangeTables:
  - "Latin"
  - "Common"