[Q] Graphite cluster Go-carbon cache is full

Open expShailendra opened this issue 4 years ago • 10 comments

Our Graphite cluster's go-carbon cache is full. Can anyone suggest how to flush the data from the cache? I am not able to locate the file. Sorry, I am very new to Graphite.

expShailendra avatar Jan 07 '21 13:01 expShailendra

IIRC, the flush happens when you shut down go-carbon. I hope you have sparse-create = false; otherwise the flush can make things even worse. It is also probably worth setting remove-empty-file = true, because files can be corrupted after a full disk.
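
As a rough sketch of those two settings: the option names come from this comment, but placing both under [whisper] is an assumption based on go-carbon's sample config, and remove-empty-file may not exist in older releases, so check against the installed version before relying on it.

```toml
[whisper]
# Assumption: both keys belong in the [whisper] section.
# Write whisper files out in full at creation time instead of creating
# sparse files, so the disk space they need is claimed up front.
sparse-create = false
# Remove empty whisper files, such as ones left behind after a full disk.
remove-empty-file = true
```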

deniszh avatar Jan 07 '21 13:01 deniszh

We have sparse-create = true in our go-carbon.conf and no entry for remove-empty-file = true. Our go-carbon internal cache is touching the 1 million mark and we don't know how to flush the cache. Can somebody please guide us?

rickyari avatar Jan 07 '21 14:01 rickyari

Hi @rickyari, sorry for the bad news, but a full disk with 'sparse-create = true' is potentially not good: how is go-carbon supposed to flush if there's no space to flush to? OTOH, I'm not sure what you can do except "bite the bullet": stop go-carbon, expand or clean up the disk, set "remove-empty-file = true", start go-carbon, and hope that you have not lost too much data. I see no other way, and it will not resolve by itself.

deniszh avatar Jan 07 '21 14:01 deniszh

@deniszh I'm not sure which disk to check, but I know the partition that stores the whisper files is not full in our environment:

/dev/nvme0n1 1.8T 1.2T 587G 67% /mnt/array1

In fact, no partition is 100% full on the servers:

Filesystem      Size  Used  Avail  Use%  Mounted on
devtmpfs        30G   0     30G    0%    /dev
tmpfs           30G   4.0K  30G    1%    /dev/shm
tmpfs           30G   804K  30G    1%    /run
tmpfs           30G   0     30G    0%    /sys/fs/cgroup
/dev/xvda1      20G   6.1G  14G    31%   /
/dev/nvme0n1    1.8T  1.2T  587G   67%   /mnt/array1
tmpfs           6.0G  0     6.0G   0%    /run/user/1000

Can you please advise on how to proceed?

rickyari avatar Jan 07 '21 14:01 rickyari

Hi @rickyari,

Sorry, I misread "cache full" as "disk full". Mea culpa. 🤦

Then you need to describe what your issue is. A full cache is OK; the cache should be full, that's what a cache is for. Maybe attach your go-carbon config too.

deniszh avatar Jan 07 '21 16:01 deniszh

Currently we are facing missing data points and slowness (slow Grafana dashboards) with our Graphite cluster. I am not sure if I can post our cluster's go-carbon config here, but I can share a screenshot where the missing data points are visible.

image

rickyari avatar Jan 07 '21 18:01 rickyari

Hi, it looks like your go-carbon instance is receiving data points faster than it can write them to disk, so the cache has reached its limit of 1 million points. You can try increasing that limit:

[cache]
# Limit of in-memory stored points (not metrics)
max-size = 5000000

The default is 1,000,000, so that seems to be the issue. You can also try tuning the [whisper] settings to see if you can speed up flushing the cache to disk.
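
A hedged sketch of the settings that most directly affect how fast the cache drains. The keys are ones that appear in go-carbon configs, the values are illustrative rather than recommendations, and the list of write-strategy values is quoted from memory of the sample config, so treat it as an assumption and verify it against your version.

```toml
[cache]
# Order in which cached metrics are picked for writing.
# Assumed valid values: "max", "sorted", "noop".
write-strategy = "noop"

[whisper]
# Number of worker threads persisting points to disk.
# 16 is an illustrative value; raise it only if CPU and disk have headroom.
workers = 16
# Limit on whisper updates per second; 0 means no limit.
max-updates-per-second = 0
```

Raising workers only helps while the disk itself is not saturated, so it is worth checking I/O utilisation before and after the change.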

If necessary, you might have to consider increasing the capacity of your go-carbon cluster.

bom-d-van avatar Jan 07 '21 19:01 bom-d-van

The go-carbon.conf file is pasted below. Can you please suggest how to tune the [whisper] section to flush the cache faster, apart from increasing the internal cache size?

[common]
user = "root"
graph-prefix = "go-carbon.agents.{host}"

# controls GOMAXPROCS, which itself controls the maximum number
# of actively executing threads; those which are blocked in syscalls
# are NOT part of this limit
max-cpu = 8
metric-interval = "1m0s"

[whisper]
data-dir = "/mnt/array1/graphite/whisper"
schemas-file = "/etc/go-carbon/whisper-schemas.conf"
aggregation-file = ""
workers = 8
max-updates-per-second = 0
sparse-create = true
enabled = true

[cache]
max-size = 1000000
write-strategy = "noop"

[pickle]
enabled = false

[tcp]
listen = ":2003"
enabled = true

[udp]
enabled = false

[carbonserver]
listen = ":8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = true
query-cache-size-mb = 0
find-cache-enabled = true
trigram-index = false
scan-frequency = "5m0s"
max-globs = 100
graphite-web-10-strict-mode = true
internal-stats-dir = ""

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"
query-timeout = "100ms"

[dump]
# Enable dump/restore function on USR2 signal
enabled = true
# Directory for store dump data. Should be writeable for carbon
path = "/mnt/array1"

[pprof]
listen = "localhost:7007"
enabled = false

# Default logger
[[logging]]
# logger name
# available loggers:
# * "" - default logger for all messages without configured special logger
# @TODO
logger = ""
# Log output: filename, "stderr", "stdout", "none", "" (same as "stderr")
file = "/var/log/go-carbon/go-carbon.log"
# Log level: "debug", "info", "warn", "error", "dpanic", "panic", and "fatal"
level = "error"
# Log format: "json", "console", "mixed"
encoding = "mixed"
# Log time format: "millis", "nanos", "epoch", "iso8601"
encoding-time = "iso8601"
# Log duration format: "seconds", "nanos", "string"
encoding-duration = "seconds"

rickyari avatar Jan 08 '21 04:01 rickyari

Apologies for the basic questions, but I have only recently started working with Graphite.

Our go-carbon log file is reported as a data file. Is it normal for a log file to be a data file? If so, is there a way to open and view the logs using any Graphite utility?

[ ~]$ file /var/log/go-carbon/go-carbon.log
/var/log/go-carbon/go-carbon.log: data

rickyari avatar Jan 08 '21 10:01 rickyari

Maybe try increasing the workers in [whisper]. You have to check the usage and saturation levels of CPU, memory, and disk. If there is memory and I/O bandwidth to spare, it should be OK to increase [whisper].workers and [cache].max-size.

[cache]
# Limit of in-memory stored points (not metrics)
max-size = 5000000

[whisper]
workers = 16

For logging, you can just use less or other tools to read it. Isn't it a plaintext file?

bom-d-van avatar Jan 08 '21 20:01 bom-d-van