
[Q] What to fine-tune to decrease disk workload?

Open ritmas opened this issue 4 years ago • 19 comments

I'm using the latest go-graphite v0.15.5 and have recently both enabled trie-index and switched the CPU to the GCP N2 family.

At first glance, it looks like the server workload (CPU, memory) has almost halved and traffic has not changed, but the bottleneck has become the SSD disk in terms of write IOPS.

What should I fine-tune to decrease the disk workload? Could it be related to a stats retention misconfiguration?

Some details:

Machine: n2-highcpu-16 (16 vCPUs, 16 GB memory), 1TB SSD disk
At peak time go-carbon handles ~16 qps
On average only 0.4 metrics per second are created.

Here is the diff between my config and go-carbon.conf.example:

[common]
max-cpu = 16

[cache]
max-size = 10000000
write-strategy = "noop"

[carbonserver]
enabled = true
query-cache-enabled = false
find-cache-enabled = false
trie-index = true
file-list-cache = ""
concurrent-index = true
realtime-index = 100

[[logging]]
level = "warn"

[[logging]]
level = "warn"

This is the full storage-schemas.conf content:

[carbon]
pattern = ^carbon\.
retentions = 5m:90d
compressed = false

[stats]
pattern = ^stats.*
retentions = 10s:1h,60s:1d,10m:30d,1h:90d,24h:1y
compressed = false

This is the full storage-aggregation.conf content:

[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average

Some system-wide graphs: cpu load ram udp-errors network io

A few graphite-related graphs: carbon-metrics-rec-com carbon-metrics-upd-cre

ritmas avatar Jan 21 '21 15:01 ritmas

Hi @ritmas ,

Theoretically, trie-index should not affect write performance, and the change looks quite dramatic. Maybe something else was also changed? Also, please note that read queries are not relevant to the amount of writes (the reverse is not true, but that's a different story). Only the number of new and existing metrics is relevant to write load. Also note that the default setup has no limit on disk load:

[whisper]
# Limits the number of whisper update_many() calls per second. 0 - no limit
max-updates-per-second = 0
# Softly limits the number of whisper files that get created each second. 0 - no limit
max-creates-per-second = 0

You can try to limit disk usage and increase the cache, e.g.:

[cache]
max-size = 100000000

[whisper]
max-updates-per-second = 200000
max-creates-per-second = 100

Check e.g. https://cloud.google.com/compute/docs/disks/performance for the IOPS limits, but also please note that the "update operations" in the graphs above are not the same thing as disk IOPS.

deniszh avatar Jan 21 '21 16:01 deniszh

Yeah, as @deniszh said, there weren't any intended write-path changes for standard whisper files. And it looks like the increased io usage is due to faster updates and creates of metrics.

But it might arguably be a good thing, as your data now hits disk faster. Are you checking the cache.queueWriteoutTime metric? Hopefully it should be lower as a result.
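In case it helps, go-carbon publishes those internal metrics under the prefix configured in [common]; these are the defaults from go-carbon.conf.example:

[common]
# go-carbon reports its own stats, including cache.queueWriteoutTime, under this prefix
graph-prefix = "carbon.agents.{host}"
# how often the self-metrics are flushed
metric-interval = "1m0s"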

(also nice to see memory and cpu usage lowered with the trie stuff, that keeps us motivated, thanks)

bom-d-van avatar Jan 21 '21 17:01 bom-d-van

Hmm, am I misunderstanding something? The disk io requests are lower or largely unchanged in your screenshot?

bom-d-van avatar Jan 21 '21 17:01 bom-d-van

@deniszh as far as I remember I tried max-updates-per-second some time ago without any noticeable changes, but I will give it another shot with your suggested config and leave it running for at least 24 hours.

I'm aware that the GCP disk IOPS limit depends on the CPU count; that's why I'm interested in decreasing the disk workload with go-carbon fine-tuning (if any is available) in the first place. The last resort will be increasing VM resources.

@bom-d-van yes, indeed, cache.queueWriteoutTime has generally decreased. It has increased/spiked again lately, but that might be related to new metrics (tests, natural traffic), I'm not sure to be honest. carbon-write-timeout

Regarding the IO count, on average it has decreased a bit: before enabling trie-index it was roughly 17k and now it is about 15k, but the major part of it has shifted towards write operations. As you pointed out with the decreased cache writeout time, it does make sense that write operations have increased a bit. Adding a bigger graph: io2

ritmas avatar Jan 21 '21 18:01 ritmas

Just speculation, but maybe the memory freed up by go-carbon was repurposed by the linux page cache, leading to fewer disk reads and faster/higher disk writes. In your graph, the cache memory did increase.

I get your io graph now. I didn't notice that the write io metric was plotted as negative values.

bom-d-van avatar Jan 21 '21 18:01 bom-d-van

@deniszh, it seems the SSD IO/reqs rate went back to its previous state (or even better) - which is good. Now, can we take it to "awesome"? :)

io-bytes io-requests carbon-metrics-upd-cre2

What I still don't fully understand is how to calculate the cache max-size:

[cache]
# Limit of in-memory stored points (not metrics)
max-size = 100000000

The config changes you've suggested did not increase memory usage, so it seems it's possible to increase max-size even more. But where is the limit? What formula should I apply?

memory

By the way, I wanted to note that I'm using v0.15.5, which is a specific tag, not the master branch. I saw a few fixes recently, but I'm not sure whether they could help in this case?

ritmas avatar Jan 24 '21 10:01 ritmas

Hi @ritmas,

I do not see many fixes, just one, which I just merged into master. I'm going to release 0.15.6, but I do not think it will affect the issue above.

What I still don't fully understand is how to calculate the cache max-size:

That's the hard part, indeed. I'm afraid it can only be determined empirically, i.e. by trial and error. The math behind it is not complicated - max-size is the number of datapoints held in the cache. If you know your resolution and number of metrics, you know the incoming datapoint rate, and then the time the cache can buffer the flow before it must be flushed to disk is roughly max-size / incoming datapoints per second. E.g. at ~1M datapoints/sec a 10M cache holds only about 10 seconds of incoming flow, which is not much (at ~100K datapoints/sec it would hold about 100 seconds). You can safely increase it 10x - if you have enough RAM. Check how much memory the go-carbon process consumes and assume that all of that memory is the metric cache (which is not true of course, but it's fine as an approximation).
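As a rough sizing sketch (the numbers are illustrative only, not measured from your setup):

[cache]
# buffered time ≈ max-size / incoming datapoints per second
# e.g. 100 000 000 points / ~1 000 000 points/sec ≈ 100 seconds of buffer
max-size = 100000000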

OTOH it's very hard to say how much of an IOPS gain you will get from that cache increase - that's why some empirical testing is needed.

The config changes you've suggested did not increase memory usage

You need to look at how much memory the go-carbon process itself consumes. I think the majority of "cached" is the hot set of whisper files in the linux file cache, as @bom-d-van suggested.

deniszh avatar Jan 24 '21 11:01 deniszh

I do not see many fixes, just one, which I just merged into master.

Yeah, I meant changes in general I guess.

You can safely increase it 10x - if you have enough RAM. Check how much memory the go-carbon process consumes <...>

Based on the RSS column (if that's the right one), go-carbon uses about 4 GB of memory, so I'll go with 5x first:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     23789  102 25.1 6962628 4087748 ?     Dsl  Jan21 4098:35 /data/go-carbon/go-carbon -config /etc/go-carbon/go-carbon.conf
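In config terms that means:

[cache]
max-size = 500000000   # 5x the previously suggested 100000000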

In addition to this, @deniszh, could you also give more info on max-updates-per-second?

# Limits the number of whisper update_many() calls per second. 0 - no limit
max-updates-per-second = 200000

How do the internals work? Are these changes queued in memory for a specific period of time? If this value is decreased by 10x, should one expect fewer write operations to the disk?

ritmas avatar Jan 24 '21 12:01 ritmas

Based on the RSS column (if that's the right one), go-carbon uses about 4 GB of memory, so I'll go with 5x first

Yes, 5 times looks better; it's better to increase it gradually.

How does the internals work? Are these changes queued in the memory for specific period of time? If this value is decreased down x10, should one expect less write operations to the disk?

Long story short - it's complicated. I just took 200K from your graph above, which suggested the limit is around 500K. It's just the number of calls to the store() function which writes the data, and it's controlled by a throttler. Also, it looks like the limit is per worker, so 200K probably had no effect; for 8 workers it should be around 500K / 8 ≈ 60000 (?). I mean, it looks like the effect above came only from increasing the cache size from 10M to 100M. If your writer gets throttled, your cache will grow, and with such a small cache compared to the incoming flow (10M points vs 60M points/min) it can overflow very fast. So that part also needs to be carefully tuned. Usually people just put their IO subsystem limit there and play with the cache size.
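To make the per-worker assumption concrete, a sketch (workers is the standard [whisper] option; the 60000 value assumes the limit really is applied per worker):

[whisper]
workers = 8
# if the limit is per worker, the effective total is roughly workers * max-updates-per-second
max-updates-per-second = 60000   # ~500K / 8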

deniszh avatar Jan 24 '21 14:01 deniszh

I made a few changes/tests lately, but I cannot see any change in disk IO or CPU/memory. I also double-checked with the GCP graphs. @deniszh what am I doing wrong?

2021-01-25 09:00

[whisper]
max-updates-per-second = 60000
[cache]
max-size = 500000000

2021-01-26 10:30

[whisper]
max-updates-per-second = 30000

2021-01-27 9:30

[whisper]
max-updates-per-second = 10000

2021-01-28 12:50

[whisper]
max-updates-per-second = 5000

io-requests2

ritmas avatar Jan 29 '21 07:01 ritmas

Were there any changes in your graph "update operations vs creates & cpu" for go-carbon?

bom-d-van avatar Jan 29 '21 11:01 bom-d-van

@bom-d-van there is nothing significant that would catch the eye

carbon-metrics-upd-cre3

ritmas avatar Jan 29 '21 11:01 ritmas

Is the blue line "Update Operations"? It does drop significantly on the 29th (from 600k to 300k) when you lowered it to 5000. You can try pushing it further, but do keep an eye on the cache usage and cache.queueWriteoutTime, and avoid unnecessary drops.

bom-d-van avatar Jan 29 '21 12:01 bom-d-van

Yes @bom-d-van it is, and it does look throttled, but not at the rate I expected. As @deniszh mentioned, the limit is supposed to be per worker, so I expected a ~40k limit: 8 workers * 5000 max-updates-per-second = 40k

So I pushed it a bit further:

2021-01-29 14:10

[whisper]
max-updates-per-second = 1000

2021-01-29 15:25

[whisper]
max-updates-per-second = 500

io-requests3 carbon-metrics-upd-cre4

So judging by the graphs the throttling works (though the formula is inaccurate), but actual disk utilization is still hitting its limits.

# iostat -xd 1 /dev/sdb
Linux 3.10.0-1160.11.1.el7.x86_64 (<redacted>) 	02/23/2021 	_x86_64_	(16 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              60.00     0.00 1386.00 23284.00 40824.00 104164.00    11.75   144.81    5.87    5.67    5.88   0.04  99.90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              53.00     0.00 1338.00 23342.00 38964.00 106224.00    11.77   146.78    5.95    5.80    5.95   0.04 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb              50.00     0.00 1321.00 23358.00 39328.00 106396.00    11.81   143.08    5.79    5.59    5.81   0.04  99.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb             143.00  1187.00 3487.00 18095.00 99444.00 217952.00    29.41    95.46    4.45    1.91    4.94   0.05  99.20

The GCP graphs show a lower mean/average IOPS count, but the write peaks/spikes remain constant.

gcp-disk-iops

What else can I try?

ritmas avatar Feb 23 '21 07:02 ritmas

I think you can also try lowering max-creates-per-second; it looks like it sometimes peaks at 500. Depending on the retention policy, if each whisper file is 1MB, that could mean 500MB written just for creates.

Just in case: with a lower threshold for flushing cached data to disk, the in-memory cache grows bigger, so you want to pay attention to cache.queueWriteoutTime, cache.overflow and cache.size from go-carbon.
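For example (an illustrative value, not a tested recommendation):

[whisper]
# soft limit on new whisper file creations per second, well below the observed ~500 peaks
max-creates-per-second = 60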


Other alternatives could be:

  • scaling your go-carbon cluster horizontally, by adding a few more machines to it
  • scaling your go-carbon instance vertically, by moving to a machine with a more powerful disk
  • migrating to compressed whisper

Compressed whisper has much better disk performance due to compression and less io, but you would lose the ability to do out-of-order updates and history rewrites. If that isn't an issue for your cluster, I would recommend trying it out.

bom-d-van avatar Feb 23 '21 07:02 bom-d-van

@ritmas

What else can I try?

TBH I'm still not really getting what you are trying to achieve and why high IOPS are bad for you if everything works fine. But if it's really an issue, you can try migrating to a carbon-clickhouse / graphite-clickhouse setup. It will give you much lower iops, but you'll need to set up and manage ClickHouse. See the TLDR repo if interested - https://github.com/lomik/graphite-clickhouse-tldr

deniszh avatar Feb 23 '21 10:02 deniszh

@bom-d-van is compressed whisper applicable to existing metrics/data, or does it need to be enabled on a brand-new setup? Also, is this feature reversible?

@deniszh the initial assumption was that my go-carbon setup is under a huge workload and not all the data/traffic is being handled properly, as the disk seems overutilized. So I reached out to you about configuration options. Increasing the go-carbon cache max-size and whisper max-updates-per-second helped to lower write IOPS, but this throttling thing is still a mystery to me, as the actual iostat output still shows a higher IO rate than expected.

Anyway, if cache.overflow shows a zero value, it seems I'm not losing any metrics (#402) due to the high IOPS rate or the cache size limit after all.

PS - clickhouse looks interesting

ritmas avatar Feb 25 '21 08:02 ritmas

is compressed whisper applicable to existing metrics/data or does it need to be enabled on a brand-new setup?

Migration is needed. You have two approaches:

  1. Use the convert program in go-whisper to migrate existing whisper files to compressed format
  2. Create a new go-carbon cluster with whisper.compressed = true and use buckytools to sync data from the uncompressed cluster

Option 2 is probably safer: in case things go wrong, you won't lose any data.

Option 1 is good if you already have two or more clusters; also, the initial compression ratio is better when using the convert program.

You can enable compression for the whole go-carbon instance/cluster by default with:

[whisper]
compressed = true

You can also enable compression for only certain types of metrics using pattern matching: https://github.com/go-graphite/go-carbon/blob/master/deploy/storage-schemas.conf#L9
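For example, in storage-schemas.conf (a sketch based on your existing [stats] section):

[stats]
pattern = ^stats.*
retentions = 10s:1h,60s:1d,10m:30d,1h:90d,24h:1y
compressed = true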

Also, is this feature reversible?

Not really, so it's better to test it out first: duplicate your data in two clusters, enable compression on one of them, and then decide whether the feature works better for you.

bom-d-van avatar Feb 25 '21 11:02 bom-d-van

Quick update on iostat, and especially its %util column, in case it's relevant for anyone - it cannot be trusted when it comes to SSD/NVMe.

%util Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.


Actual disk limitations should be calculated differently. For the time being I personally rely on the tps, kB_read/s and kB_wrtn/s values, which are IOPS and throughput respectively.

Based on the GCP disk performance calculations, the limits for a 1000 GB zonal SSD (with N2 CPUs) are 25k IOPS and 1200 MB/s, which is higher than the actual usage:

# iostat -d sdb 1 2
Linux 3.10.0-1160.11.1.el7.x86_64 (<redacted>) 	08/06/2021 	_x86_64_	(16 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb            5784.11     81305.53     15128.40 1158379925681 215538064508

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb            5075.00     73492.00      7712.00      73492       7712

ritmas avatar Aug 06 '21 12:08 ritmas