mimir
Create a custom memcached client to reuse memory
The current memcached client always throws away the buffer, which creates a lot of allocations.
It would be great to have a reusable buffer, although this might be difficult since there are a lot of cache interfaces used along the way.
Just to clarify: by "current memcached client" do you mean the github.com/bradfitz/gomemcache module or some internal interface?
Yes, I refer to that one, but it's hidden deep inside Mimir.
Just took a look, and it seems like the memcached client connection already has a RW buffer which is reused on every operation. Maybe I'm overlooking something, or the buffer drop happens elsewhere?
It's more about the item data sent back to users of the library. Take a profile of the store-gateway; it should be one of the biggest allocators. The author left a TODO too, if I remember correctly.
gotcha! 👍
This one https://github.com/bradfitz/gomemcache/blob/a41fca850d0b6f392931a78cbae438803ea0b886/memcache/memcache.go#L509
Would be awesome to have a way to reuse them when the item is not needed anymore.
> Would be awesome to have a way to reuse them when the item is not needed anymore.
Definitely agree. I think this is a point of improvement that could have a quite positive impact.
Maybe we can simply add a `Close() error` method to the `Item` type, to be explicitly called by the invoker. When called, we put the buffer back into a shared pool. This has the downside of having to manually invoke this method after every `Get` or `GetMulti` operation, but in the worst case the GC will eventually reclaim the buffer instead of it being returned to the pool. Not too bad.
WDYT?
That's the spirit. I think the hardest part is the usage.
If you go for it, the Prometheus code base has a pool that is bucketed and should work nicely. Maybe the bucket sizes need to be configurable.
When you get a chance, take a look at these changes:
https://github.com/ortuman/gomemcache/commit/8bb94a59999fc52836308966af8ec058d3d706f0
Also, I've done some benchmarking and seen a slightly better result when reusing buffers. I guess the true value of this solution will be more noticeable in the long term, by minimizing the number of GC pauses.
```
➜ memcache git:(master) ✗ go test -bench=BenchmarkItemValuePool -benchtime=20s
goos: darwin
goarch: arm64
pkg: github.com/bradfitz/gomemcache/memcache
BenchmarkItemValuePool/benchmark-no-buffer-reuse-8    492558    45478 ns/op
BenchmarkItemValuePool/benchmark-buffer-reuse-8       549673    44162 ns/op
PASS
ok    github.com/bradfitz/gomemcache/memcache    54.019s
```
The benchmark is wrong: they both allocate the slice.
I think you should:
- use a bucketed pool such as https://github.com/prometheus/prometheus/blob/main/pkg/pool/pool.go
- allocate inside the `Get` and not outside (the one allocating should be the one who knows the size)
- make the pool global to all clients, IMO
> The benchmark is wrong: they both allocate the slice.

The first approach does not implicitly invoke the `Close` method, whereas the second one does. This means no buffer object is reused in the first benchmark case.
Anyway, I like the idea of using a bucketed buffer pool instead. This would help reduce fragmentation, and thus reduce GC interventions.
Take a look at ortuman/gomemcache@71beaed139c365c02400465e32bf80614da6247a and tell me how you feel about it.
Also, here's a table of preliminary benchmark results:
```
goos: darwin
goarch: arm64
pkg: github.com/bradfitz/gomemcache/memcache
BenchmarkItemValuePool
BenchmarkItemValuePool/benchmark-no-buffer-reuse
BenchmarkItemValuePool/benchmark-no-buffer-reuse-8    22761    45378 ns/op
BenchmarkItemValuePool/benchmark-buffer-reuse
BenchmarkItemValuePool/benchmark-buffer-reuse-8       27661    39797 ns/op
PASS
```
It's better, but I don't think you need a `bytes.Buffer`; you can use the slice and re-slice it to the required length.
Oh! You're 100% right. Just changed it @ https://github.com/ortuman/gomemcache/commit/ab4be5bd29b01f1f7687db3b4333890a514ac6a2
Thanks @ortuman for working on this! Working on your personal repo is fine for experimenting, but if we'll end up vendoring this in Mimir then please fork gomemcache under the grafana organization and open a PR with your changes so we can do a proper review. Thanks!
Sure thing! As you mentioned I just wanted to experiment on my own repo before doing a formal fork. If this goes on I'll follow your suggestion. Thanks!
Note that Mimir currently uses the https://github.com/themihai/gomemcache fork, in order to add a circuit breaker (only in the query-frontend right now).
👋 Hey, I was passing by. While investigating a different issue on dev, I noticed in a profile that a lot of time is spent in the `scanGetResponseLine` function, and it looks unoptimised in several different ways.
The most obvious one is that we could use

```go
n, err := fmt.Fscanf(bytes.NewReader(line), pattern, dest...)
```

instead of

```go
n, err := fmt.Sscanf(string(line), pattern, dest...)
```

to avoid copying every single line just to scan it.
Apart from that, I'd consider implementing a custom parser for that line format instead of relying on `fmt.Sscanf`, which seems to underperform here. (I'm not familiar with the memcached response format, but do we really need to decode Unicode here? It seems that a lot of time is spent just checking whether the next rune is a space.)
Just for reference, it would be great to have the GATS command implemented too: https://github.com/grafana/mimir/pull/718
Some more data: I was checking continuous profiling on dev and realised that the store-gateway spends 12% of its time just parsing the memcached response strings:
> To avoid copying every single line just to scan it.
I wanted to check the size of this problem, so I grabbed another profile.
```
Total: 6.89TB
ROUTINE ======================== github.com/bradfitz/gomemcache/memcache.scanGetResponseLine in bradfitz/gomemcache/memcache/memcache.go
   52.24GB   104.95GB (flat, cum)  1.49% of Total
         .          .  500: }
         .          .  501:}
         .          .  502:
         .          .  503:// scanGetResponseLine populates it and returns the declared size of the item.
         .          .  504:// It does not read the bytes of the item.
    6.08GB     6.08GB  505:func scanGetResponseLine(line []byte, it *Item) (size int, err error) {
         .          .  506:	pattern := "VALUE %s %d %d %d\r\n"
         .          .  507:	dest := []interface{}{&it.Key, &it.Flags, &size, &it.casid}
         .          .  508:	if bytes.Count(line, space) == 3 {
         .          .  509:		pattern = "VALUE %s %d %d\r\n"
         .          .  510:		dest = dest[:3]
         .          .  511:	}
   46.16GB    98.87GB  512:	n, err := fmt.Sscanf(string(line), pattern, dest...)
         .          .  513:	if err != nil || n != len(dest) {
  512.01kB   512.01kB  514:		return -1, fmt.Errorf("memcache: unexpected line in get response: %q", line)
         .          .  515:	}
         .          .  516:	return size, nil
         .          .  517:}
```
So copying the line is ~0.7% of all allocations; probably not the big win.
Agreed that a custom parser can perform better.
I coded `scanGetResponseLine()` using `IndexByte()` instead of `Sscanf()`.
https://github.com/bradfitz/gomemcache/compare/master...bboreham:faster-scanline
```
name                   old time/op    new time/op    delta
ScanGetResponseLine-4    4.06µs ± 1%    0.13µs ± 4%  -96.89%  (p=0.008 n=5+5)

name                   old alloc/op   new alloc/op   delta
ScanGetResponseLine-4      128B ± 0%      24B ± 0%   -81.25%  (p=0.008 n=5+5)

name                   old allocs/op  new allocs/op  delta
ScanGetResponseLine-4      7.00 ± 0%     1.00 ± 0%   -85.71%  (p=0.008 n=5+5)
```
(Note I still copy the `line` to a `string`, because we need a string to put in `Item.Key`. I'm copying more than is strictly needed, but it simplifies the code.)
> I coded `scanGetResponseLine()` using `IndexByte()` instead of `Sscanf()`.
Awesome!
Since we're using a fork, can you send a PR to the fork? However... the fork hasn't been updated in 4 years, so maybe it's just easier for you to bring in the required changes directly.
<fx: time passes...>
I have created https://github.com/grafana/gomemcache/ with the above change.