Ring clients affect CAS performance on Consul
I spent the last working day benchmarking the performance of the ring on Consul. To do so, I built a tiny tool that runs X lifecyclers and Y clients (code) and ran it from multiple machines/pods against a dedicated Consul server (I tested both Consul 1.5.3 and 1.10.3, with comparable results).
I ran several tests, many of which were inconclusive, so here I'm reporting only the one finding that was consistent across many different runs: ring clients (watching the ring key) affect CAS performance on Consul.
Scenario
- 100 lifecyclers heartbeating the ring
- 512 tokens per lifecycler
- Lifecyclers heartbeat period = 10s
- A variable number of clients watching the ring
- 60s run for each test (I ran each test many times, with comparable results over time)
- Lifecyclers and clients always running on different pods/machines
- Dedicated Consul server (nothing else using it)
See the code to check how timing was tracked.
0 clients
level=info msg=operations CAS()=505 datasize(bytes)=259370
level=info msg=consul.Get() avg=1.846022ms min=1.323458ms max=8.913187ms
level=info msg=consul.CAS() avg=9.077085ms min=6.96226ms max=25.035527ms
level=info msg=client.CAS() avg=24.952575ms min=20.455846ms max=45.214343ms retries=5 conflicts=5
1 client
level=info msg=operations CAS()=505 datasize(bytes)=259379
level=info msg=consul.Get() avg=1.844928ms min=1.323242ms max=4.322768ms
level=info msg=consul.CAS() avg=9.348299ms min=7.261846ms max=18.869644ms
level=info msg=client.CAS() avg=25.142215ms min=20.195086ms max=38.79353ms retries=5 conflicts=5
400 clients
level=info msg=operations CAS()=674 datasize(bytes)=259394
level=info msg=consul.Get() avg=11.81947ms min=1.180597ms max=642.563055ms
level=info msg=consul.CAS() avg=29.351135ms min=6.880116ms max=1.690589329s
level=info msg=client.CAS() avg=64.645022ms min=20.237888ms max=1.710288901s retries=174 conflicts=180
Summary:
- With 400 clients, the client.CAS() avg goes from 25ms to 65ms, and the spread between min and max goes from about 25ms to 1.7s.
- The client.CAS() average timing bounds the highest CAS QPS we can get in the best-case scenario (e.g. if avg is 65ms we can't successfully CAS more than about 15 times/sec).
- The longer client.CAS() takes, the more conflicts we have, but that's obvious.
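The QPS bound in the summary is plain arithmetic: if every successful CAS has to observe the result of the previous one, successful operations cannot overlap, so at most 1/avg of them fit in a second. A minimal sketch of that calculation (the function name is made up for illustration and is not part of the benchmark tool):

```go
package main

import (
	"fmt"
	"time"
)

// maxSequentialCASPerSecond returns the upper bound on successful CAS
// operations per second, assuming successful operations cannot overlap
// because each one must observe the result of the previous one.
func maxSequentialCASPerSecond(avg time.Duration) float64 {
	if avg <= 0 {
		return 0
	}
	return float64(time.Second) / float64(avg)
}

func main() {
	// Rounded client.CAS() averages measured with 0 and with 400 clients.
	for _, avg := range []time.Duration{25 * time.Millisecond, 65 * time.Millisecond} {
		fmt.Printf("avg=%v -> at most %.1f successful CAS/s\n", avg, maxSequentialCASPerSecond(avg))
	}
	// Prints:
	// avg=25ms -> at most 40.0 successful CAS/s
	// avg=65ms -> at most 15.4 successful CAS/s
}
```

This is a best-case ceiling: conflicts and retries only lower the achievable rate.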
Experiment: introduce a slow down in clients
Hypothesis: all clients are watching the same key, so they all get an update back at the same time. Then they all send another "get" request to Consul at nearly the same time. In short, we're DoS-ing Consul in bursts that happen every second (because of the rate limit we configure in our Consul client wrapper).
Experiment: I tried introducing a random delay after the "get" request returns in WatchKey(), to spread out the subsequent requests to Consul (see commented code).
Result: a delay of a few seconds seems to help a bit, but lower values don't.
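For readers without the code at hand, the idea can be sketched roughly as follows. This is not the actual WatchKey() change: `durationWithJitter` is a local stand-in written for this example, and it assumes the second argument is a variance fraction (so a value of 1 yields a delay anywhere between 0 and twice the base). `watchLoop` and its `get` callback are also made up for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// durationWithJitter is a hypothetical stand-in for the helper used in the
// experiment. It returns a duration picked uniformly at random in
// [d*(1-variance), d*(1+variance)], so variance=1 gives [0, 2*d].
func durationWithJitter(d time.Duration, variance float64) time.Duration {
	spread := int64(float64(d) * variance)
	if d <= 0 || spread <= 0 {
		return d
	}
	// rand.Int63n(2*spread+1) is in [0, 2*spread]; shift it to [-spread, +spread].
	return d + time.Duration(rand.Int63n(2*spread+1)-spread)
}

// watchLoop calls get a fixed number of times and sleeps for a jittered
// delay after each response, so that watchers woken up by the same update
// do not all issue their next request at the same instant.
func watchLoop(rounds int, base time.Duration, get func()) {
	for i := 0; i < rounds; i++ {
		get()
		time.Sleep(durationWithJitter(base, 1))
	}
}

func main() {
	// A short base delay keeps the example fast; the experiment used 500ms and 2s.
	requests := 0
	watchLoop(3, 10*time.Millisecond, func() { requests++ })
	fmt.Println("requests issued:", requests)
}
```

The trade-off is that each watcher sees ring updates later by up to twice the base delay.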
400 clients / with time.Sleep(dstime.DurationWithJitter(500*time.Millisecond, 1))
level=info msg=operations CAS()=884 datasize(bytes)=251748
level=info msg=consul.Get() avg=22.734101ms min=1.230174ms max=859.036878ms
level=info msg=consul.CAS() avg=69.33508ms min=7.465628ms max=1.54668955s
level=info msg=client.CAS() avg=123.077636ms min=20.635579ms max=2.303539771s retries=398 conflicts=419
400 clients / with time.Sleep(dstime.DurationWithJitter(2*time.Second, 1))
level=info msg=operations CAS()=521 datasize(bytes)=259338
level=info msg=consul.Get() avg=2.104879ms min=1.19876ms max=23.720245ms
level=info msg=consul.CAS() avg=11.232828ms min=7.334579ms max=322.944766ms
level=info msg=client.CAS() avg=29.64897ms min=20.379623ms max=402.913091ms retries=21 conflicts=21
Experiment: how ring size (bytes) affects performance
I ran the same benchmark with 1 token per instance instead of 512. This reduces the ring size from about 250KB to less than 1KB. Performance is better, but watching clients still affect CAS performance.
Idea: introduce a proxy
Another idea I had (but didn't try, because this was a timeboxed test) is to introduce a caching proxy in front of the Consul API. The proxy would speak the Consul API and pass all requests through to Consul, except for "get" requests.
Get requests would be de-multiplexed by the proxy: the proxy watches each key on Consul only once, keeps the latest value in memory, and serves clients from the (stale) in-memory copy, honoring the index version specified in the request.
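A rough sketch of the de-multiplexing part, to make the idea concrete. Nothing here was built or tested as part of the benchmark: the type and method names are invented, the Consul HTTP API is left out entirely, and blocking-query timeouts are not handled. It only shows a single upstream update fanning out to many waiting clients, each blocked until the cached index is newer than the one it asked for.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is the latest value the proxy has seen for one key.
type entry struct {
	value []byte
	index uint64
}

// cachingProxy is a hypothetical in-memory cache fed by one upstream watch
// per key and read by any number of clients.
type cachingProxy struct {
	mu      sync.Mutex
	cond    *sync.Cond
	entries map[string]entry
}

func newCachingProxy() *cachingProxy {
	p := &cachingProxy{entries: map[string]entry{}}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// update is what the single upstream watcher would call when the key
// changes. Updates that are not newer than the cached index are ignored.
func (p *cachingProxy) update(key string, value []byte, index uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if cur, ok := p.entries[key]; ok && index <= cur.index {
		return
	}
	p.entries[key] = entry{value: value, index: index}
	p.cond.Broadcast() // wake up every client waiting for a newer index
}

// get mimics a blocking query: it returns as soon as the cached index is
// greater than waitIndex, without sending anything upstream.
func (p *cachingProxy) get(key string, waitIndex uint64) ([]byte, uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.entries[key].index <= waitIndex {
		p.cond.Wait()
	}
	e := p.entries[key]
	return e.value, e.index
}

func main() {
	p := newCachingProxy()
	var wg sync.WaitGroup
	for i := 0; i < 400; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.get("ring", 0) // each client waits for any index > 0
		}()
	}
	p.update("ring", []byte("ring-v1"), 1) // one upstream change
	wg.Wait()
	fmt.Println("400 watchers served from one upstream update")
}
```

With this shape, Consul would see one watcher per key regardless of how many clients are connected to the proxy.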
Alternatives
In any case, heartbeating the ring on Consul can't scale much, because the max QPS of successful CAS operations is bounded by the time client.CAS() takes (which includes getting the updated key from Consul + decoding + updating the ring data structure + encoding + calling CAS on Consul). That's the main reason why we've built memberlist support.
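The read-modify-write cycle described above can be illustrated with a toy model. This is not the real client code: `fakeKV` is an in-memory stand-in for Consul, the ring is reduced to a JSON map of instance name to heartbeat timestamp, and the retry loop has no limit, all of which are simplifications made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// fakeKV is an in-memory stand-in for a KV store with check-and-set.
type fakeKV struct {
	mu    sync.Mutex
	data  []byte
	index uint64
}

func (kv *fakeKV) get() ([]byte, uint64) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	return kv.data, kv.index
}

// cas stores data only if the key has not changed since index was read.
func (kv *fakeKV) cas(data []byte, index uint64) bool {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if kv.index != index {
		return false
	}
	kv.data = data
	kv.index++
	return true
}

// heartbeat runs the get + decode + update + encode + CAS cycle until the
// CAS succeeds, and returns how many conflicts it hit along the way.
func heartbeat(kv *fakeKV, instance string, ts int64) (int, error) {
	conflicts := 0
	for {
		raw, index := kv.get()
		ring := map[string]int64{}
		if len(raw) > 0 {
			if err := json.Unmarshal(raw, &ring); err != nil {
				return conflicts, err
			}
		}
		ring[instance] = ts
		encoded, err := json.Marshal(ring)
		if err != nil {
			return conflicts, err
		}
		if kv.cas(encoded, index) {
			return conflicts, nil
		}
		conflicts++ // someone else wrote first: redo the whole cycle
	}
}

func main() {
	kv := &fakeKV{}
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c, err := heartbeat(kv, fmt.Sprintf("instance-%d", i), int64(i))
			if err != nil {
				panic(err)
			}
			mu.Lock()
			total += c
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	_, index := kv.get()
	fmt.Printf("successful CAS=%d conflicts=%d\n", index, total)
}
```

Every conflict forces the whole cycle to run again, which is why a slower cycle both lowers the ceiling and increases the number of retries.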