
Avoid query latency spikes on state reloads

Open tsenart opened this issue 10 years ago • 8 comments

The current locking strategy on state reloads will cause latency spikes. We can either reduce the locking scope or, better, avoid locks altogether: http://golang.org/pkg/sync/atomic/#example_Value_config

EDIT: As discussed in #207, the current locking is actually very granular. Experimenting with the above is still worth it since it's a nice use case, but less important than previously thought.

tsenart avatar Aug 01 '15 10:08 tsenart

Have our Go version compat requirements changed? At one point we had to support go1.3.

On Sat, Aug 1, 2015 at 6:36 AM, Tomás Senart [email protected] wrote:

The current locking strategy on state reloads will cause latency spikes. We can either reduce the locking scope or better, not use locks at all: http://golang.org/pkg/sync/atomic/#example_Value_config

— Reply to this email directly or view it on GitHub https://github.com/mesosphere/mesos-dns/issues/210.

jdef avatar Aug 01 '15 14:08 jdef

I didn't know about such restrictions. Why are they in place?

tsenart avatar Aug 01 '15 16:08 tsenart

/cc @kozyraki ?

jdef avatar Aug 01 '15 16:08 jdef

Ping @kozyraki.

tsenart avatar Aug 05 '15 11:08 tsenart

Isn't it simpler to just CAS the record maps? No locking, just pointer logic and a bit of memory overhead (which shouldn't be a big deal in all but the biggest deployments).

sepiroth887 avatar Sep 26 '15 05:09 sepiroth887

@sepiroth887: No need for CAS, atomic load is enough, which is what's proposed here.

tsenart avatar Sep 28 '15 12:09 tsenart

Curious to see what the latency impact of an atomic load is with, say, 1000-10000 tasks. We had a pretty bad experience with overloaded BIND servers in the past, and with applications expecting <2ms responses from a DNS server.

It may not be great design on the application's part, but I do value query consistency ;)

sepiroth887 avatar Sep 28 '15 14:09 sepiroth887

There should be little latency difference between an atomic store and a CAS. On the amd64 architecture, Go compiles atomic.CompareAndSwap to LOCK CMPXCHGQ and atomic.Store to XCHGQ. The only difference is that with a store, we don't care about the value that was there before.

However, the suggested approach here was to use sync.Value which seems to be doing its own CAS loop internally: https://golang.org/src/sync/atomic/value.go?s=1242:1278#L34

tsenart avatar Sep 28 '15 15:09 tsenart