mesos-dns
Avoid query latency spikes on state reloads
The current locking strategy on state reloads will cause latency spikes. We can either reduce the locking scope or, better, not use locks at all: http://golang.org/pkg/sync/atomic/#example_Value_config
EDIT: As discussed in #207, the current locking is actually very granular. Experimenting with the above is still worth it since it's a nice use case, but less important than previously thought.
Have our Go version compatibility requirements changed? At one point we had to support go1.3.
I didn't know about such restrictions. Why are they in place?
/cc @kozyraki ?
Ping @kozyraki.
Isn't it simpler to just CAS the record maps? No locking, just pointer logic and a bit of memory overhead (which shouldn't be a big deal in all but the biggest deployments).
@sepiroth887: No need for CAS, atomic load is enough, which is what's proposed here.
Curious to see what the latency impact of an atomic load is with, say, 1000-10000 tasks. We had a pretty bad experience with overloaded BIND servers in the past, and with applications expecting <2ms responses from a DNS server.
It may not be great design on the application's part, but I do value query consistency ;)
There should be little latency difference between an atomic store and CAS. For the amd64 architecture, Go compiles atomic.CompareAndSwap to LOCK CMPXCHGQ and atomic.Store to XCHGQ (which asserts the bus lock implicitly). The only difference is that in a store, we don't care about the value that was there before.
However, the approach suggested here was to use atomic.Value, which appears to do its own CAS loop internally: https://golang.org/src/sync/atomic/value.go?s=1242:1278#L34