mesos-dns
Feature to limit number of Answers returned
Hello,
we use MesosDNS for load balancing some of our traffic (e.g. between real load balancers) inside the Mesos/Marathon cluster. We have run into an issue where one of the load balancers often receives much more traffic than the others.
This is caused by the behavior of the getaddrinfo library call (required by RFC 3484), which sorts the records returned by MesosDNS and always puts the same record first. From the documentation:
There are several reasons why the linked list may have more than one addrinfo structure, including: the network host is multihomed, accessible over multiple protocols (e.g., both AF_INET and AF_INET6); or the same service is available from multiple socket types (one SOCK_STREAM address and another SOCK_DGRAM address, for example). Normally, the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484; the order can be tweaked for a particular system by editing /etc/gai.conf (available since glibc 2.5).
It would be nice to have a feature that restricts the number of answers returned by MesosDNS to one random record. Without it, services started at the same time (which usually happens with Marathon when you restart all of your application tasks at once) always use the same IP for the other services they communicate with.
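A minimal sketch of what I mean, in Go since that is what MesosDNS is written in. This is not MesosDNS code: limitAnswers and its max parameter are hypothetical names, and the records are plain strings rather than real DNS resource records.

```go
package main

import (
	"fmt"
	"math/rand"
)

// limitAnswers returns at most max records chosen uniformly at random from
// answers. A max of zero or less means "no limit". The input slice is not
// modified.
func limitAnswers(answers []string, max int) []string {
	if max <= 0 || max >= len(answers) {
		return answers
	}
	shuffled := make([]string, len(answers))
	copy(shuffled, answers)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:max]
}

func main() {
	// Hypothetical A records for one service name.
	answers := []string{"10.0.0.11", "10.0.0.12", "10.0.0.13"}
	// With max = 1 the client receives a single record, so there is
	// nothing left for getaddrinfo to sort.
	fmt.Println(limitAnswers(answers, 1))
}
```

With a limit of 1, each query gets one randomly chosen record, so clients that start together should spread across the backends instead of all picking the same one.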
Thanks for reporting this. It looks like other systems have run into this nifty behavior as well. It's unclear to me if it's also possible to work around the problem by hacking /etc/gai.conf.
https://github.com/weaveworks/weave/issues/1245 https://github.com/hashicorp/consul/issues/1481
Comments suggest that the latest RFC fixes the problems with the sorting as per the spec but that getaddrinfo implementations have been slow to adopt the latest RFC.
https://tools.ietf.org/html/rfc6724 (the latest spec on record sorting)
Consul, in particular, implemented the workaround as suggested by the OP here. It would be useful to understand which Linux distributions are affected by this.
On Wed, Aug 2, 2017 at 5:28 AM, Mateusz Moneta wrote: [quoted copy of the original report snipped]
xref #485
@jdef any news on your side?
Not yet, stay tuned...
additional commentary re: libc implementations here http://www.zytrax.com/books/dns/ch9/rr.html