
[dns]: test if dns lb actually works

Open dkropachev opened this issue 8 months ago • 3 comments

Run the DNS load balancer against all supported languages and SDK versions to make sure that it works as intended. Document the findings in dns/README.md.

dkropachev avatar Mar 17 '25 13:03 dkropachev

To be done after https://github.com/scylladb/alternator-load-balancing/issues/14

dkropachev avatar Mar 17 '25 13:03 dkropachev

It's easy to test that the DNS server itself is working by querying it with "dig", for example dig @localhost somename.com. As you noted, it's harder to confirm that it "actually" works "as intended". The SDK reaches this DNS server through several layers (Amazon's SDK code, Python's HTTP, URL and socket libraries, and Linux's glibc and resolver), and we need to confirm that after going through all of them it still does what we hope it achieves:

  • whether the low TTL is honored.
  • how multiple A records in a response are handled (as #14 adds).
  • how multiple threads or processes on the same machine cache, or don't cache, the DNS response.
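
As a starting point for the checks listed above, here is a minimal sketch in Python that resolves a name repeatedly through the normal OS path (socket.getaddrinfo, which goes through glibc) and tallies what comes back. The function name sample_resolutions, the default port, and the injectable resolver parameter are assumptions made for illustration. It only exercises the socket and C library layers, so it says nothing yet about what boto3 or other SDKs do on top of them.

```python
import socket
import time
from collections import Counter


def sample_resolutions(hostname, port=8000, rounds=10, delay=1.0,
                       resolver=socket.getaddrinfo):
    """Resolve `hostname` `rounds` times and tally the answers.

    Returns (first_choice, record_counts):
      first_choice  - Counter of how often each address was listed first
      record_counts - number of distinct addresses seen in each lookup
    `resolver` is injectable (hypothetical hook) so the tally logic can be
    exercised without a real DNS server.
    """
    first_choice = Counter()
    record_counts = []
    for i in range(rounds):
        infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # for AF_INET the sockaddr is (address, port).
        addresses = [info[4][0] for info in infos]
        if addresses:
            first_choice[addresses[0]] += 1
        record_counts.append(len(set(addresses)))
        if i + 1 < rounds:
            time.sleep(delay)  # choose a delay longer than the TTL under test
    return first_choice, record_counts
```

If the TTL is honored and the server rotates nodes, first_choice should spread over several addresses when delay exceeds the TTL. If it stays on one address, some layer is caching.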

One of the specific things I want to check in this issue is whether a DNS server that returns all nodes (such as dns-loadbalancer-rr.py added for #14) has any advantage over a simpler one that returns just one node (such as dns-loadbalancer.py). Specifically, DNS responses might be cached in any of the many layers an SDK uses (name server, operating system, C library, high-level language library, HTTP library, AWS SDK, etc.). I want to see whether the one-node-returning DNS server is more vulnerable to such caching, where different connections, processes or even client machines all end up using the same Scylla node, than a server returning the list of all live nodes.
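
To make the hypothesis concrete, here is a toy model in Python. It is not a measurement. The node addresses are made up, the two "servers" are simplified stand-ins for the two scripts, and the model assumes a client that rotates through whatever records it has cached. Whether real SDK stacks actually do that is exactly what the test has to find out.

```python
from itertools import cycle

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical live Scylla nodes


def one_node_server(nodes):
    """Stand-in for a one-node server: a single A record per query.

    Modeled here as rotating through the nodes; the real script may pick
    the node differently.
    """
    rotation = cycle(nodes)
    return lambda: [next(rotation)]


def all_nodes_server(nodes):
    """Stand-in for an all-nodes server: every live node in each response."""
    return lambda: list(nodes)


def nodes_used_with_cache(query, connections):
    """Nodes one client reaches if a single response is cached throughout.

    ASSUMPTION: the client rotates through the cached records, one per new
    connection. Real resolver and SDK layers may behave differently.
    """
    cached = query()  # only the first lookup ever reaches the DNS server
    return {cached[i % len(cached)] for i in range(connections)}
```

Under this model, a client that caches the one-node response sends all of its connections to a single node, while a client that caches the all-nodes response can still reach every node. The real test should tell us whether any layer behaves like this.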

Importantly, it's not OK to override the DNS used by a test by monkey-patching SDK code, because we might monkey-patch the wrong code in the wrong layer. Rather, we need to force the SDK to use our DNS server through the established operating-system mechanism for choosing a DNS server for the entire test application. This is normally /etc/resolv.conf - but it would be really sad if this test had to mess around with the real /etc/resolv.conf. Maybe we need to run the test in a chroot jail or container or something, but perhaps a cleaner way is to use a bind mount to shadow only the /etc/resolv.conf file and nothing else.
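
A rough sketch of the bind-mount idea in Python, assuming Linux with util-linux's unshare and unprivileged user namespaces enabled. The helper names and file paths are hypothetical. The test runs inside a private mount namespace, so the bind mount over /etc/resolv.conf is visible only to the test process and its children.

```python
import shlex
from pathlib import Path


def write_resolv_conf(path, nameserver="127.0.0.1"):
    """Write a minimal resolv.conf pointing at the DNS server under test."""
    path = Path(path)
    path.write_text(f"nameserver {nameserver}\n")
    return path


def shadowed_resolver_command(resolv_conf, test_argv):
    """Build an argv that runs `test_argv` with /etc/resolv.conf shadowed.

    `unshare --mount --map-root-user` creates a private mount namespace in
    which the caller is mapped to root, so `mount --bind` needs no real
    root privileges and the host's /etc/resolv.conf is left untouched.
    """
    inner = "mount --bind {} /etc/resolv.conf && exec {}".format(
        shlex.quote(str(resolv_conf)), shlex.join(test_argv))
    return ["unshare", "--mount", "--map-root-user", "sh", "-c", inner]
```

The resulting list can be passed to subprocess.run. One thing to keep in mind: as far as I know, a nameserver line in resolv.conf takes only an address and no port, so the DNS server under test would have to listen on port 53.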

nyh avatar Mar 17 '25 14:03 nyh

We are using the docker --dns flag exactly like that in SCT: https://github.com/scylladb/scylla-cluster-tests/blob/052e07b4b188e2998ed933d5f4c3faabb76a1b57/sdcm/ycsb_thread.py#L266

And this is how we disable the Java DNS cache: https://github.com/scylladb/scylla-cluster-tests/blame/052e07b4b188e2998ed933d5f4c3faabb76a1b57/docker/ycsb/Dockerfile#L30
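
For a test harness in this repository, the container approach could look roughly like the sketch below (Python). The helper name, image name, and IP are placeholders. The Java part uses the sun.net.inetaddr.ttl system property passed through JAVA_TOOL_OPTIONS, which is one known way to turn off the JVM's positive DNS cache; the SCT Dockerfile linked above may do it differently.

```python
def docker_run_with_dns(image, dns_ip, command, java_no_cache=False):
    """Build a `docker run` argv whose container resolves through `dns_ip`.

    `--dns` makes Docker write the given server into the container's
    resolv.conf, so the host's resolver configuration is not modified.
    """
    argv = ["docker", "run", "--rm", "--dns", dns_ip]
    if java_no_cache:
        # ASSUMPTION: disable JVM address caching via a system property;
        # other setups edit the java.security file instead.
        argv += ["-e", "JAVA_TOOL_OPTIONS=-Dsun.net.inetaddr.ttl=0"]
    return argv + [image] + list(command)
```

For example, docker_run_with_dns("sdk-test-image", "172.17.0.2", ["python3", "test_dns_lb.py"]) returns a list that can be handed to subprocess.run.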

fruch avatar Mar 31 '25 22:03 fruch