
Async DNS resolution

Open • z64 opened this issue 2 years ago • 13 comments

Currently, the socket runtime calls LibC.getaddrinfo, which blocks the event loop until it completes.

For latency-sensitive applications, this can cause slowdowns that stack up, among other issues. It is easy to write a Python script that makes a bunch of HTTP requests to several hosts and runs much faster than the equivalent Crystal program, because it doesn't get stuck on DNS.

Since all the other discussions I could find are quite old and closed, I'm opening this new one for visibility and to give a broader view of the problem's history.

Related material

  • https://github.com/crystal-lang/crystal/issues/2660
    • In a past era, Crystal used libevent's getaddrinfo, but it was dropped in favour of plain LibC getaddrinfo due to other issues. I don't know whether anything has changed on libevent's end that would justify trying it again.
  • dns.cr shard
    • We currently use this for non-blocking DNS, and it works "ok". I don't recommend it, though, unless you really need it, as it is far too complicated.
  • Threaded DNS resolver (from #4236)
    • To me, this seems like a possible next step: keep the robustness of libc getaddrinfo, but run it in another thread so the event loop is not blocked. This implementation could be brought up to date, or a new one written.

z64 • Jul 02 '23 22:07

A great benefit of getaddrinfo is that it's battle-tested and covers a lot of niches. It's used almost ubiquitously, and its behaviour is considered the system default (which it actually is on many systems).

Its main issue is that it blocks the current thread. We can alleviate that a bit by running it in a dedicated thread. This could happen implicitly and would be a nice feature in the context of the ongoing multi-threading refactor (ref https://github.com/crystal-lang/rfcs/pull/2). This is a well-known technique and should work reasonably well. However, it's quite inefficient.
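
A minimal sketch of the dedicated-thread technique, written in Python for brevity (the thread has no Crystal code to build on). The blocking call is handed to a small worker pool, and the event loop awaits the result instead of stalling. resolve_off_loop, _DNS_POOL and the blocking_lookup parameter are hypothetical names; by default the sketch calls socket.getaddrinfo. Python's own asyncio works similarly: its default loop.getaddrinfo runs getaddrinfo in an executor.

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor

# A small, dedicated pool so that blocking DNS lookups never run on the
# event-loop thread. The size (4) is an arbitrary assumption; it caps how
# many lookups can be in flight at once.
_DNS_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="dns")


async def resolve_off_loop(host, port, blocking_lookup=socket.getaddrinfo):
    """Run a blocking resolver in a worker thread and await its result.

    blocking_lookup defaults to socket.getaddrinfo; it is a parameter only so
    the sketch can be exercised without touching the network.
    """
    loop = asyncio.get_running_loop()
    # The event loop keeps running other tasks while a worker thread sits
    # inside the blocking call.
    return await loop.run_in_executor(
        _DNS_POOL, blocking_lookup, host, port, 0, socket.SOCK_STREAM
    )
```

This also shows where the inefficiency comes from: every pending lookup pins a whole OS thread for its duration, and once all workers are busy, further lookups simply queue up.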

So I think we'll eventually need a native implementation for DNS resolution that uses Crystal's concurrency. https://github.com/636f7374/dns.cr could be a good source for inspiration, but it's far too complex for this. We only need a relatively simple implementation, covering a fraction of its features.
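
To give a feel for how small the core of such an implementation can be, here is a sketch (in Python for brevity) of building a standard DNS query message for an A record in the RFC 1035 wire format: a 12-byte header followed by one question. build_dns_query and encode_name are hypothetical helpers. A real resolver would also have to send this over UDP port 53 to the servers from /etc/resolv.conf, handle retries, timeouts and TCP fallback for truncated replies, and query AAAA records as well.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def encode_name(name):
    """Encode a domain name as a sequence of length-prefixed labels."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)


def build_dns_query(query_id, name, qtype=QTYPE_A):
    """Build a standard recursive query with a single question."""
    flags = 0x0100  # QR=0 (query), OPCODE=0, RD=1 (recursion desired)
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0, all big-endian
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question
```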

We might take further inspiration from Go: https://pkg.go.dev/net#hdr-Name_Resolution It actually still uses getaddrinfo because the native implementation is incomplete. There's a rather complex set of rules to decide when to use the native implementation and when getaddrinfo. Considering that even Go hasn't managed to reach feature parity with getaddrinfo (I'm not even sure if that's a goal for them), maybe we should really keep our focus on making getaddrinfo usable as easily and efficiently as possible, and worry about a native implementation later.
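
A heavily simplified sketch (in Python) of that kind of decision, loosely modelled on conditions Go's documentation lists for preferring the system resolver: LOCALDOMAIN being set (even if empty), RES_OPTIONS or HOSTALIASES being non-empty, and /etc/nsswitch.conf using features the native resolver doesn't implement; as far as I know Go also hands names ending in .local to the system resolver. use_native_resolver, hosts_sources and NATIVE_SOURCES are hypothetical. The sketch only inspects the hosts: line, treats any bracketed action item as unsupported, and skips the /etc/resolv.conf checks Go also performs.

```python
import os

# Sources this hypothetical native resolver can emulate itself:
# "files" (/etc/hosts) and "dns" (the servers in /etc/resolv.conf).
NATIVE_SOURCES = {"files", "dns"}


def hosts_sources(nsswitch_text):
    """Return the tokens of the 'hosts:' line, or None if there is none."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return None


def use_native_resolver(name, nsswitch_text, env=None):
    """Decide whether a native resolver may stand in for getaddrinfo."""
    env = os.environ if env is None else env
    if "LOCALDOMAIN" in env:  # present at all, even if empty
        return False
    if env.get("RES_OPTIONS") or env.get("HOSTALIASES"):
        return False
    if name.rstrip(".").lower().endswith(".local"):  # mDNS territory
        return False
    sources = hosts_sources(nsswitch_text)
    if sources is None:
        return True  # assumption: no hosts line means plain files/dns lookup
    # Anything else (mdns4_minimal, myhostname, [NOTFOUND=return], ...) is a
    # customization only the system resolver knows how to honour.
    return all(s in NATIVE_SOURCES for s in sources)
```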

That said, looking at the Go implementation it really shouldn't be that difficult to implement our own algorithm.
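
Decoding a reply is similarly compact. The Python sketch below extracts IPv4 addresses from the answer section of an RFC 1035 response, including skipping over compressed names (2-byte pointers). parse_a_records and skip_name are hypothetical and deliberately leave out checks a real implementation needs: matching the ID and question against the query, checking RCODE and the truncation bit, following CNAME chains, and bounds-checking every read.

```python
import ipaddress
import struct


def skip_name(msg, offset):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1          # root label ends the name
        if (length & 0xC0) == 0xC0:
            return offset + 2          # a 2-byte compression pointer ends it
        offset += 1 + length           # ordinary label: length byte + data


def parse_a_records(msg):
    """Return the IPv4 addresses found in the answer section of msg."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!HHHHHH", msg, 0)
    offset = 12
    for _ in range(qdcount):           # skip QNAME, then QTYPE and QCLASS
        offset = skip_name(msg, offset) + 4
    addresses = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10                   # TYPE, CLASS, TTL, RDLENGTH
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
            addresses.append(str(ipaddress.IPv4Address(rdata)))
    return addresses
```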

straight-shoota • Apr 23 '24 18:04

It's impossible to reach feature parity with getaddrinfo (or equivalent). For example, glibc/nsswitch allows extensible resolvers, so anybody can write, add and configure their own special resolver to handle special cases (e.g. custom domains, mDNS, whatever you can think of).

Let us pause to contemplate the idea of using plugins at the libc level instead of running a local resolver that allows such plugins :face_exhaling:

Using a custom DNS resolver, be it libevent's DNS or a pure Crystal one, means bypassing all that customization. Even going through a custom DNS resolver and then falling back to getaddrinfo is prone to errors, because the former may resolve a name while the latter is customized to return something else :sob:

Let's not even talk about security extensions (DNSSEC, DoT, DoH, DNSCrypt).

The dedicated thread is still likely a good idea. Even with an io/event aware DNS resolver, we'd still need the ability to call getaddrinfo. The advantage of Go is that it doesn't need a dedicated thread (it 'merely' detaches the scheduler from the thread) and there can be multiple concurrent calls to getaddrinfo.

ysbaddaden • Apr 25 '24 10:04

On Linux there's also getaddrinfo_a for asynchronous queries. It's a GNU extension and doesn't seem to be supported outside of glibc. I haven't looked too deeply into how it's implemented, but it could be a bit of a challenge to integrate it with the event loop. Also, since it only works with glibc, we'll need a generic solution for other targets anyway. So I don't think it's worth pursuing.

straight-shoota • Jun 10 '24 13:06

  • getaddrinfo_a notifies the caller via a POSIX signal or a callback in a new thread.
  • Darwin has something called getaddrinfo_async_start; it is used by Bun, but other than that I couldn't find any documentation about it. Another system API is DNSServiceGetAddrInfo for macOS 10.12+.
  • On Windows 8 or above, the asynchronous Win32 APIs are DnsQueryEx and GetAddrInfoExW. Both support asynchronous callbacks, and the latter also supports overlapped I/O.
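
The tricky part of callback-style APIs such as getaddrinfo_a with a thread callback is that completion happens on a thread the event loop doesn't own, so the result has to be handed back thread-safely. A minimal Python/asyncio sketch of that bridge: loop.call_soon_threadsafe wakes the loop and runs the update on the loop's own thread (a C runtime would typically write to a self-pipe or eventfd the loop is watching). fake_async_getaddrinfo is a hypothetical stand-in for the OS API that invokes its callback from a fresh thread; its "addresses" are made up.

```python
import asyncio
import threading
import time


def fake_async_getaddrinfo(host, callback):
    """Hypothetical OS-style async API: completes on a separate thread."""
    def worker():
        time.sleep(0.05)  # pretend the lookup takes a while
        callback([f"203.0.113.{len(host)}"], None)
    threading.Thread(target=worker, daemon=True).start()


async def resolve_via_callback(host):
    """Adapt a callback-on-another-thread API to an awaitable."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def settle(result, error):
        # Runs on the event-loop thread.
        if future.done():  # e.g. the awaiting task was cancelled meanwhile
            return
        if error is not None:
            future.set_exception(error)
        else:
            future.set_result(result)

    def on_complete(result, error):
        # Runs on the foreign thread: don't touch the future here, ask the
        # loop to run settle() on its own thread instead.
        loop.call_soon_threadsafe(settle, result, error)

    fake_async_getaddrinfo(host, on_complete)
    return await future
```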

I don't think we really have to stick to using getaddrinfo exactly, as long as other system APIs also provide feature parity with respect to those customizations.

HertzDevil • Sep 01 '24 03:09

Building a Crystal-native implementation sounds like the easiest way to provide cross-platform non-blocking support. I like the idea of being able to switch out the implementation too, making it easier to add support for mDNS or similar to third-party libraries that one might be using in an application (or maybe a way to switch implementation for names matching a regex, .local for instance).
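
A sketch (in Python) of the swappable-implementation idea: resolvers are registered against a regular expression on the host name, the first matching rule wins, and a default resolver handles everything else. ResolverRegistry and the lambdas in the test are hypothetical; in a real program the default would wrap getaddrinfo or a native resolver, and a .local rule would point at an mDNS implementation.

```python
import re
from typing import Callable, List, Tuple

# A resolver maps a host name to a list of address strings.
Resolver = Callable[[str], List[str]]


class ResolverRegistry:
    """Pick a resolver per host name; the first matching pattern wins."""

    def __init__(self, default: Resolver):
        self._default = default
        self._rules: List[Tuple[re.Pattern, Resolver]] = []

    def register(self, pattern: str, resolver: Resolver) -> None:
        # Match case-insensitively, since DNS names are case-insensitive.
        self._rules.append((re.compile(pattern, re.IGNORECASE), resolver))

    def resolve(self, host: str) -> List[str]:
        name = host.rstrip(".")  # treat "printer.local." like "printer.local"
        for pattern, resolver in self._rules:
            if pattern.search(name):
                return resolver(name)
        return self._default(name)
```

Keeping the selection outside the resolvers themselves means a library can ship with the default and an application can add an mDNS rule without touching the library's code.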

ChatGPT pumped out a working (albeit basic) Crystal implementation in three shots, so I can't imagine it would be too hard to replicate what Go does, and it would probably be less code to maintain than async system call implementations for each platform.

stakach • Sep 02 '24 23:09