Asynchronous DNS client
Currently, asyncio calls the standard getaddrinfo() function in a thread
because this function is blocking. There is no standard asynchronous function to
resolve a host name. It would be interesting to try to plug one of the
following asynchronous DNS clients into asyncio:
http://www.gnu.org/software/adns/ => http://code.google.com/p/adns-python/
http://c-ares.haxx.se/ => https://pypi.python.org/pypi/pycares
https://github.com/getdnsapi/getdns
http://adns.sourceforge.net/
Old project:
http://pydns.sourceforge.net/ (no update in 11 years? blocking?)
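For reference, a minimal sketch of the thread-based approach described above. This is not asyncio's actual code: resolve_in_thread() is a hypothetical helper, and a numeric address is used so the example does not depend on a DNS server.

```python
import asyncio
import socket


async def resolve_in_thread(host, port):
    """Hypothetical helper: run the blocking getaddrinfo() in a worker thread."""
    loop = asyncio.get_running_loop()
    # socket.getaddrinfo() blocks, so hand it to the loop's default executor
    # (a thread pool) and await the result without stalling the event loop.
    return await loop.run_in_executor(
        None, socket.getaddrinfo, host, port, 0, socket.SOCK_STREAM)


infos = asyncio.run(resolve_in_thread("127.0.0.1", 80))
family, type_, proto, canonname, sockaddr = infos[0]
print(family, sockaddr)
```

The resolution itself is still synchronous; only the waiting is moved off the event loop thread, which is the limitation an asynchronous DNS client would remove.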
Original issue reported on code.google.com by [email protected] on 6 Mar 2014 at 4:47
Such a dependency would be a no-no for the stdlib. Or what else are you
proposing?
Original comment by [email protected] on 6 Mar 2014 at 4:50
I'd recommend a resolver plugin mechanism like Tornado uses:
http://www.tornadoweb.org/en/stable/netutil.html#tornado.netutil.Resolver
Then a third party could introduce a compatibility layer between PyCares (or
whatever) and asyncio. Users could then configure asyncio at runtime to use
PyCares, via the standard interface.
Original comment by [email protected] on 6 Mar 2014 at 5:00
> Such a dependency would be a no-no for the stdlib. Or what else are you
> proposing?
I propose to add an API to plug a DNS resolver. Not to use another DNS resolver.
Original comment by [email protected] on 6 Mar 2014 at 5:02
See also issue #160 (Add an optional cache for loop.getaddrinfo()).
Original comment by [email protected] on 6 Mar 2014 at 5:03
> I propose to add an API to plug a DNS resolver. Not to use another DNS
> resolver.
Sorry, I didn't get that. Such an API sounds good.
Original comment by [email protected] on 6 Mar 2014 at 5:14
Here is a first try, without tests yet:
https://codereview.appspot.com/72270043
The API is a little bit surprising: get_resolver() returns an instance, whereas
set_resolver() expects a class: it instantiates the class and attaches it to the
event loop.
Example of external resolver using pycares:
https://bitbucket.org/haypo/asyncio_staging/src/tip/resolver_cares.py
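To illustrate the shape of that API, here is a rough sketch. It is not the patch under review: ResolverHooks stands in for the event loop, and ThreadedResolver and StaticResolver are hypothetical classes made up for this example. Only the set_resolver()/get_resolver() asymmetry is taken from the description above.

```python
import asyncio
import functools
import socket


class ThreadedResolver:
    """Hypothetical default: blocking getaddrinfo() in the loop's executor."""

    def __init__(self, loop):
        self._loop = loop

    async def getaddrinfo(self, host, port, **kwargs):
        call = functools.partial(socket.getaddrinfo, host, port, **kwargs)
        return await self._loop.run_in_executor(None, call)


class StaticResolver:
    """Hypothetical third-party resolver answering from a fixed table."""

    table = {"service.test": "192.0.2.10"}

    def __init__(self, loop):
        self._loop = loop

    async def getaddrinfo(self, host, port, **kwargs):
        return [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "",
                 (self.table[host], port))]


class ResolverHooks:
    """Stand-in for the event loop; models only the two resolver hooks."""

    def __init__(self, loop):
        self._loop = loop
        self._resolver = ThreadedResolver(loop)

    def set_resolver(self, resolver_class):
        # Takes a class and instantiates it, bound to the loop.
        self._resolver = resolver_class(self._loop)

    def get_resolver(self):
        # Returns the instance currently in use.
        return self._resolver


async def main():
    hooks = ResolverHooks(asyncio.get_running_loop())
    hooks.set_resolver(StaticResolver)
    return await hooks.get_resolver().getaddrinfo("service.test", 443)


result = asyncio.run(main())
print(result)
```

A pycares-based resolver would presumably take the place of StaticResolver here, with the same constructor and method signatures.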
Original comment by [email protected] on 7 Mar 2014 at 2:26
When "DNS resolver" was mentioned I thought we'd try to go beyond
getaddrinfo/getnameinfo, I hope we can do that! :-)
Using getaddrinfo is OK if all you are doing is the usual stuff, but if you
want to do a simple XMPP bot you need SRV support. It's probably ok for Tornado
because it's a web framework, but asyncio is a general purpose framework, so
IMHO it would be nice if we can do more DNS.
Also, for some, getaddrinfo is "considered harmful":
http://daniel.haxx.se/blog/2012/01/03/getaddrinfo-with-round-robin-dns-and-happy-eyeballs/
so having the ability to do A or AAAA queries manually would come in handy.
Also, for happy eyeballs, one could do parallel queries for A and AAAA and then
attempt to connect as results come in.
I personally have only used pycares (:-P) and dnspython (which is blocking). Do
you know of any other async DNS resolver? Maybe we can come up with some API to
accommodate a more general purpose DNS resolver. I recently found getdns; Python
bindings seem to be in the works, so that will also be interesting.
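A minimal sketch of the parallel-query idea, using only loop.getaddrinfo() with an explicit family rather than a real DNS client. resolve_both() is a hypothetical helper, and it assumes a Python version that has asyncio.run().

```python
import asyncio
import socket


async def resolve_both(host, port):
    """Hypothetical helper: look up IPv4 and IPv6 addresses concurrently."""
    loop = asyncio.get_running_loop()
    families = (socket.AF_INET, socket.AF_INET6)
    lookups = [
        loop.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
        for family in families
    ]
    # return_exceptions=True keeps one failed family from hiding the other.
    results = await asyncio.gather(*lookups, return_exceptions=True)
    return {
        family: [info[4] for info in result]
        for family, result in zip(families, results)
        if not isinstance(result, BaseException)
    }


addresses = asyncio.run(resolve_both("127.0.0.1", 80))
print(addresses)
```

This version waits for both lookups before returning. Connecting as results arrive would need asyncio.wait() or asyncio.as_completed() instead of gather().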
Original comment by saghul on 7 Mar 2014 at 8:02
"asyncio is a general purpose framework, so IMHO it would be nice if we can do
more DNS."
asyncio should work without these external dependencies, only with modules
available in the Python stdlib. That's why I chose to only make getaddrinfo()
and getnameinfo() configurable.
If you call set_resolver() in your application, you can also use get_resolver()
to call arbitrary methods on your resolver that are not part of the
public API of the Resolver. You can put the glue between the DNS library and
asyncio in this class.
"for happy eyeballs, one could do parallel queries for A and AAAA and then
attempt to connect as results come in."
See issue #86: "Implement "Happy Eyeballs" (RFC 6555) for dual-stack
create_connection()".
I don't think that DNS resolution is the slowest path when trying to connect over
IPv4 and IPv6 at the same time. My ISP provides IPv6 (Free in France, using
"6to4rd" technology) but only provides IPv4 DNS.
I don't know the DNS protocol enough: is it possible to ask for A and AAAA
records in the same request?
"Do you know of any other async DNS resolver?"
See my first message of this issue: cares, adns, getdns, tadns.
Original comment by [email protected] on 7 Mar 2014 at 10:28
Updated patch, now with a cache:
https://codereview.appspot.com/72270043/#ps20001
I chose to expose loop.resolver_cache as an attribute instead of adding many
methods related to the cache (loop.configure_resolver_cache,
loop.clear_resolver_cache, etc.).
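For illustration, a small cache with a maximum size and a time to live could look roughly like this. This is a sketch, not the code from the patch: ResolverCache, its method names and the injectable clock are assumptions made for the example. The 20 entries / 60 seconds defaults mirror the settings quoted in the timings.

```python
import time
from collections import OrderedDict


class ResolverCache:
    """Hypothetical bounded cache with a per-entry time to live."""

    def __init__(self, max_entries=20, ttl=60.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expiry time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires, value = entry
        if self._clock() >= expires:
            del self._entries[key]  # stale: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry

    def clear(self):
        self._entries.clear()


# A fake clock makes the expiry behaviour easy to see.
now = [0.0]
cache = ResolverCache(max_entries=2, ttl=60.0, clock=lambda: now[0])
cache.set(("example.com", 80), ["192.0.2.1"])
hit = cache.get(("example.com", 80))
now[0] = 61.0
miss = cache.get(("example.com", 80))
print(hit, miss)
```

Exposing one object like this as an attribute keeps the loop API small: size, TTL and clearing are all reachable through it.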
I also improved and fixed pycares resolver:
https://bitbucket.org/haypo/asyncio_staging/src/tip/resolver_cares.py
If you are not convinced that a DNS cache is needed, here are timings of 3
runs of "python3 examples/crawl.py http://www.xkcd.com -q":
1) with cache (20 entries/60 sec): Finished 2732 urls in 20.440 secs
2) without cache (20 entries/60 sec): Finished 2732 urls in 427.866 secs
3) with cache (20 entries/60 sec): Finished 2732 urls in 21.598 secs
During the run without cache, I saw many lines like this:
...
INFO:asyncio:poll took 1.003 seconds
INFO:asyncio:poll took 3.105 seconds
INFO:asyncio:poll took 1.239 seconds
INFO:asyncio:poll took 4.150 seconds
INFO:asyncio:poll took 3.182 seconds
INFO:asyncio:poll took 1.008 seconds
INFO:asyncio:poll took 1.098 seconds
INFO:asyncio:poll took 1.206 seconds
INFO:asyncio:poll took 1.025 seconds
...
Is something wrong in my Fedora 20 setup? Or is my ISP DNS so slow? (DNS
servers: 212.27.40.240 and 212.27.40.241)
Original comment by [email protected] on 7 Mar 2014 at 6:07
"asyncio should work without these external dependencies, only with modules
available in the Python stdlib. That's why I chose to only make getaddrinfo()
and getnameinfo() configurable."
Right, I forgot about that one, sorry.
Original comment by saghul on 10 Mar 2014 at 7:51
"I don't know the DNS protocol enough: is it possible to ask for A and AAAA
records in the same request?"
If you do a 'normal' DNS query no, AFAIK, but if you use getaddrinfo and pass
AF_UNSPEC (the default) as the family, you'll get both results. Then I guess
you'd group the results by family and try to connect at the same time, then
continue with the algorithm if IPv6 failed, and so on.
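The grouping step could be sketched like this. group_by_family() is a hypothetical helper, and the sample list uses documentation addresses in the same tuple layout that getaddrinfo() returns, so the example runs without a network.

```python
import socket
from collections import defaultdict


def group_by_family(addrinfos):
    """Group getaddrinfo()-style tuples by address family."""
    groups = defaultdict(list)
    for family, _type, _proto, _canonname, sockaddr in addrinfos:
        groups[family].append(sockaddr)
    return dict(groups)


# Same layout as socket.getaddrinfo(host, 80, family=socket.AF_UNSPEC) output.
sample = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 80, 0, 0)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.1", 80)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.2", 80)),
]
grouped = group_by_family(sample)
print(grouped)
```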
Original comment by saghul on 10 Mar 2014 at 8:00
The description of the DNS protocol is pretty obscure, but it looks like you
can send it multiple questions in a single UDP packet and it'll return multiple
responses. Maybe getaddrinfo() even uses this -- but the implementation has
many layers of abstraction and caching and varies from system to system (even
when using the same OS, since a lot is configurable).
Building your own from scratch will lose the benefits of the system
implementation (which presumably include sharing between different processes or
even users). But of course it may also address some of the downsides (like the
lack of control over what the system resolver does).
Your description of trying to connect using different families reminds me of
the "happy eyeballs" algorithm (which is an actual RFC). We have a poor
approximation of this currently in create_connection(); independently from the
DNS interface it would be nice if we had a full implementation, and that's
being proposed in issue 86.
Original comment by [email protected] on 10 Mar 2014 at 4:48
Oh, Saul released a library for asynchronous DNS name resolution:
https://github.com/saghul/aiodns
His pycares binding now also works with asyncio:
https://github.com/saghul/pycares/releases/tag/pycares-0.6.0
Original comment by [email protected] on 27 Mar 2014 at 8:04
If we modify the DNS APIs, we should maybe prepare the APIs for the happy
eyeballs use case:
http://code.google.com/p/tulip/issues/detail?id=86
Original comment by [email protected] on 27 Mar 2014 at 10:55
Proof-of-concept of DNS client to get the IPv4 of a domain (A record):
https://bitbucket.org/haypo/asyncio_staging/src/tip/dns.py
It's only a PoC. It has many bugs and the simplest DNS support possible :-)
A DNS client written for asyncio in pure Python gives more control over name
resolution:
* fully asynchronous
* full control over timeouts
* we may support DNS records other than A and AAAA: MX for SMTP and SRV for
XMPP (Jabber)
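To give an idea of what such a client has to do at the lowest level, here is a sketch that builds a single-question DNS query in the RFC 1035 wire format. It is not the code from dns.py: encode_name(), build_query() and the fixed query ID are made up for the example, and sending the packet or parsing the response is left out.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def encode_name(name):
    """Encode a host name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=QTYPE_A, query_id=0x1234):
    """Build a query packet with one question (hypothetical helper)."""
    flags = 0x0100  # standard query with the "recursion desired" bit set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


packet = build_query("example.com")
print(len(packet), packet)
```

Asking for another record type is only a matter of changing qtype. The real work is in parsing the answers and handling timeouts, retries and truncated responses.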
If we take this approach, it must be optional. It's probably safer to rely on
the OS by default. Supporting DNSSEC is not trivial, I'm not sure that Python
supports all required ciphers. The Python ssl module is currently designed to
wrap a socket, not to give access to ciphers directly.
As Saul said, such a DNS client would use its own cache. It might be shared
between different processes, but only processes using this client.
Maybe such a DNS client should be a library, and asyncio should just provide an
API to plug in an "external" DNS resolver?
Original comment by [email protected] on 5 Feb 2015 at 2:12
+1 for pure Python DNS implementation
Theoretically, all protocols (HTTP, SSH...) should live outside of AsyncIO, but
DNS is special, because you need it before you can open a connection to a
domain name.
The problem with integrating the full monty of DNS into AsyncIO is that if we
have an issue or a missing feature, it will be harder for AsyncIO users to
upgrade compared to an external library.
Maybe you could take the same approach as aiohttp: by default, they use
their internal classes/objects, which you can override during instantiation.
Example: https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/web.py#L1606
BTW, about missing features, NAPTR is also an important type of record for SIP.
Almost all SIP endpoints use it to simplify configuration: instead of
hard-coding the list of servers, transports and ports in the phone, phones do a
NAPTR request to retrieve that on the fly.
Original comment by [email protected] on 8 Feb 2015 at 8:16
I think I have said it before, but I feel that it is wrong to talk directly to
a DNS server instead of using the platform's lookup service. (It's fine to
have a DNS client, but it's not okay to use that by default for the event
loop's getaddrinfo() method.)
Original comment by [email protected] on 9 Feb 2015 at 2:19
I just naively tried to use aiodns because I didn't know that asyncio
implemented getaddrinfo. To my surprise, it failed to resolve machines on my
local network using Zeroconf (multicast DNS), because it didn't use my system's
lookup service (nsswitch).
Then I found this thread, switched to getaddrinfo and all is well.
So please note this downside of not using the platform's lookup service by
default: it won't work with Zeroconf unless that is implemented separately.
Original comment by [email protected] on 2 Mar 2015 at 5:08
I want to resurrect this discussion. It would be really cool if we can have an API for setting custom DNS resolvers in asyncio in 3.5.
Writing a pure Python DNS client is a great idea, but it's not an easy feat. This can be done separately from asyncio if asyncio has an API to plug such an implementation in.
Well, a DNS client is not what we need. We'd need a DNS client plus a cache plus a way to consult the system resolver plus who knows what else... Do you have an actual use case and a suitable implementation, or is this just a completeness dream?
I kind of liked Victor's patch: https://codereview.appspot.com/72270043/patch/20001/30001
Specifically, loop.set_resolver() method. Isn't cache something that resolvers can implement themselves?
So the actual use case is to: 1) try to integrate c-ares, 2) try to implement a cache policy for it. OTOH, this experimentation is something that is possible to do with subclassing.
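A rough sketch of the subclassing route, assuming a Python version where loop.getaddrinfo() is a coroutine method (3.7 or later). StaticHostsLoop and its HOSTS table are hypothetical; a c-ares based lookup or a cache policy would go where the table lookup is.

```python
import asyncio
import socket


class StaticHostsLoop(asyncio.SelectorEventLoop):
    """Hypothetical loop subclass that answers some names from a fixed table."""

    HOSTS = {"service.test": "127.0.0.1"}

    async def getaddrinfo(self, host, port, *, family=0, type=0, proto=0,
                          flags=0):
        ip = self.HOSTS.get(host)
        if ip is not None:
            # Answer locally, in the same tuple layout as socket.getaddrinfo().
            return [(socket.AF_INET, type or socket.SOCK_STREAM, proto, "",
                     (ip, port))]
        # Everything else falls back to the stock thread-based implementation.
        return await super().getaddrinfo(
            host, port, family=family, type=type, proto=proto, flags=flags)


loop = StaticHostsLoop()
try:
    infos = loop.run_until_complete(loop.getaddrinfo("service.test", 8080))
finally:
    loop.close()
print(infos)
```

The drawback compared to a set_resolver() API is that the application has to control which loop class gets instantiated.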
hi,
blindly calling getaddrinfo(3) is a really bad design choice, and is considered a bad pattern in C where it originates from. at the very least, a better approach would be to call getaddrinfo(3) when DNS is actually required (i.e. inet_pton(3) fails).
i think that adding an API to override this behaviour would be even better, consider that a +1000 from me, because it's really the only bad aspect of asyncio i've encountered.
@kaniini
> blindly calling getaddrinfo(3) is a really bad design choice, and is considered a bad pattern in C where it originates from. at the very least, a better approach would be to call getaddrinfo(3) when DNS is actually required (i.e. inet_pton(3) fails).
That's what we already do, see https://github.com/python/asyncio/blob/master/asyncio/base_events.py#L90
it is inconsistent, https://github.com/python/asyncio/blob/master/asyncio/base_events.py#L578-L593 - these functions are called by other dependencies such as asyncio.open_connection(). it would be desirable to ensure that there is consistency in what is done, i think. shipping off thousands of getaddrinfo(3) calls to a thread pool when the code is connecting to a well-defined IP address results in significant CPU overhead.
I'm sorry, what you're saying doesn't make any sense to me.
Current loop.getaddrinfo (in asyncio master branch) checks if the address is an IP address string, and if it is, it does not use threadpools or system getaddrinfo at all.
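A rough sketch of that kind of short-circuit. This is not the code in base_events.py: ipaddr_info() here is a simplified, hypothetical helper built on the stdlib ipaddress module, and it ignores details such as IPv6 scope IDs.

```python
import ipaddress
import socket


def ipaddr_info(host, port, type_=socket.SOCK_STREAM,
                proto=socket.IPPROTO_TCP):
    """Return a getaddrinfo()-style result if host is an IP literal, else None."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # not an IP literal: a real lookup is needed
    if addr.version == 4:
        return [(socket.AF_INET, type_, proto, "", (host, port))]
    return [(socket.AF_INET6, type_, proto, "", (host, port, 0, 0))]


print(ipaddr_info("127.0.0.1", 80))
print(ipaddr_info("::1", 80))
print(ipaddr_info("example.com", 80))
```

Only when the helper returns None does the caller need to pay for a thread pool round trip.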
oh, so it does. that's great! because on 3.5 it doesn't. :(
That'll be fixed once 3.5.2 is out.
There's http://man7.org/linux/man-pages/man3/getaddrinfo_a.3.html but libuv doesn't like it: https://github.com/joyent/libuv/issues/617
getaddrinfo_a() just spawns a thread to do the DNS resolution.
the most ideal approach would be to allow installing a new set of DNS functions, allowing for a true async resolver to be used, i.e. loop.set_resolver() or similar, as discussed earlier in this issue.