python-email-validator
Async deliverability checking?
Hi,
Thank you for creating this excellent library.
Would you accept a PR that adds async methods for deliverability checking? A quick look suggests it would entail a new validate_email_deliverability function, with some duplicated logic, and a new validate_email function which could probably share almost all logic with the existing one.
Thanks.
I actually started working on that a while ago in https://github.com/JoshData/python-email-validator/tree/async. But my standards are higher now for completing the work: There has to be a complete set of tests and the code has to be clear and documented. And if the work is started over, I also really really want to avoid duplicated logic by not having separate functions. So yes but with those caveats.
I just realized the branch didn't actually have my async work on it. I've fixed it now and tried to bring it up to date with other changes that I did since I started working on it. It's not in a working state though.
I will share with you my async implementation; feel free to use it or get inspired by it.
The method first performs one DNS request for MX records, optimistically assuming the domain has them. If there are none (an implicit MX), it performs two DNS requests in parallel for A/AAAA records. A Null MX answer is treated as undeliverable.
In contrast to python-email-validator, it doesn't check for SPF records. In my opinion, inferring deliverability from SPF records is incorrect.
# LICENSE: CC0-1.0 (Public Domain)

import logging
from operator import attrgetter

from anyio import create_task_group
from dns.asyncresolver import Resolver
from dns.exception import DNSException, Timeout
from dns.rdatatype import RdataType
from dns.resolver import NXDOMAIN, NoAnswer, NoNameservers
from email_validator import validate_email

resolver = Resolver()

...

info = validate_email(email, check_deliverability=False)
domain = info.ascii_domain
success = False

async with create_task_group() as tg:

    async def task(rd: RdataType):
        nonlocal success

        try:
            answer = await resolver.resolve(domain, rd)
            rrset = answer.rrset
        except NoAnswer:
            rrset = None
        except NXDOMAIN:
            return  # domain does not exist, skip further checks
        except (NoNameservers, Timeout):
            raise  # something's wrong on our side
        except DNSException:
            # some other error, log and proceed gracefully
            logging.exception('DNS error for %r (%r)', domain, rd)
            rrset = None

        if rd == RdataType.MX:
            if not rrset:
                # on implicit mx, try a/aaaa
                tg.start_soon(task, RdataType.A)
                tg.start_soon(task, RdataType.AAAA)
                return

            # mx - treat not-null answer as success
            # sort answers by preference in descending order
            rrset_by_preference = sorted(rrset, key=attrgetter('preference'), reverse=True)
            exchange = str(rrset_by_preference[0].exchange)
            success = exchange != '.'
        else:
            # a/aaaa - treat any answer as success and cancel other tasks
            if rrset:
                success = True
                tg.cancel_scope.cancel()

    tg.start_soon(task, RdataType.MX)
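For anyone who wants to unit test the decision logic without touching the network, the MX-then-A/AAAA rule described above can be pulled out into a pure function. This is only a sketch: `MxRecord`, `is_null_mx`, and `is_deliverable` are hypothetical names and not part of python-email-validator or the snippet above. It assumes RFC 7505's definition of a Null MX (a single MX record whose exchange is `.`), which is a slightly different check from the one in the snippet.

```python
from typing import Sequence, Tuple

# Hypothetical record shape for illustration: (preference, exchange).
MxRecord = Tuple[int, str]


def is_null_mx(mx_records: Sequence[MxRecord]) -> bool:
    """Assumed RFC 7505 rule: exactly one MX record with exchange '.'."""
    return len(mx_records) == 1 and mx_records[0][1] == "."


def is_deliverable(mx_records: Sequence[MxRecord], has_a: bool, has_aaaa: bool) -> bool:
    """Apply the MX-first rule, falling back to A/AAAA on an implicit MX."""
    if mx_records:
        # Explicit MX records: deliverable unless the domain publishes a Null MX.
        return not is_null_mx(mx_records)
    # No MX records at all (implicit MX): any address record counts.
    return has_a or has_aaaa
```

Keeping this part free of I/O means the same function could back both a sync and an async code path.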
Thanks for sharing! The branch currently has an async implementation that seems to be working. It doesn't run DNS queries in parallel though. I'd be curious to see if it improves performance in real world scenarios. I might try it although I don't know when I'll have time to.
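A rough way to get a feel for the sequential-versus-parallel difference before testing against real DNS is to simulate the lookups. The sketch below uses only the standard library; `fake_lookup` is a hypothetical stand-in that sleeps for a fixed delay instead of querying a resolver, so it shows the best-case gain and says nothing about real-world latency.

```python
import asyncio
import time


async def fake_lookup(record_type: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a DNS query: sleeps, then echoes the type."""
    await asyncio.sleep(delay)
    return record_type


async def sequential() -> list:
    # One query after the other, as in a non-parallel implementation.
    return [await fake_lookup("A"), await fake_lookup("AAAA")]


async def parallel() -> list:
    # Both queries in flight at once; gather preserves argument order.
    return list(await asyncio.gather(fake_lookup("A"), fake_lookup("AAAA")))


async def timed(coro_fn):
    """Run coro_fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await coro_fn()
    return result, time.perf_counter() - start
```

Running `asyncio.run(timed(sequential))` and `asyncio.run(timed(parallel))` side by side gives the two timings to compare.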
Bump!
I'd appreciate anyone testing out the async branch before I merge it.
I have taken a look at the code and the only thing that stands out is that this async implementation only supports asyncio and not trio. I know that there are many people who prefer to use trio and libraries should generally be async platform agnostic (but it's your decision at the end of the day). anyio is a nice package that lets you support both at once (although I am not sure if it will work with this Future use case).
There is also a small chance that asyncio Future will work out of the box with trio - I haven't tested the code, I just read it.
But maybe the Future dependency is not needed at all? Maybe just return an object and let the _async method handle both cases and only await if needed.
Aside from that, looks good 🙂
Thanks for the feedback! Makes sense. I'll take a look.
FWIW you might be able to use collections.abc.Awaitable instead of asyncio.Future.
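To make the "only await if needed" idea concrete, here is a minimal sketch that types the shared logic with `collections.abc.Awaitable` and uses `inspect.isawaitable` to decide whether to await. All names here (`check_deliverability`, `validate`, `validate_async`, the `lookup` callable) are hypothetical and do not reflect the actual code on the async branch.

```python
import inspect
from collections.abc import Awaitable
from typing import Callable, Union

# A lookup may return its answer directly (sync) or as an awaitable (async).
LookupResult = Union[bool, Awaitable[bool]]
Lookup = Callable[[str], LookupResult]


def check_deliverability(domain: str, lookup: Lookup) -> LookupResult:
    """Shared logic: normalize the domain and hand back whatever lookup returns."""
    return lookup(domain.lower())


def validate(domain: str, lookup: Lookup) -> bool:
    """Sync entry point: refuses awaitable results rather than blocking on them."""
    result = check_deliverability(domain, lookup)
    if inspect.isawaitable(result):
        if inspect.iscoroutine(result):
            result.close()  # avoid a "coroutine was never awaited" warning
        raise TypeError("async lookup passed to sync validate()")
    return result


async def validate_async(domain: str, lookup: Lookup) -> bool:
    """Async entry point: awaits only when the shared logic returned an awaitable."""
    result = check_deliverability(domain, lookup)
    if inspect.isawaitable(result):
        result = await result
    return result
```

Since nothing here imports asyncio, the same `validate_async` should in principle run under any event loop that can await the object `lookup` returns, though that has not been tested with trio.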
Oh interesting.
I need to make time to make some test scripts and try some of the other frameworks. Probably won't happen soon.