
Async deliverability checking?

Open magnuswatn opened this issue 2 years ago • 15 comments

Hi,

Thank you for creating this excellent library.

Would you accept a PR that adds async methods for deliverability checking? A quick look suggests it would entail a new validate_email_deliverability function, with some duplicated logic, and a new validate_email function that could probably share almost all of its logic with the existing one.

Thanks.

magnuswatn avatar Apr 13 '23 12:04 magnuswatn

I actually started working on that a while ago in https://github.com/JoshData/python-email-validator/tree/async. But my standards for completing the work are higher now: there has to be a complete set of tests, and the code has to be clear and documented. And if the work is started over, I also really want to avoid duplicated logic by not having separate functions. So yes, but with those caveats.

JoshData avatar Apr 13 '23 12:04 JoshData

I just realized the branch didn't actually have my async work on it. I've fixed it now and tried to bring it up to date with other changes that I did since I started working on it. It's not in a working state though.

JoshData avatar Apr 16 '23 00:04 JoshData

I will share with you my async implementation; feel free to use it or get inspired by it.

The method first performs one DNS request for MX records, optimistically assuming the domain has them. If there are none (an implicit MX), it performs two DNS requests in parallel for A/AAAA records. If the MX answer is a Null MX (exchange "."), the domain is treated as undeliverable.

In contrast to python-email-validator, it doesn't check for SPF records. In my opinion, drawing deliverability conclusions from SPF records is incorrect.

# LICENSE: CC0-1.0 (Public Domain)

import logging
from operator import attrgetter
from anyio import create_task_group
from dns.asyncresolver import Resolver
from dns.exception import DNSException, Timeout
from dns.rdatatype import RdataType
from dns.resolver import NXDOMAIN, NoAnswer, NoNameservers
from email_validator import validate_email 

resolver = Resolver()

...

info = validate_email(email, check_deliverability=False)
domain = info.ascii_domain
success = False

async with create_task_group() as tg:

    async def task(rd: RdataType):
        nonlocal success

        try:
            answer = await resolver.resolve(domain, rd)
            rrset = answer.rrset
        except NoAnswer:
            rrset = None
        except NXDOMAIN:
            return  # domain does not exist, skip further checks
        except (NoNameservers, Timeout):
            raise  # something's wrong on our side
        except DNSException:
            # some other error, log and proceed gracefully
            logging.exception('DNS error for %r (%r)', domain, rd)
            rrset = None

        if rd == RdataType.MX:
            if not rrset:
                # on implicit mx, try a/aaaa
                tg.start_soon(task, RdataType.A)
                tg.start_soon(task, RdataType.AAAA)
                return

            # mx - treat not-null answer as success
            # sort answers by preference in descending order
            rrset_by_preference = sorted(rrset, key=attrgetter('preference'), reverse=True)
            exchange = str(rrset_by_preference[0].exchange)
            success = exchange != '.'
        else:
            # a/aaaa - treat any answer as success and cancel other tasks
            if rrset:
                success = True
                tg.cancel_scope.cancel()

    tg.start_soon(task, RdataType.MX)
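
The same control flow (MX first, then A and AAAA in parallel when there is no MX) can be sketched with only the standard library, which makes it easy to try without network access. This is a simplified illustration, not the snippet above and not the library's code: `is_deliverable`, `Lookup`, and `fake_lookup` are hypothetical names, the Null MX test is reduced to "any exchange other than '.'", and the second query is not cancelled when the first one answers.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical lookup signature: (domain, record_type) -> record strings,
# or None when there are no records of that type.
Lookup = Callable[[str, str], Awaitable[Optional[Sequence[str]]]]


async def is_deliverable(domain: str, lookup: Lookup) -> bool:
    """MX first; fall back to A and AAAA in parallel when there is no MX."""
    mx = await lookup(domain, "MX")
    if mx:
        # A lone "." exchange is a Null MX (RFC 7505): the domain accepts no mail.
        return any(exchange != "." for exchange in mx)

    # No MX records: query A and AAAA concurrently and accept either answer.
    a, aaaa = await asyncio.gather(lookup(domain, "A"), lookup(domain, "AAAA"))
    return bool(a or aaaa)


async def fake_lookup(domain: str, rdtype: str) -> Optional[Sequence[str]]:
    """Stand-in for a real resolver so the sketch runs offline (made-up zone data)."""
    zone = {
        ("mx.example", "MX"): ["mail.mx.example."],
        ("null.example", "MX"): ["."],
        ("a-only.example", "A"): ["192.0.2.1"],
    }
    await asyncio.sleep(0)  # yield to the event loop, as a real query would
    return zone.get((domain, rdtype))
```

Swapping `fake_lookup` for a thin wrapper around a real async resolver would be the next step; error handling for NXDOMAIN, timeouts, and so on is deliberately left out here.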

Zaczero avatar Mar 04 '24 23:03 Zaczero

Thanks for sharing! The branch currently has an async implementation that seems to be working. It doesn't run DNS queries in parallel though. I'd be curious to see if it improves performance in real world scenarios. I might try it although I don't know when I'll have time to.

JoshData avatar Mar 05 '24 01:03 JoshData

Bump!

mrdeveloperdude avatar Mar 11 '24 20:03 mrdeveloperdude

I'd appreciate anyone testing out the async branch before I merge it.

JoshData avatar Mar 11 '24 22:03 JoshData

I have taken a look at the code and the only thing that stands out is that this async implementation only supports asyncio and not trio. I know that there are many people who prefer to use trio and libraries should generally be async platform agnostic (but it's your decision at the end of the day). anyio is a nice package that lets you support both at once (although I am not sure if it will work with this Future use case).

There is also a small chance that asyncio Future will work out of the box with trio - I haven't tested the code, I just read it.

But maybe the future dependency is not needed at all? Maybe just return an object and let the _async method handle both cases and only await if needed.
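
A rough sketch of that idea, using only the standard library: the shared function returns whatever the check returns, and only the async entry point awaits it when needed. All names here (`_validate`, `validate`, `validate_async`, `sync_check`, `async_check`) are hypothetical and the parsing is deliberately naive; this is not the code on the async branch.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Union

# Hypothetical check signature: returns a bool directly, or an awaitable of bool.
Check = Callable[[str], Union[bool, Awaitable[bool]]]


def _validate(email: str, check_domain: Check) -> Union[bool, Awaitable[bool]]:
    """Shared logic: parse the address, then start the domain check without awaiting it."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address")
    return check_domain(domain)


def validate(email: str, check_domain: Check) -> bool:
    """Sync entry point: only accepts checks that return a plain value."""
    result = _validate(email, check_domain)
    if inspect.isawaitable(result):
        if inspect.iscoroutine(result):
            result.close()  # avoid a "coroutine was never awaited" warning
        raise TypeError("use validate_async with an async check")
    return result


async def validate_async(email: str, check_domain: Check) -> bool:
    """Async entry point: awaits the result only if it is awaitable."""
    result = _validate(email, check_domain)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_check(domain: str) -> bool:
    return domain == "example.com"  # made-up rule for the demo


async def async_check(domain: str) -> bool:
    return domain == "example.com"  # made-up rule for the demo
```

Because nothing here touches `asyncio.Future` or the event loop directly, the shared function itself carries no dependency on a particular async framework.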

Aside from that, looks good 🙂

Zaczero avatar Mar 12 '24 06:03 Zaczero

Thanks for the feedback! Makes sense. I'll take a look.

JoshData avatar Mar 25 '24 11:03 JoshData

FWIW you might be able to use collections.abc.Awaitable instead of asyncio.Future.
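
As a small illustration of why that annotation is looser: anything with an `__await__` method satisfies `collections.abc.Awaitable`, so an `asyncio.Future` and a plain custom object can both be passed to the same function. The `Ready` class and `finish` function below are hypothetical names for this sketch; whether the swap fits the branch's code has not been checked here.

```python
import asyncio
from collections.abc import Awaitable


class Ready:
    """Minimal awaitable that is not an asyncio.Future: it resolves immediately."""

    def __init__(self, value: bool) -> None:
        self.value = value

    async def _get(self) -> bool:
        return self.value

    def __await__(self):
        # Delegate to a coroutine so no event-loop-specific machinery is needed.
        return self._get().__await__()


async def finish(result: Awaitable[bool]) -> bool:
    """Accepts any awaitable, not just asyncio.Future."""
    return await result


async def main() -> tuple:
    future = asyncio.get_running_loop().create_future()
    future.set_result(True)
    # Both an asyncio.Future and the custom object pass through the same function.
    return await finish(future), await finish(Ready(False))
```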

tamird avatar May 09 '24 09:05 tamird

Oh interesting.

I need to make time to make some test scripts and try some of the other frameworks. Probably won't happen soon.

JoshData avatar May 10 '24 02:05 JoshData