check-if-email-exists

Bulk Validation & IP blacklist

Open marcelinhov2 opened this issue 4 years ago • 34 comments

Hello @amaurymartiny, we are running your solution on Kubernetes, but our IPs keep getting blacklisted.

I see that many solutions offer a bulk validation feature, and I was wondering how it can be done without this blacklist problem. Do you have any tips?

I thought proxies might be the answer, but I don't know whether that works or makes sense.

Thanks.

marcelinhov2 avatar Mar 11 '20 00:03 marcelinhov2

> I see that many solutions offer a bulk validation feature, and I was wondering how it can be done without this blacklist problem

Most of the solutions out there already have a database of emails and their deliverability, so their bulk validation feature just hits the database for most of the emails you want to check.

To use this tool without running into the blacklist problem, the only solution I can think of is IP rotation. So yeah, proxies as you said, but a large number of them with IP rotation.
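A minimal sketch of the IP rotation idea above, assuming you already have a list of proxy addresses. The `ProxyPool` class and the example addresses are hypothetical and are not part of check-if-email-exists; the sketch only shows round-robin selection plus retiring an address once it appears to be blacklisted.

```python
from collections import deque


class ProxyPool:
    """Round-robin pool of proxy addresses that can retire blocked ones."""

    def __init__(self, proxies):
        self._proxies = deque(proxies)

    def next_proxy(self):
        """Return the next proxy and move it to the back of the queue."""
        if not self._proxies:
            raise RuntimeError("no usable proxies left")
        proxy = self._proxies[0]
        self._proxies.rotate(-1)
        return proxy

    def retire(self, proxy):
        """Drop a proxy whose IP appears to be blacklisted."""
        try:
            self._proxies.remove(proxy)
        except ValueError:
            pass  # already retired

    def __len__(self):
        return len(self._proxies)


# Hypothetical addresses from the documentation range 203.0.113.0/24.
pool = ProxyPool(["203.0.113.10:1080", "203.0.113.11:1080", "203.0.113.12:1080"])
```

Each verification would take `pool.next_proxy()`, and the caller would call `pool.retire(...)` when a provider starts rejecting that address.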

I thought I could get around it with AWS Lambda (each serverless function has a different IP address), but empirically it doesn't work so well: I tried it on https://reacherhq.github.io/, and the success rate is not very high.

I'll leave this issue open; if anyone else has ideas, I would like to hear them.

amaury1093 avatar Mar 11 '20 08:03 amaury1093

Thanks for your answer @amaurymartiny. Do you know how I could implement proxy rotation in front of your service? I'm already using IP rotation with my crawlers, but idk how to use it with your service. Is it ready for this?

Thanks.

marcelinhov2 avatar Mar 11 '20 11:03 marcelinhov2

If not, do you know a safe range of requests per minute (maybe per hour, idk) that I can make without having my IP blocked?

Thanks again.

marcelinhov2 avatar Mar 11 '20 11:03 marcelinhov2

About your Lambda solution: you only get a high volume of IPs if you make parallel requests. For example, if you hit your Lambda 50 times at the same time, it will spin up 50 instances for you. If you make 50 more requests after that, it will reuse the same 50 IPs as before.

To make this work, you need to send batches until the IPs get blacklisted. When that happens, you need to wait 15 minutes (for the Lambda instances to die) and start again.
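The batch-then-wait loop described above could be sketched roughly as follows. This is only an illustration: `check_batch` is a hypothetical callable (not part of any library) that returns the results it managed to get plus a flag saying whether the IPs got blocked, and the 15-minute cooldown is the figure from this comment, not a documented AWS guarantee.

```python
import time

COOLDOWN_SECONDS = 15 * 60  # the 15-minute wait mentioned above


def run_in_batches(emails, batch_size, check_batch, sleep=time.sleep, max_cooldowns=10):
    """Check emails batch by batch, cooling down whenever a batch gets blocked.

    check_batch(batch) is a hypothetical callable returning (results, blocked),
    where results maps each answered email to its outcome.
    """
    results = {}
    pending = list(emails)
    cooldowns = 0
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        batch_results, blocked = check_batch(batch)
        results.update(batch_results)
        if blocked:
            cooldowns += 1
            if cooldowns > max_cooldowns:
                raise RuntimeError("still blocked after too many cooldowns")
            # Put unanswered addresses back at the front of the queue.
            pending = [e for e in batch if e not in batch_results] + pending
            sleep(COOLDOWN_SECONDS)
    return results
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.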

If you have a serverless version of your solution, I can run some tests and give you feedback on it.

Once again, thanks for your help.

marcelinhov2 avatar Mar 11 '20 11:03 marcelinhov2

Thanks for researching into this!

> Do you know how I could implement proxy rotation in front of your service?

I haven't looked into this myself. But I think it shouldn't be different from IP rotation in front of other services (e.g. your web crawler).

> If not, do you know a safe range of requests per minute (maybe per hour, idk) that I can make without having my IP blocked?

This depends on the email provider you're testing the email against. MS Outlook blocked me after 3 email validations; Gmail seems a bit more permissive, but I haven't tested it deeply.
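Since the tolerated rate differs per provider, one option is to throttle per mail domain. Below is a rough sketch; the `DomainThrottle` class is hypothetical, and the intervals in the usage note are made-up placeholders, since no safe rate is known in this thread.

```python
import time


class DomainThrottle:
    """Enforce a minimum delay between checks against the same mail domain."""

    def __init__(self, min_interval, default_interval, clock=time.monotonic):
        self._min_interval = min_interval  # {domain: seconds}, assumed values
        self._default = default_interval   # seconds for any other domain
        self._clock = clock
        self._last_seen = {}

    @staticmethod
    def _domain(email):
        return email.rsplit("@", 1)[-1].lower()

    def wait_time(self, email):
        """Seconds to wait before checking this address (0.0 if none)."""
        domain = self._domain(email)
        last = self._last_seen.get(domain)
        if last is None:
            return 0.0
        interval = self._min_interval.get(domain, self._default)
        return max(0.0, last + interval - self._clock())

    def record(self, email):
        """Remember that a check against this address's domain just happened."""
        self._last_seen[self._domain(email)] = self._clock()
```

For example, `DomainThrottle({"outlook.com": 600.0}, 10.0)` would space Outlook checks 10 minutes apart and everything else 10 seconds apart; both numbers are placeholders to tune empirically.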

> If you have a serverless version of your solution, I can run some tests and give you feedback on it.

Here's how I set up AWS Lambda: https://github.com/reacherhq/microservices.

I currently don't have much time myself to look into IP rotation, so if you find something, I would really appreciate a report back here 🙏!

amaury1093 avatar Mar 11 '20 16:03 amaury1093

The serverless version is giving me a timeout. Do you know if I need to configure something?

Thanks.

marcelinhov2 avatar Mar 11 '20 17:03 marcelinhov2

image

Even localhost :(

marcelinhov2 avatar Mar 11 '20 17:03 marcelinhov2

It's often normal to get timeouts on localhost, because your ISP may block outbound connections on port 25.
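A quick way to tell whether outbound port 25 is blocked from a given machine is to attempt a plain TCP connection to a mail server. This is a generic sketch using Python's standard library, not something check-if-email-exists provides; the function name is made up for illustration.

```python
import socket


def can_connect(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


# Example (needs network access): can_connect("gmail-smtp-in.l.google.com")
```

If this returns False for several well-known mail servers but other ports work, the network is most likely filtering port 25.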

When I said above "but empirically it doesn't work so well", that's what I meant about AWS Lambda. Some of the requests do pass, so I guess it's something related to their infrastructure: some serverless functions get port 25 blocked, others don't.

amaury1093 avatar Mar 11 '20 19:03 amaury1093

And what about building a layer for proxy rotation?

marcelinhov2 avatar Mar 22 '20 17:03 marcelinhov2

That would be a sweet idea!

However, it's out of scope for this tool. I'm okay with adding a --proxy flag to the binary, so that all the SMTP requests/responses go through that proxy first. But I personally will not build the IP rotation proxy itself.

There might already be some other tools available for this. If you find something, I'd really like to know!

amaury1093 avatar Mar 22 '20 18:03 amaury1093

Hey @amaurymartiny, how are you doing with this whole Covid-19 thing? I hope you are doing well...

We're continuing our tests here and found a new problem that maybe you can help us with: image

We are getting this "Helo command rejected" error from some of the domains we are testing emails against. I found these 2 links; idk if they can help in any way:

https://unix.stackexchange.com/questions/91749/helo-command-rejected-need-fully-qualified-hostname-error

https://forums.zimbra.org/viewtopic.php?t=18646

Do you think that it is a problem that we can handle?

Thanks.

marcelinhov2 avatar Mar 25 '20 23:03 marcelinhov2

Connecting to an SMTP server to validate... this requires that the IP you're connecting from (to the SMTP server):

  1. has port 25 open for SMTP
  2. is NOT blacklisted via Spamhaus

Is this correct?

If so, wouldn't it be easier to just set up VPSs (with port 25 open) running an SMTP server, and round-robin across those servers when you're trying to verify different variations of an email?
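The second requirement can be checked before putting a VPS into rotation. DNS blacklists are queried by reversing the IPv4 octets and appending the list's zone; the sketch below assumes the `zen.spamhaus.org` zone and treats answers in 127.255.255.x as query errors rather than listings, which is my understanding of Spamhaus's return codes. The function names are made up for illustration.

```python
import ipaddress
import socket

SPAMHAUS_ZONE = "zen.spamhaus.org"


def dnsbl_query_name(ip, zone=SPAMHAUS_ZONE):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone=SPAMHAUS_ZONE):
    """Return True if the DNSBL answers with a listing code for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False  # no record: not listed, or the lookup itself failed
    # Assumption: 127.255.255.x means a query error, not a listing.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

Note that lookups sent through large public DNS resolvers may be refused by the list operator, so a "not listed" result from this sketch is only a hint.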

taewookim avatar Mar 26 '20 03:03 taewookim

Hey @taewookim,

We already have port 25 open on our side, but it's nearly impossible not to get blacklisted when trying to validate a batch of emails. I'm still trying to understand how I can do this the way zerobounce and thechecker.co do.

To be honest, I haven't tried the VPS approach yet, but I definitely will.

Thanks again.

marcelinhov2 avatar Mar 26 '20 13:03 marcelinhov2

@marcelinhov2 Thanks, I'm all good. I hope you are safe & healthy too.

> We are getting this "Helo command rejected" error from some of the domains we are testing emails against. I found these 2 links; idk if they can help in any way:

I just published 0.7.0 on Docker and on the Releases page. The binary takes a --hello-name flag, and the HTTP server takes a hello_name field in the JSON input. This field is used in the EHLO SMTP command. Set it to an FQDN, and your error should go away.
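As a rough illustration of the JSON input described above, the sketch below validates that the name looks like an FQDN before building the request body. Only `hello_name` comes from this comment; the `to_email` field name and both helper functions are assumptions made for the example, so check the project's documentation for the actual schema.

```python
import json
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_fqdn(name):
    """Rough check: at least two dot-separated labels, each a valid label."""
    name = name.rstrip(".")
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def build_request_body(to_email, hello_name):
    """Serialize the JSON input; `to_email` is an assumed field name."""
    if not is_fqdn(hello_name):
        raise ValueError(f"hello_name should be an FQDN, got {hello_name!r}")
    return json.dumps({"to_email": to_email, "hello_name": hello_name})
```

A bare name such as `localhost` fails the check, which matches the "need fully qualified hostname" rejection discussed in the linked threads.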

Note: I just did some quick testing, and published this 10 minutes ago, so there might be bugs (hopefully not, though). I'll do some more thorough testing on my side.

amaury1093 avatar Mar 26 '20 13:03 amaury1093

Great @amaurymartiny.

We are going to test this today, and I'll give you feedback.

Thank you so much

marcelinhov2 avatar Mar 26 '20 14:03 marcelinhov2

Worked perfectly

marcelinhov2 avatar Mar 28 '20 20:03 marcelinhov2

Maybe a combination with this one https://github.com/mattes/rotating-proxy will do the trick for IP blacklist bypass

nikos90 avatar Nov 02 '20 08:11 nikos90

@nikos90 The problem with Tor is that a lot of SMTP servers block Tor exit nodes. Even if you rotate IPs within Tor, they will still get blacklisted at the exit.

amaury1093 avatar Nov 24 '20 20:11 amaury1093

That's right. Most proxies / Tor exit nodes are already blacklisted. Don't bother.

@amaurymartiny

I've been running distributed servers on low-end VPSs for this kinda thing. It's a pain in the arse to maintain, but it might be possible to create a service that takes care of this for anyone interested in distributed IPs for checking tons of emails. Let me know if you wanna collab.

taewookim avatar Dec 02 '20 18:12 taewookim

I have a subscription with a hosting provider that allows me to send unlimited email, and they have multiple SMTP servers with different IPs and a good reputation. Can I use this script with that external SMTP server? I guess that would fix the issue for me.

zoid007 avatar Dec 13 '20 09:12 zoid007

You would need to proxy the requests through the external SMTP server; see the --proxy-* flags on the binary.

BTW, would you mind sharing which hosting provider you use that has SMTP servers with a good reputation?

amaury1093 avatar Dec 13 '20 11:12 amaury1093

Lambda doesn't solve the issue

Lambda doesn't have a public IP address (it's using NAT)

Each AWS account deploys lambda containers in a group of dedicated EC2 instances.

So basically all your Lambdas run on a few EC2 instances, which all connect through the same NAT instance

See more at https://stackoverflow.com/a/37793338/634577

Lusitaniae avatar Jul 16 '21 07:07 Lusitaniae

Sorry to revive this dead issue... just want to make sure I'm understanding something right.

To be able to proxy, every single proxy server would have to have port 25 open, correct? Does anyone have any clue how one would go about this? I've had difficulty finding any proxies that have port 25 open, let alone that aren't already blocked.

I had bought a VPS but I'm guessing the IP has been used by someone else before. Got just a couple verifications in before Spamhaus blacklisted it.

arimgibson avatar Aug 22 '22 17:08 arimgibson

> I have a subscription with a hosting provider that allows me to send unlimited email, and they have multiple SMTP servers with different IPs and a good reputation. Can I use this script with that external SMTP server? I guess that would fix the issue for me.

Hi @zoid007, could you share the hosting provider you used?

bahout avatar Sep 09 '22 13:09 bahout

In my case, I need to ask the AWS support team.

marcelinhov2 avatar Sep 09 '22 14:09 marcelinhov2

@arimgibson

> I had bought a VPS but I'm guessing the IP has been used by someone else before. Got just a couple verifications in before Spamhaus blacklisted it.

This is because the data you are checking contains spam traps.

JoshuaAGE avatar Dec 11 '22 18:12 JoshuaAGE

> @arimgibson
>
> > I had bought a VPS but I'm guessing the IP has been used by someone else before. Got just a couple verifications in before Spamhaus blacklisted it.
>
> This is because the data you are checking contains spam traps.

@JoshuaAGE I'd be surprised; it's from an email list collected through website sign ups from a forum-type site. I have user first and last names as well. Not just a list I downloaded/bought

arimgibson avatar Dec 12 '22 18:12 arimgibson

@arimgibson The reason why you get listed at Spamhaus (probably CSS and not SBL) is 100% based on your data. Forum signups especially attract spam trap signups, and Spamhaus spam trap feed providers buy a lot of old domains and convert them into spam traps.

JoshuaAGE avatar Dec 13 '22 01:12 JoshuaAGE

That makes sense and sounds right @JoshuaAGE; appreciate the input! Unfortunate, because that makes my life a lot harder haha. A good number of the emails are from smaller email providers or from individual or small-company domains. I suppose that's what I get for using data from literal decades ago :stuck_out_tongue_winking_eye:

arimgibson avatar Dec 31 '22 20:12 arimgibson

@arimgibson Just filter them out... not easy, I know.

JoshuaAGE avatar Jan 13 '23 18:01 JoshuaAGE