Domain cleanup methodology

Open Daniel3356 opened this issue 2 years ago • 6 comments

Here is the list of dead email domains that need to be removed: usbvap.com, hacktoy.com, outlook.sbs, yandex.cfd

Daniel3356 avatar Apr 10 '22 02:04 Daniel3356

Thanks. Here is our first methodology for this problem:

Once a domain has been discovered, we keep it in our list for 365 days and then drop it. If it is seen again, it goes back on the list for another 365 days, counted from firstseen. The firstseen property is already part of our JSON format...
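A minimal sketch of this retention rule in Python. It assumes `firstseen` is stored as a Unix timestamp in seconds, which the thread does not confirm; the names `RETENTION` and `is_listed` are hypothetical and not taken from the project.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # retention window named in the comment above


def is_listed(firstseen: int, now: datetime) -> bool:
    """Return True while the domain is still inside its 365-day window.

    `firstseen` is assumed to be a Unix timestamp in seconds (UTC).
    """
    seen = datetime.fromtimestamp(firstseen, tz=timezone.utc)
    return now - seen < RETENTION
```

Under this reading, a domain that is seen again after it dropped out would re-enter the list with a fresh `firstseen` and start a new window.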

We have considered checking MX and A records and removing domains that fail the check, but based on our (admittedly short) experience, these providers are mostly small and may lose their hosting or their DNS provider, or may even intentionally remove domains from DNS. That makes this a very ambiguous way of 'detection'.

We want to aim for a 'reliable' approach: every domain stays in the list for 365 days once seen, then drops. Imagine you own a domain x.com and add it to the disposable service: the chance that you still own it is very high. On the other hand, if you drop the domain and someone else takes it, we would have a false positive in the list, but the chance that the new owner hosts a mail server with hundreds of email addresses is tiny, and this is a risk we can take. Comments and ideas about our methodology are welcome.

7c avatar Apr 10 '22 11:04 7c

Hey @7c,

Update on the unavailable domains:

mantutimaison.com
shhongshuhan.com
azwee.site
drhoangsita.com
snasu.info
yongshuhan.com
funplus.site
toanciamobile.com
zipzx.site
mamonsuka.com
mobitivaisao.com
usbvap.com
hansgu.com
filezw.site
mantutivi.com
ngocsita.com
hacktoy.com
phonestlebuka.com
zipea.site
bookel.site
hungtaoteile.com
omilk.site
devoi.site
prcea.site

Some of them have become premium domains, such as:

usbvap.com sold for $114,986.58 CAD
hacktoy.com sold for $63,881.44 CAD
typery.com sold for $191,644.31 CAD
1ki.co sold for $879.01 CAD
oanhxintv.com sold for $3,689.79 CAD
king.buzz sold for $58,627.83 CAD

Daniel3356 avatar Jun 13 '22 14:06 Daniel3356

There must be some sort of way to use registrars to validate the domain still exists... Like whois does?

d3xt3r01 avatar Jun 29 '22 11:06 d3xt3r01

$ whois google.com -h whois.iana.org | grep whois:
$ whois google.com -h whois.verisign-grs.com

I suppose the initial tld data can be cached ...
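A rough Python sketch of that caching idea: ask whois.iana.org once per TLD for the responsible WHOIS server and remember the answer. It assumes the IANA reply contains a `whois:` field, the same line the `grep` above picks out. The function names and the in-memory dict cache are hypothetical choices for illustration.

```python
import socket
from typing import Callable, Dict, Optional


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query (TCP port 43, CRLF-terminated) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


_tld_servers: Dict[str, Optional[str]] = {}  # cache: TLD -> WHOIS server


def whois_server_for(
    domain: str, query: Callable[[str, str], str] = whois_query
) -> Optional[str]:
    """Return the WHOIS server for the domain's TLD, asking IANA once per TLD."""
    tld = domain.rsplit(".", 1)[-1].lower()
    if tld not in _tld_servers:
        server = None
        for line in query("whois.iana.org", tld).splitlines():
            if line.lower().startswith("whois:"):
                server = line.split(":", 1)[1].strip() or None
                break
        _tld_servers[tld] = server
    return _tld_servers[tld]
```

The `query` parameter exists only so the lookup can be exercised without network access; a second call for any other domain under the same TLD is answered from the cache.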

d3xt3r01 avatar Jun 29 '22 12:06 d3xt3r01

We still believe there is no need to drop them before one year. We know the domain business, and we are capable of running whois to verify, but as I posted: "Once a domain has been discovered, we keep it in our list for 365 days and then drop it. If it is seen again, it goes back on the list for another 365 days, counted from firstseen. The firstseen property is already part of our JSON format..."

We do not need to worry about FREE domains: they are free, so invalid, so it does not matter if they are detected as FAKE, which is kind of TRUE. If a domain has changed ownership, which in 99.9% of cases does not happen before 365 days, we will drop it anyway... If a domain is still owned by that provider but its NS, MX, OWNER etc. have changed, the provider may be using this to obfuscate or hide domains from us...

If we keep a domain for 365 days regardless of any external data and then auto-drop it, I do not see any harm... quite the opposite... we get solid detection and anti-obfuscation...

One scenario I can think of: a fake-email provider adds domains it does not own. This would be a false positive, but we filter the domains of the top 100k most visited websites from being added, to mitigate this as much as possible.
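A small Python sketch of such a popularity filter. The thread does not say how the top-100k list is stored or matched, so this assumes a set of lowercase domains and, as an extra assumption, also rejects subdomains of a popular domain. `filter_candidates` is a hypothetical name.

```python
from typing import Iterable, List, Set


def filter_candidates(candidates: Iterable[str], popular: Set[str]) -> List[str]:
    """Drop candidates that match, or are subdomains of, a popular domain."""
    kept = []
    for raw in candidates:
        domain = raw.strip().lower().rstrip(".")
        labels = domain.split(".")
        # The domain itself plus each parent, e.g. a.b.com -> {a.b.com, b.com}.
        suffixes = {".".join(labels[i:]) for i in range(len(labels) - 1)}
        if suffixes.isdisjoint(popular):
            kept.append(domain)
    return kept
```

For example, with `popular = {"google.com"}`, a crawler reporting `google.com` or `mail.google.com` as disposable would have both entries discarded before they reach the list.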

7c avatar Jun 29 '22 13:06 7c

We have implemented the expiration code. Starting today, we remove all domains that our crawlers have not seen in the last 365 days. They will be removed from the API/JSON/JSON_V2/MARKDOWN files and will only be visible in the EXPIRED DOMAINS SECTION.
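How the split between published and expired domains might look, as a Python sketch only. The actual expiration code is not shown in this thread; the `lastseen` field (a Unix timestamp in seconds) and the name `split_expired` are assumptions made for illustration.

```python
from typing import Dict, Tuple

DAY = 86_400  # seconds in one day


def split_expired(
    entries: Dict[str, dict], now: int, max_age_days: int = 365
) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Split entries into (active, expired) by their assumed `lastseen` timestamp."""
    cutoff = now - max_age_days * DAY
    active: Dict[str, dict] = {}
    expired: Dict[str, dict] = {}
    for domain, meta in entries.items():
        # Seen at or after the cutoff: still published; otherwise expired.
        target = active if meta["lastseen"] >= cutoff else expired
        target[domain] = meta
    return active, expired
```

The `active` mapping would feed the published files, while `expired` would back the expired domains section.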

7c avatar Nov 30 '23 16:11 7c