PyFunceble
FEATURE: Implement check the availability of website / detection of parked & for sale domains
Description
I had been thinking about this even before I found this: https://github.com/StevenBlack/hosts/issues/1613#issuecomment-820162550 @AdKiller
Alongside truly dead domains (non-existent etc.), there are many parked / for sale zombie domains, which still exist but are dead at the same time: "This domain is for sale", "Website is no longer available". Currently PyFunceble marks such domains as ACTIVE, but it could have an option to check whether a real website exists on the domain and label it "REAL" if so, or "ZOMBIE" if not.
Possible Solution
Such feature:
- would require downloading the body of the domain's main page and searching it for phrases like "for sale", "no longer available", "rent domain" etc.,
- could lead to some false positives,
- a pity there is no status code for it (https://en.wikipedia.org/wiki/List_of_HTTP_status_codes); with one, there would be no need to download the whole body and search it for phrases
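The phrase search described above could be sketched roughly as follows. This is only an illustration: `find_parked_phrases` and `fetch_body` are hypothetical helpers, not part of PyFunceble, and the phrase list is just the three examples from this issue.

```python
import urllib.request

# Example phrases taken from the issue text; a real list would need curation.
PARKED_PHRASES = ("for sale", "no longer available", "rent domain")


def find_parked_phrases(body: str, phrases=PARKED_PHRASES) -> list[str]:
    """Return the phrases that occur in the body, ignoring case."""
    lowered = body.lower()
    return [phrase for phrase in phrases if phrase in lowered]


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """Download a page body (hypothetical helper, not a PyFunceble API)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up sample body, so the sketch runs without network access.
sample = "<html><body><h1>This domain is FOR SALE</h1></body></html>"
print(find_parked_phrases(sample))  # ['for sale']
```

A plain substring match like this is exactly where the false positives mentioned above would come from: a working shop page that says "for sale" would match as well.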
Screenshot
The trick is to set up your own DNS recursor and then import (set up, configure) it with the RPZ zone pirated.mypdns.cloud
The reason is that you will then be blocking through the .rpz-nsdname. The next thing you have to do is disable the HTTP status code and WHOIS checks and rely purely on the DNS test.
You can now do 2 things with the results file domain/INACTIVE/list:
- Add them as pirated domains and block them
- Use a removal tool like sed and remove them from your source
This is a much safer approach than trying to keep an up-to-date set of rules; it will also add the IP to the bot lists.
Give it a spin; there are some getting-started configs for PowerDNS Recursor here: https://mypdns.org/rpz/dns-rpz-integration/-/tree/master/PowerDNS-Recursor
If you are having trouble getting it to work, please do open an issue or a discussion
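For the second option in the list above (removing the INACTIVE entries from your source), here is a minimal Python sketch as an alternative to sed. It assumes one domain per line in the INACTIVE list and a source that is either a plain domain list or hosts-style lines without trailing comments; `remove_inactive` is a hypothetical helper name.

```python
def remove_inactive(source_lines, inactive_lines):
    """Drop every source entry whose domain appears in the INACTIVE list."""
    inactive = {
        line.strip().lower()
        for line in inactive_lines
        if line.strip() and not line.strip().startswith("#")
    }
    kept = []
    for line in source_lines:
        entry = line.strip()
        # Keep comments and blank lines untouched.
        if not entry or entry.startswith("#"):
            kept.append(line)
            continue
        # Hosts-style lines ("0.0.0.0 example.com") carry the domain last.
        domain = entry.split()[-1].lower()
        if domain not in inactive:
            kept.append(line)
    return kept


# Made-up sample data standing in for the source file and the INACTIVE list.
source = ["# my list", "0.0.0.0 alive.example", "0.0.0.0 dead.example", "plain-dead.example"]
inactive = ["dead.example", "plain-dead.example"]
print(remove_inactive(source, inactive))
```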
The idea is good @keczuppp.
My problem right now is that the same webpage as in the screenshot gives me the text in German (IP-based). Let's keep this open until I find a way to implement it - somehow ...
spirillen : https://github.com/funilrys/PyFunceble/issues/255#issuecomment-933797906
I don't know much about these things, I would have to study all that stuff first
funilrys : https://github.com/funilrys/PyFunceble/issues/255#issuecomment-941221455: text in German (IP-based).
- in this case it's not based purely on IP, but first of all on the language of the browser, detected by one of several methods. I have not checked so far whether the website falls back to IP-based localization when the body is fetched without a browser. If it does, that complicates searching for text phrases, as there are just too many languages. Still, the phrase search will work on many other sites which don't provide localized messages, so we should not give it up
- another problem is that in this case the example phrase "for sale", regardless of the site's language, is not even present in the raw source code of the page; it is generated from a JavaScript file when the browser loads and parses the page. So searching the source code for "for sale" won't work in this case (it returns 0 results), but it will of course still work on many other parked sites, so we should not give up the text search method
- we should also search the body for other unique text identifiers / links, as most parked domains have some unique links to the main parking server, for example: https://publicwww.com/websites/parking.bodiscdn.com/ . Going through the results, all are parked zombies, no functional websites, almost 10 000. A list of such parking links could be created; if a parking link is present in the body's source, that means this is a parked / for sale domain
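Two of the points above can be sketched together: asking for an English page through the `Accept-Language` header, and looking for known parking links in the raw source. Whether a given parking page honours the header is unverified, as noted above. The helper names and the `User-Agent` value are made up for illustration, and the link list only contains strings mentioned in this thread.

```python
import urllib.request

# Parking-related strings mentioned in this thread; a real list would need upkeep.
PARKING_LINKS = ("parking.bodiscdn.com", "parkingcrew.net/assets", "img.sedoparking.com")


def build_request(url, language="en-US,en;q=0.9"):
    """Prepare a GET request that asks for an English page via Accept-Language."""
    return urllib.request.Request(
        url,
        headers={
            "Accept-Language": language,
            # Placeholder value; some servers reject requests without a User-Agent.
            "User-Agent": "parked-check-sketch/0.1",
        },
    )


def find_parking_links(body, links=PARKING_LINKS):
    """Return the known parking links found in the raw page source."""
    return [link for link in links if link in body]


request = build_request("https://example.com/")
# Sending is left out on purpose: urllib.request.urlopen(request) would fetch the body.

# Made-up sample source; the script path is invented for the example.
sample = '<script src="https://parking.bodiscdn.com/made-up-path.js"></script>'
print(find_parking_links(sample))  # ['parking.bodiscdn.com']
```

Matching on links rather than on visible phrases sidesteps the localization problem, since the link to the parking server stays the same whatever language the page is rendered in.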
Another idea, from jawz101 : https://github.com/easylist/easylist/issues/2374#issuecomment-946087387 :
OP here- I just want to say this is very impressive work.
@ ryanbr @ felix-22
I have a suggestion and I may propose it to @ funilrys for the PyFunceble utility. I think I have done so before. For the past couple of years I've been using the Cisco Umbrella (formerly OpenDNS) Top 1 Million daily DNS lookup reports they publish here to evaluate the adaway list.
If you are unfamiliar with OpenDNS, it is a public DNS service which has been around for longer than most other public resolvers which allowed for content filtering and malware/phishing protection. They make lists of the top 1 million name lookups records publicly available. "The OpenDNS Global Network processes an estimated 100 billion DNS queries daily from 85 million users through 25 data centers worldwide."
So, regardless of whether a domain is registered, these are the actual queries made by us in circulation. If a domain is parked, it's going to be valid but nothing is pointing to it so you'll never see it used. I will download, say, the past 3 months of logs and if no one has tried to look up a domain, I pull it from the adaway list.
edit- crazy. I just did it with 2 days of top 1 million files and 6,999 of the 25,556 were on the top 1 million lists.
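The filtering jawz101 describes could look roughly like the sketch below. It assumes each daily file is a CSV with `rank,domain` rows, and the function names are hypothetical; real use would read the downloaded files instead of the inline sample strings.

```python
import csv
import io


def load_queried_domains(csv_texts):
    """Collect every domain seen in any of the daily 'rank,domain' CSV files."""
    seen = set()
    for text in csv_texts:
        for row in csv.reader(io.StringIO(text)):
            if len(row) >= 2:
                seen.add(row[1].strip().lower())
    return seen


def split_by_usage(blocklist, queried):
    """Partition a blocklist into domains that were looked up and those that were not."""
    used = [domain for domain in blocklist if domain.lower() in queried]
    unused = [domain for domain in blocklist if domain.lower() not in queried]
    return used, unused


# Made-up sample data standing in for two daily top-1-million files.
day1 = "1,google.com\n2,ads.example\n"
day2 = "1,google.com\n2,tracker.example\n"
queried = load_queried_domains([day1, day2])
used, unused = split_by_usage(["ads.example", "tracker.example", "parked.example"], queried)
print(used, unused)
```

Domains ending up in `unused` are only candidates for removal: a domain missing from the top lists may still be queried, just too rarely to rank.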
@keczuppp wrote in https://github.com/funilrys/PyFunceble/issues/255#issuecomment-946580250
Another idea, from jawz101 : easylist/easylist#2374 (comment) :
Touching https://github.com/funilrys/PyFunceble/issues/128
@keczuppp wrote in https://github.com/funilrys/PyFunceble/issues/255#issuecomment-941369292
I don't know much about these things, I would have to study all that stuff first
You should do that, as it will enhance your system's performance significantly: you should read Performance test of Hosts file vs DNS-Recursors :wink:
@keczuppp an early version is available in the parked-subject branch. However, I'm not sure if it is necessary to create 2 new statuses: REAL and ZOMBIE (or similar) ... With that commit, the tested subject will be treated as INACTIVE.
Does that fit everyone's needs? If it does, I will proceed with merging the branch to the dev version of PyFunceble.
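To make the two options concrete, here is a small sketch of how a final status could be derived either way. It is not how the parked-subject branch is implemented; `resolve_status` and its `distinct_labels` switch are hypothetical names used only for illustration.

```python
def resolve_status(availability_status: str, is_parked: bool, distinct_labels: bool = False) -> str:
    """Combine the availability result with the parked check.

    With distinct_labels=False a parked subject is simply reported as INACTIVE.
    With distinct_labels=True an ACTIVE subject is split into REAL and ZOMBIE.
    """
    # Anything that is not ACTIVE keeps its status; the parked check adds nothing.
    if availability_status != "ACTIVE":
        return availability_status
    if distinct_labels:
        return "ZOMBIE" if is_parked else "REAL"
    return "INACTIVE" if is_parked else "ACTIVE"


print(resolve_status("ACTIVE", is_parked=True))                        # INACTIVE
print(resolve_status("ACTIVE", is_parked=True, distinct_labels=True))  # ZOMBIE
```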
Asking for inputs: @spirillen @mitchellkrogza @ZeroDot1 and others
@funilrys
A few thoughts :thought_balloon:
- To determine pirated domains you should be following https://mypdns.org/infrastructure/dante-commit-bot/-/issues/15, this is where we are building the safe list for marking domains as parked/hijacked/sharked
- Why?: a large number of these "parked" domains are used for phishing
- You might consider including our pirated project lists to enhance the positive hit lists
- (Haven't checked the --help) But there should be a switch for (en|dis)-abling this feature and/or maybe even for adding your own source of known pirated domains (it should be a very trustworthy source, as some of these domains actually do get sold and reactivated)
I might get back with more once I have tested it
Feature has been disabled because I need to gather more intel on how people will use this:
- As a new test (like syntax, availability, reputation) option.
- As a SPECIAL rule.
Personally I would say the --pirated option is best, as it allows individuals to choose for themselves and yet allows them to use the --special-lookup
The more things are user-optional, the better your modular approach is accomplished.
Pardon for not writing back sooner, but I have not been very active on GitHub lately; I only noticed yesterday.
funilrys: an early version is available in the parked-subject branch.
Cool.
funilrys: However, I'm not sure if it is necessary to create 2 new statuses: REAL and ZOMBIE (or similar) ... With that commit, the tested subject will be treated as INACTIVE.
I'm not sure either. I was merely speculating whether it would be useful for users to distinguish between the reasons for which a domain is inactive, but I have no idea whether such a distinction is useful from a practical point of view. If it is not, we can stick to marking an inactive domain simply as inactive, regardless of the reason.
funilrys: Asking for inputs: ... and others
As for the phrases to look for ( LINK 1 ), they seem good; however, it seems this one is worth including as well: .com is for sale ( LINK 2 )
Also, as I mentioned before ( LINK 3 ), we should search the raw body not only for typical word phrases, but also for other text values because:
- some parked domains don't natively embed a text message in the raw body about the domain being for sale; instead they keep it in JS scripts and generate it during page loading, which is out of scope for the tool, because it's not an internet browser
- even worse, some parked domains don't provide any kind of text message at all about the domain being parked
- a random example of such a parked domain, which can't be found by common phrases, is: av4.xyz
- this parked domain can be identified as parked only by some of the other values, like:
  - unique DOM element class name: .comp-is-parked or .sale_link
  - unique JS script variable name: tcblock
  - unique external JS script name: "maincaf.js"
  - unique link: parkingcrew.net/assets
  - unique domain name (SLD): d38psrni17bvxu
- I'm providing a table with millions of parked domains in PublicWWW that can be identified additionally, or sometimes only, by various values other than typical word phrases; it's worth considering adding these values to the search list:
Value of Identifier | Type of Identifier | Current number of domains (31.12.2022) | Previous number of domains (02.12.2021) | Change |
---|---|---|---|---|
tcblock | JS variable name | over 1 000 000 | over 1 000 000 | unknown |
js3caf | JS script partial name | over 1 000 000 | over 1 000 000 | unknown |
d1lxhc4jvstzrp | Domain name (SLD) | over 1 000 000 | over 1 000 000 | unknown |
d38psrni17bvxu | Domain name (SLD) | over 1 000 000 | - | unknown |
"maincaf.js" | JS Script name | 937 500 | - | unknown |
"for_sale_lander.css" | Stylesheet filename | 720 522 | over 1 000 000 | noticeable decrease |
"LANDER_SYSTEM" | JS script object name | over 1 000 000 | over 1 000 000 | unknown |
"img1.wsimg.com/parking-lander/static" | JS Script Src Chunk Link | over 1 000 000 | over 1 000 000 | unknown |
"img.sedoparking.com" | Image/Banner Link | over 1 000 000 | 937 337 | unknown increase |
"i.cdnpark.com/themes/registrar/images/logo_namecheap.png" | Image/Banner Link | 511 955 | 285 793 | double increase |
"traffic.club" | Domain name | 188 | 283 014 | died |
" | DOM Title name | 213 682 | 198 156 | small increase |
"framework.syrahost.com/dist/crazydomains/parked.css?" | Stylesheet filepath + filename | 292 426 | 165 689 | double increase |
"cdn-staging.domainmarket.com/static-landers/assets/js/main.js" | JS script filepath + filename | 287 | 161 552 | died |
"domainparking.ru/privacy-policy" | Link to Privacy Policy | 124 678 | 133 541 | small decrease |
"brokerage.domainbrokers.se/?domain=" | Domain name + request | 6 567 | 83 703 | died |
"ewebdevelopment.com/quotes/inquire/" | Link | 80 912 | 80 013 | not changed |
"parking.bodiscdn.com" | Domain name | 795 081 | 55 314 | extreme increase ( x14 ) |
"shop.ename.com" | Domain Shop name | 32 884 | 49 518 | noticeable decrease |
"d1s9zexeqsmc0t" | Domain name (SLD) | 582 | 34 169 | died |
"Start Domain For Sale Box" | Word Phrase | 612 | 27 986 | died |
"/porkbun.com/checkout/addCartItems?items[marketplace]=" | Domain name + request | 12 110 | 13 056 | small decrease |