List only domains, not IP addresses

Open Strykar opened this issue 5 years ago • 9 comments

Apologies if this has been discussed before, I looked at over a dozen related issues before opening this.

Any chance you could add an option/switch to updateHostsFile.py so that only the hostnames are listed (no localhost/ip6/other entries)? This is useful for BSD pf tables, which do the name resolution themselves: we simply block all the IPs (in that table) that the names resolve to.
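
For illustration, here is a minimal Python sketch of the kind of filtering being requested. It is not an existing option of updateHostsFile.py; the function name `extract_hostnames` and the `LOCAL_NAMES` set are assumptions made up for this example, and the set of local names may need adjusting for a given hosts file.

```python
# Names that hosts files commonly define for the local machine.
# This list is an assumption for the sketch; adjust it as needed.
LOCAL_NAMES = {
    "localhost", "localhost.localdomain", "local", "broadcasthost",
    "ip6-localhost", "ip6-loopback", "ip6-localnet", "ip6-mcastprefix",
    "ip6-allnodes", "ip6-allrouters", "ip6-allhosts", "0.0.0.0",
}


def extract_hostnames(lines):
    """Yield hostnames from hosts-format lines, skipping comments and local entries."""
    for line in lines:
        # Drop any trailing comment and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # The first field is the IP address; every remaining field is a hostname.
        for name in fields[1:]:
            name = name.lower()
            if name not in LOCAL_NAMES:
                yield name
```

Feeding it the lines of a generated hosts file, for example `extract_hostnames(open("hosts"))`, would give one name per entry, ready to be written out one per line.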

Strykar avatar Dec 26 '19 00:12 Strykar

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

welcome[bot] avatar Dec 26 '19 00:12 welcome[bot]

Hi Avinash @Strykar I've been thinking of this recently.

In addition to this, I was also thinking of an optional custom sort order, so domains cluster together, by domain, then tld, then first level subdomain, then second level subdomain...something like this:

a.com
a.a.com
a.a.a.com
a.net
a.a.net
a.a.a.net

This means removing comments, too.
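
A minimal sketch of a sort key that produces the ordering above, in Python. The name `cluster_key` is hypothetical, and the sketch assumes the TLD is always the last label, so multi-label suffixes such as co.uk are not handled specially.

```python
def cluster_key(hostname):
    """Sort key: domain label, then TLD, then subdomains from the first level outward."""
    labels = hostname.lower().rstrip(".").split(".")
    tld = labels[-1]
    # Assumption: the registrable domain is the label directly left of the TLD.
    domain = labels[-2] if len(labels) > 1 else ""
    # Remaining labels, reversed so the first-level subdomain is compared first.
    subdomains = tuple(reversed(labels[:-2]))
    return (domain, tld, subdomains)


names = ["a.a.net", "a.com", "a.a.a.net", "a.a.com", "a.net", "a.a.a.com"]
ordered = sorted(names, key=cluster_key)
```

With this key, `ordered` comes out in the same order as the six-line example.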

StevenBlack avatar Dec 26 '19 04:12 StevenBlack

In addition to this, I was also thinking of an optional custom sort order, so domains cluster together, by domain, then tld, then first level subdomain, then second level subdomain...something like this: This means removing comments, too.

Absolutely, yes, thank you Steven! However, do we need to block a.a.com if we're blocking a.com? Are there instances where blocking a.com makes sense but not a.a.com?

Strykar avatar Dec 26 '19 06:12 Strykar

Domains and subdomains are distinct in hosts files, so we need to block them explicitly. There are no wildcard mechanisms in hosts files. I wish there were.

StevenBlack avatar Dec 26 '19 07:12 StevenBlack

I totally missed this issue due to the holidays. @Strykar, check these lists out and see if that's what you're thinking of:

https://scripttiger.github.io/alts/

ScriptTiger avatar Jan 03 '20 14:01 ScriptTiger

Literally the title of this repository is hosts, with a subtitle of Unified hosts file with base extensions. What does AdGuard do for the hosts file? Needless to say, your comments are wildly out of scope, neither related to this repository nor the issue at hand.

@StevenBlack, I make a motion to delete all of @infinitewaveparticle's comments and block him from further access to shut down this shill as a spam vector.

ScriptTiger avatar Jan 31 '20 23:01 ScriptTiger

@ScriptTiger, agreed.

StevenBlack avatar Feb 01 '20 19:02 StevenBlack

Apologies if this has been discussed before, I looked at over a dozen related issues before opening this.

Any chance you could add an option/switch to updateHostsFile.py so that only the hostnames are listed (no localhost/ip6/other entries)? This is useful for BSD pf tables, which do the name resolution themselves: we simply block all the IPs (in that table) that the names resolve to.

@Strykar we are looking into this hosts file for another project and plan to use it much as you proposed, by just using awk(1) to extract the DNS names :-)

Of course, it is never that simple - I made a quick test so I thought I'd share.

pfctl(8) will exit if any of the hosts file entries does not resolve [1]. Maybe there's a pfctl(8) option to change this behaviour, but a cursory inspection of the man page did not yield much of use in this regard.

Considering all hosts files certainly contain (permanently or temporarily) entries yielding NXDOMAINs, the most likely workaround for now is to decouple the name resolution from (re)populating the pf tables.

A script will walk the hosts file resolving the A and AAAA records, handling CNAMEs/NXDOMAINs or multiple addresses per record and dumping the results to a file for pfctl(8) to atomically load. Conveniently, pfctl(8) also handles duplicate addresses.

[1]

# pfctl -t testing -T replace -f /tmp/tmp-hosts         
no IP address found for oralse.ca
pfctl: cannot load /tmp/tmp-hosts: Undefined error: 0
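
The decoupled resolution step described above could look roughly like this in Python. This is a sketch under assumptions, not the script ioc32 refers to: the names `resolve_addresses`, `build_table` and `write_table` are made up, resolution goes through the system resolver via `socket.getaddrinfo` (which follows CNAMEs on its own), and names that fail to resolve are silently skipped.

```python
import socket


def resolve_addresses(hostname):
    """Return the set of IPv4/IPv6 addresses for hostname, empty if it does not resolve."""
    try:
        # proto=IPPROTO_TCP avoids one duplicate result per socket type.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        # NXDOMAIN, other lookup failures, or a name that cannot be encoded.
        return set()
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def build_table(hostnames, resolver=resolve_addresses):
    """Resolve every hostname and return a sorted, de-duplicated list of addresses."""
    addresses = set()
    for name in hostnames:
        addresses |= resolver(name)
    return sorted(addresses)


def write_table(addresses, path):
    """Write one address per line, the format a pf table file expects."""
    with open(path, "w") as handle:
        for address in addresses:
            handle.write(address + "\n")
```

The resulting file contains only addresses, so loading it with the same `pfctl -t testing -T replace -f` invocation shown in [1] should no longer trip over unresolvable names. The `resolver` parameter exists only so the lookup can be swapped out, for example in tests.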

ioc32 avatar Feb 02 '20 13:02 ioc32

However, do we need to block a.a.com if we're blocking a.com? Are there instances where blocking a.com makes sense but not a.a.com?

If you're interested in reading a lengthy issue on this, we had a debate over this with @vixie here: https://github.com/StevenBlack/hosts/issues/451. Some argued that a domain with multiple untrusted subdomains should render the domain itself untrusted. An example used was 2o7.net, which has a large number of untrusted subdomains. However, some of its subdomains are actually required for some "trusted" websites to load properly, such as apple.com requiring appleglobal.112.2o7.net and applestoreus.112.2o7.net.

A direct quote from the man himself:

ty! RRPZ looks like it won't be nec'y and may overblock (there could be subdomains of the shared domain which are not malicious).
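
To make the overblocking concern concrete, here is a small Python sketch contrasting exact matching, which is how hosts files behave, with suffix matching on a whole domain. The function names are hypothetical, and `tracker.2o7.net` is a made-up blocklist entry used only for illustration.

```python
def blocked_exact(hostname, blocklist):
    """Hosts-file behaviour: a name is blocked only if it is listed verbatim."""
    return hostname.lower() in blocklist


def blocked_by_suffix(hostname, blocked_domains):
    """Wildcard-style behaviour: a name is blocked if it or any parent domain is listed."""
    labels = hostname.lower().split(".")
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


# Hypothetical curated entry; appleglobal.112.2o7.net is the example from the thread.
exact_list = {"tracker.2o7.net"}
overblocked = blocked_by_suffix("appleglobal.112.2o7.net", {"2o7.net"})
still_reachable = not blocked_exact("appleglobal.112.2o7.net", exact_list)
```

Blocking all of 2o7.net by suffix catches `appleglobal.112.2o7.net` as well, while an exact list leaves it reachable unless someone adds it deliberately.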

I will also note that @StevenBlack has closed several issues and PRs in the past with no action taken, because the submitter skipped due diligence and simply enumerated all subdomains of a given domain. That goes against the mission statement of this project, which is to aggregate highly curated lists and keep the content as relevant as possible. A certain level of due diligence is expected for each distinct domain, subdomain, etc. This is why we pull from so many curators that focus on narrower fields: to take advantage of more expert opinions and aggregate the sum of those opinions under a broader scope.

ScriptTiger avatar Feb 02 '20 18:02 ScriptTiger