unbound-adblock

Generate ad-serving and malware list for unbound
================================================

Script to generate ad-block domains for unbound.

Take a list of known malware and ad-serving domains and generate an amalgamated configuration file fragment for unbound_. This fragment, when included in the main body of unbound.conf, blocks the hosts and domains that serve malware and/or intrusive ads.


Usage
-----

You will need GNU Make (any recent version) and a recent golang toolchain
(>1.11). Assuming GNU Make is available as ``gmake``, type::

    gmake

This will generate two config file fragments for unbound_:

- bad-hosts.conf: Config file fragment with a few trackers; the blocklist items are listed in myfeed.txt.
- big.conf: Very large list of blocklist domains and hosts (~30MB, ~700k entries). The blocklist feed comes from bigfeed.txt (auto-generated).

Include one of the config files (bad-hosts.conf or big.conf) in your unbound.conf as follows::

    # include auto-generated ad-block/malware list
    include: /path/to/bad-hosts.conf

Then reload the unbound configuration to use the new blocklist.


Details
-------

The blocklist is generated by a golang program in the ``blgen`` directory. It is
built using the shell script ``build``. The output binary is put in a
platform-specific directory (``bin/$os-$arch/blgen``). Usage::

    blgen [options] [blocklist ...]

Read one or more blocklist files and generate a composite file containing
blocked hosts and domains. The final output is written to STDOUT or to
an output file.
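
The merge-and-emit step described above can be sketched roughly as follows.
This is an illustration only, not the code in *internal/blgen*: the helper
names ``merge`` and ``writeUnbound`` are hypothetical, and the choice of the
``always_nxdomain`` zone type is an assumption, since this document does not
say which unbound directive ``blgen`` actually emits.

.. code:: go

    package main

    import (
        "fmt"
        "io"
        "os"
        "sort"
        "strings"
    )

    // merge combines several host lists into one sorted, de-duplicated list.
    // Lower-casing and trimming are assumptions of this sketch.
    func merge(lists ...[]string) []string {
        seen := make(map[string]bool)
        var out []string
        for _, l := range lists {
            for _, h := range l {
                h = strings.ToLower(strings.TrimSpace(h))
                if h == "" || seen[h] {
                    continue
                }
                seen[h] = true
                out = append(out, h)
            }
        }
        sort.Strings(out)
        return out
    }

    // writeUnbound emits one local-zone line per name. The always_nxdomain
    // zone type is an assumption; blgen may use a different directive.
    func writeUnbound(w io.Writer, names []string) error {
        for _, n := range names {
            if _, err := fmt.Fprintf(w, "local-zone: %q always_nxdomain\n", n); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        a := []string{"ads.example.com", "tracker.example.net"}
        b := []string{"Tracker.example.net", "malware.example.org"}
        if err := writeUnbound(os.Stdout, merge(a, b)); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }

Collecting names in a set before sorting keeps the output stable across runs,
which makes the generated fragment easy to diff.
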
blgen can optionally read a feed (a text file) of well-known third-party malware and
tracker URLs. The feed file has a simple format:

- Each line starts with either 'txt' or 'json', followed by a URL.
- The keyword 'txt' or 'json' identifies the type of output returned by the URL.

Example::

    txt http://pgl.yoyo.org/files/adhosts/plaintext
    txt http://mirror2.malwaredomains.com/files/justdomains

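
For illustration, a feed file in this format could be parsed along the
following lines. This is a minimal sketch, not the parser used by ``blgen``:
the ``FeedEntry`` type and ``parseFeed`` function are hypothetical names, and
skipping blank lines is an assumption that the format description above does
not confirm.

.. code:: go

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "strings"
    )

    // FeedEntry is a hypothetical type holding one line of a feed file.
    type FeedEntry struct {
        Kind string // "txt" or "json"
        URL  string
    }

    // parseFeed reads "<kind> <url>" lines from r. Blank lines are skipped,
    // which is an assumption of this sketch.
    func parseFeed(r io.Reader) ([]FeedEntry, error) {
        var entries []FeedEntry
        sc := bufio.NewScanner(r)
        for n := 1; sc.Scan(); n++ {
            fields := strings.Fields(sc.Text())
            if len(fields) == 0 {
                continue
            }
            if len(fields) != 2 {
                return nil, fmt.Errorf("line %d: want '<txt|json> <url>'", n)
            }
            kind := fields[0]
            if kind != "txt" && kind != "json" {
                return nil, fmt.Errorf("line %d: unknown type %q", n, kind)
            }
            entries = append(entries, FeedEntry{Kind: kind, URL: fields[1]})
        }
        return entries, sc.Err()
    }

    func main() {
        feed := "txt http://pgl.yoyo.org/files/adhosts/plaintext\n" +
            "txt http://mirror2.malwaredomains.com/files/justdomains\n"
        entries, err := parseFeed(strings.NewReader(feed))
        if err != nil {
            panic(err)
        }
        for _, e := range entries {
            fmt.Println(e.Kind, e.URL)
        }
    }
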
Options::

    -c, --cache-dir D        Use 'D' as the cache directory ["."]
    -F, --feed F             Read blocklists from feed file 'F' [""]
    --no-cache               Ignore the cache and re-fetch every blocklist [false]
    -o, --output-file F      Write output to file 'F' [""]
    -f, --output-format T    Set output format to 'T' (text or unbound) [""]
    -v, --verbose            Show verbose output [false]
    -W, --allowlist F        Add allowlist entries from file 'F' [[]]

The ``-W`` flag can be used multiple times to add multiple allowlist sources.

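
A repeatable flag of this kind can be modelled in Go with a type that
implements ``flag.Value``. The sketch below uses only the standard ``flag``
package and handles just the short ``-W`` form; ``multiFlag`` and
``parseAllowlists`` are hypothetical names, and the option parser that
``blgen`` really uses is not stated here.

.. code:: go

    package main

    import (
        "flag"
        "fmt"
        "strings"
    )

    // multiFlag collects every occurrence of a repeatable flag.
    type multiFlag []string

    func (m *multiFlag) String() string { return strings.Join(*m, ",") }

    // Set is called once per occurrence of the flag on the command line.
    func (m *multiFlag) Set(v string) error {
        *m = append(*m, v)
        return nil
    }

    // parseAllowlists is a hypothetical helper: it returns the files given
    // via -W and the remaining positional arguments (the blocklist files).
    func parseAllowlists(args []string) ([]string, []string, error) {
        var allow multiFlag
        fs := flag.NewFlagSet("blgen-sketch", flag.ContinueOnError)
        fs.Var(&allow, "W", "add allowlist entries from file `F`")
        if err := fs.Parse(args); err != nil {
            return nil, nil, err
        }
        return allow, fs.Args(), nil
    }

    func main() {
        args := []string{"-W", "a.txt", "-W", "b.txt", "list.txt"}
        allow, rest, err := parseAllowlists(args)
        if err != nil {
            panic(err)
        }
        fmt.Println("allowlists:", allow)
        fmt.Println("blocklists:", rest)
    }
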

Caching
-------

`blgen` caches the downloaded blocklists and refreshes them only once a day.
In the default invocation of `blgen` in *GNUmakefile*, the
cache-dir is the current directory. Each cache file uses the URL as the prefix
and a truncated SHA256 sum of the URL as the suffix. The cache can be ignored
via the `--no-cache` option.
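
The naming and refresh scheme described above might look roughly like the
following sketch. The function names ``cacheName`` and ``stale`` are
hypothetical, and both the way the URL is made filesystem-safe and the 8-byte
truncation of the SHA256 sum are assumptions; the actual rules live in the
``blgen`` sources.

.. code:: go

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strings"
        "time"
    )

    // cacheName builds "<sanitized-url>-<hash prefix>". The sanitizing rule
    // and the 8-byte truncation are assumptions of this sketch.
    func cacheName(url string) string {
        sum := sha256.Sum256([]byte(url))
        safe := strings.Map(func(r rune) rune {
            switch {
            case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z',
                r >= '0' && r <= '9', r == '.', r == '-':
                return r
            }
            return '_'
        }, url)
        return safe + "-" + hex.EncodeToString(sum[:8])
    }

    // stale reports whether an entry fetched at 'fetched' is more than a
    // day old at time 'now' and should therefore be downloaded again.
    func stale(fetched, now time.Time) bool {
        return now.Sub(fetched) > 24*time.Hour
    }

    func main() {
        u := "http://pgl.yoyo.org/files/adhosts/plaintext"
        fmt.Println(cacheName(u))

        now := time.Now()
        fmt.Println(stale(now.Add(-25*time.Hour), now))
    }

Keeping the URL in the file name makes cache entries recognizable by eye,
while the hash suffix keeps two URLs that sanitize to the same text from
colliding.
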

.. _unbound: https://unbound.net/

Guide to source code
====================

The Go program is organized as follows:

- *internal/blgen*: contains the implementation of the blocklist DB,
  fetching host-lists etc.
- *blgen/*: contains the driver program ("main()") along with a few helper
  routines to generate the output.

.. vim: ft=rst:sw=4:ts=4:expandtab:tw=78: