greek-adblockplus-filter
List spring cleaning
Hello,
I took a brief look through the filter list. Considering that this project has been ongoing for more than 10 years, it might be worth performing some sort of spring cleaning, to identify and remove:
- domains no longer in operation
- site-specific rules blocking items that no longer exist on the site
- duplicate or redundant rules
I can't think of a good methodology for this, other than manually checking.
Any thoughts and ideas? Is this even necessary? I was thinking that reducing the rules might help with performance of adblockers? IDK.
good idea! even if it's not strictly necessary, it's always good to clean up things every now and then
- domains not in operation should be easy to spot: just parse the domain name out of a rule and see if it resolves. if it does, keep it; if it doesn't, remove it.
- points 2 and 3 are far more difficult, especially the element hiding rules
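For the duplicates part of the cleanup, exact duplicates at least can be found mechanically. A minimal sketch, assuming the list is a plain text file with one rule per line and `!` marking comment lines; the function name `find_duplicate_rules` and the sample rules are made up for illustration. It only catches byte-identical lines, not rules that are redundant because a broader rule already covers them.

```shell
#!/bin/bash
# Hypothetical helper: print every rule that appears more than once.
# Comment lines (starting with "!") and blank lines are skipped.
find_duplicate_rules() {
    local list="$1"
    grep -v -e '^!' -e '^[[:space:]]*$' "${list}" | sort | uniq -d
}

# Example run against a small throwaway sample, not the real list.
sample=$(mktemp)
printf '%s\n' \
    '! comment' \
    '||ads.example.gr^' \
    'example.gr##.banner' \
    '||ads.example.gr^' > "${sample}"
find_duplicate_rules "${sample}"   # prints ||ads.example.gr^
rm -f "${sample}"
```

Against the real file this would be called as `find_duplicate_rules void-gr-filters.txt`.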
for #92, I got (most of) the domains out of the rules file with:
cat void-gr-filters.txt| grep -Po '.*?(//|\|\||@@\|\||@@|\~)\K.*?(?=/|#)' | sort | uniq > domain-list.txt
cat void-gr-filters.txt| grep -Po '^[0-9a-zA-Z].*?(?=/|#)' | sort | uniq >> domain-list.txt
sort domain-list.txt | uniq > domain-list-final.txt
and then ran this:
#!/bin/bash
DOMAIN_LIST="domain-list-final.txt"
#DOMAIN_LIST="testme.txt"
RESOLVER="1.1.1.1"
BAD_DOMAINS="bad_domains"
SUB_NO_RECORD="no_record"
WWW_EXISTS="www_exists"
rm -f "${BAD_DOMAINS}" "${SUB_NO_RECORD}" "${WWW_EXISTS}"
while read -r line; do
    # cleanups
    myline=$(echo "${line}" | awk -F':' '{ print $1 }')
    line=$(echo "${myline}" | grep -Ev '/|\|' | grep -Ev '^[0-9]')
    if [ "x${line}" = "x" ]; then
        continue
    fi
    echo "Working on: ${line}"
    # Check if the subdomain exists
    if [ "$(dig "${line}" @${RESOLVER} +short)" = "" ]; then
        # Check if the subdomain with www prepended exists
        if [ "$(dig "www.${line}" @${RESOLVER} +short)" = "" ]; then
            domain=$(echo "${line}" | awk -F. '{ print $(NF-1) "." $NF }')
            # if the domain doesn't have NS records, the domain does not exist any more
            if [ "$(dig NS "${domain}" @${RESOLVER} +short)" = "" ]; then
                echo "${domain}" | tee -a "${BAD_DOMAINS}"
            # if the entry is a subdomain we already know it doesn't have A record
            elif [ "$(echo "${line}" | grep -o '\.' | wc -l)" -gt "1" ]; then
                echo "${line}" | tee -a "${SUB_NO_RECORD}"
            fi
        else
            echo "${line}" | tee -a "${WWW_EXISTS}"
        fi
    fi
done < "${DOMAIN_LIST}"
I then double-checked all the entries in "bad_domains" manually.
@kargig Good stuff!
redundant stuff
Some cosmetic filters have network start characters: https://github.com/kargig/greek-adblockplus-filter/commit/044bc9ff72118bd2b585606d21ae8afcc9251226 (made in 2016)
https://github.com/kargig/greek-adblockplus-filter/blob/72bccd07ccfc3b469fec2a47b1f2aec073c79277/void-gr-filters.txt#L432-L433
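Other rules of this kind could probably be found with a grep along these lines. This is only a sketch under two assumptions that the thread does not spell out: that "network start characters" means a leading `|` or `||`, and that a cosmetic rule is any line containing `##` or `#@#`. The function name `find_anchored_cosmetic_rules` and the sample rules are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list cosmetic rules that start with a network anchor.
# Assumption: the anchor is a leading "|" or "||", and the cosmetic
# separator is "##" or "#@#". Matches are printed with their line number.
find_anchored_cosmetic_rules() {
    local list="$1"
    grep -nE '^\|{1,2}[^#]*#@?#' "${list}"
}

# Example run against a small throwaway sample, not the real list.
sample=$(mktemp)
printf '%s\n' \
    '||ads.example.gr^' \
    '||example.gr##.banner' \
    'example.gr##.banner' > "${sample}"
find_anchored_cosmetic_rules "${sample}"   # prints 2:||example.gr##.banner
rm -f "${sample}"
```

Any hits would still need a manual look before rewriting them as plain `domain##selector` rules.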
AdGuard disabled use in 2018: https://github.com/AdguardTeam/FiltersRegistry/commit/a452d4dcefdecaf4710f8056e11ecacd23fc73e1#diff-6472c0fcd53f81660278097de5b81a5a1cd70c38b8a5068d02039207a61d5726R93-R95