SafariConverterLib
Duplicates are not removed from JSON files of content blockers
Other filters may include parts of the main filters, or independently contain rules that already exist in them, so the content blocker (CB) can end up larger than necessary. As a result, necessary rules may be discarded to keep the CB JSON within the acceptable size.
Actual result
The converted JSON contains duplicate rules. For example, with Base Filter + EasyList enabled, the JSON contains about 25k duplicates.
Total rules: 102 603
Unique rules: 76 668
Duplicates: 25 935
Expected result
JSON does not contain duplicates.
Duplicate example:
{
"trigger": {
"url-filter": ".*",
"unless-domain": [
"*memo.wiki",
"*addchannel.net",
"*beasoku.com",
"*blog.housinkai.com",
"*kakenhi.net",
"*seesaa.net"
]
},
"action": {
"type": "css-display-none",
"selector": ".interstitial-ad"
}
},
Current General CB JSON: cb_general.zip
Proposed solution
Before compilation, filters of the same content blocker should be merged into one file, and cleaned of duplicates (taking into account domain lists).
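To illustrate the idea, here is a minimal Python sketch of the deduplication step. It is not SafariConverterLib code, and it works on already converted JSON rules rather than on filter text, which is a simplification. The names `rule_key` and `merge_and_dedupe` are hypothetical, and it assumes that "taking into account domain lists" means two rules are the same when their `if-domain` / `unless-domain` lists contain the same domains in any order.

```python
import json

# Trigger fields compared as unordered sets (an assumption of this sketch).
DOMAIN_KEYS = ("if-domain", "unless-domain")


def rule_key(rule):
    """Return a canonical string for a rule, ignoring key order and domain order."""
    trigger = dict(rule.get("trigger", {}))  # shallow copy, the input is not modified
    for key in DOMAIN_KEYS:
        if key in trigger:
            trigger[key] = sorted(set(trigger[key]))
    canonical = {**rule, "trigger": trigger}
    return json.dumps(canonical, sort_keys=True)


def merge_and_dedupe(rule_lists):
    """Merge several rule lists into one, keeping the first copy of each rule."""
    seen = set()
    merged = []
    for rules in rule_lists:
        for rule in rules:
            key = rule_key(rule)
            if key not in seen:
                seen.add(key)
                merged.append(rule)
    return merged


# Two hypothetical filter outputs that share one rule with reordered domains.
base_filter = [{
    "trigger": {"url-filter": ".*", "unless-domain": ["*memo.wiki", "*seesaa.net"]},
    "action": {"type": "css-display-none", "selector": ".interstitial-ad"},
}]
other_filter = [{
    "trigger": {"url-filter": ".*", "unless-domain": ["*seesaa.net", "*memo.wiki"]},
    "action": {"type": "css-display-none", "selector": ".interstitial-ad"},
}]

merged = merge_and_dedupe([base_filter, other_filter])
print(len(merged))  # prints 1
```

Keeping the first occurrence preserves the relative order of the remaining rules, which matters because content blocker rules are applied in order.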