SafariConverterLib

Duplicates are not removed from JSON files of content blockers

Open Alex-302 opened this issue 2 months ago • 3 comments

Other filters may include parts of the main filters, or independently contain rules that already exist in the main filters, so the content blocker (CB) may be larger than necessary. As a result, needed rules may be discarded to fit within the acceptable CB JSON size.

Actual result

The converted JSON contains duplicate rules. For example, with Base Filter + EasyList enabled, the JSON contains about 25k duplicates.

Total rules: 102 603
Unique rules: 76 668
Duplicates: 25 935
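Counts like these can be reproduced with a short script. The following is only a sketch in Python, not the tool used for the numbers above: `canonical_key` and `count_duplicates` are hypothetical helper names, and treating two rules as equal when they differ only in JSON key order or in the order of `if-domain` / `unless-domain` entries is an assumption about what "duplicate" means here.

```python
import json
from collections import Counter


def canonical_key(rule: dict) -> str:
    """Serialize a rule so that key order and domain order do not matter."""
    normalized = dict(rule)
    trigger = dict(rule.get("trigger", {}))
    # Assumption: the order of entries in a domain list carries no meaning.
    for field in ("if-domain", "unless-domain"):
        if field in trigger:
            trigger[field] = sorted(trigger[field])
    normalized["trigger"] = trigger
    return json.dumps(normalized, sort_keys=True)


def count_duplicates(rules: list[dict]) -> dict:
    """Return total, unique, and duplicate counts for a content blocker rule list."""
    counts = Counter(canonical_key(rule) for rule in rules)
    total = len(rules)
    unique = len(counts)
    return {"total": total, "unique": unique, "duplicates": total - unique}
```

For a content blocker file that is a JSON array of rules, `count_duplicates(json.load(f))` would give the three numbers.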

Expected result

JSON does not contain duplicates.

Duplicate example:

    {
        "trigger": {
            "url-filter": ".*",
            "unless-domain": [
                "*memo.wiki",
                "*addchannel.net",
                "*beasoku.com",
                "*blog.housinkai.com",
                "*kakenhi.net",
                "*seesaa.net"
            ]
        },
        "action": {
            "type": "css-display-none",
            "selector": ".interstitial-ad"
        }
    },

Current General CB JSON: cb_general.zip

Proposed solution

Before compilation, filters of the same content blocker should be merged into one file, and cleaned of duplicates (taking into account domain lists).
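A rough Python sketch of this merge step follows. It is not SafariConverterLib code: `normalize_rule` and `merge_filters` are hypothetical names, and "taking into account domain lists" is interpreted here as "the same element hiding rule with the same set of domains, in any order, is a duplicate". Only plain `domains##selector` rules get this treatment; every other rule type is compared as an exact string, which is a deliberate simplification.

```python
def normalize_rule(line: str) -> str:
    """Build a comparison key for a filter rule.

    Assumption: only plain element hiding rules ("domains##selector")
    have their domain list normalized; other rules are compared verbatim.
    """
    rule = line.strip()
    if "##" in rule:
        domains, selector = rule.split("##", 1)
        if domains:
            # Sort and de-duplicate domains so their order does not matter.
            unique_domains = sorted({d.strip() for d in domains.split(",") if d.strip()})
            return ",".join(unique_domains) + "##" + selector
    return rule


def merge_filters(filters: list[list[str]]) -> list[str]:
    """Merge several filter lists into one, keeping the first copy of each rule."""
    seen = set()
    merged = []
    for filter_lines in filters:
        for line in filter_lines:
            rule = line.strip()
            # Skip blank lines and "!" comment lines.
            if not rule or rule.startswith("!"):
                continue
            key = normalize_rule(rule)
            if key not in seen:
                seen.add(key)
                merged.append(rule)
    return merged
```

Keeping the first occurrence preserves the rule order of the main filter, so rules from additional filters are only appended when they add something new.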

Alex-302 avatar Sep 01 '25 14:09 Alex-302