uBlockOrigin-HUGE-AI-Blocklist

Some rules can block legitimate sites

Open jesse-tong opened this issue 11 months ago • 2 comments

Problem

Some blocklist rules, especially in the search engine result blocklist, can block legitimate sites, including through the blanket block rules.

Affected sites

Search engine blocklist: every site whose search engine result contains one or more of the blocked words.

Details:

After reading some of the uBlock rules, I found that the search engine block section has several blanket rules at the end of the blocklist. These block any site whose title contains a word from the blocklist, such as "Ai" or "Lora model", regardless of context.

The blocklist also covers sites such as Medium and Artstation. They may host content generated by large language models and image generation models (the source of the AI-related spam most people are concerned about), but many people still use these platforms: those working in software engineering and IT-related fields (Medium), and artists, illustrators, and designers (Artstation). Even though these sites are in the caution list, they and others are still covered by blanket rules and aren't commented out yet.

For example, if a user of the blocklist wants to search for AI-related laws, or wants to find resources on fine-tuning an "AI" model for other purposes (machine translation, text summarization, OCR, ...) and learn about LoRA, every result on those topics will be blocked by the blanket rules. Some of the subreddit blocks are also questionable: r/machinelearning, for example, is mostly news and technical questions rather than actual spam, and it may cover other topics as well. Even limiting the keywords to just these sites is unlikely to work (especially with Medium).
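To illustrate the over-blocking described above, here is a minimal Python sketch. It is not how uBlock Origin evaluates filters, and the keyword list, the `is_blocked` helper, and the sample titles are all hypothetical; it only assumes that a blanket rule amounts to a case-insensitive keyword match on the result title.

```python
import re

# Hypothetical keyword list, modeled on the two examples named in this issue.
BLANKET_KEYWORDS = ["ai", "lora model"]

def is_blocked(title: str) -> bool:
    """Return True if any blanket keyword appears in the title as a whole word or phrase."""
    for keyword in BLANKET_KEYWORDS:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, title, flags=re.IGNORECASE):
            return True
    return False

# Made-up search result titles: two legitimate AI-related topics, one unrelated.
titles = [
    "EU AI Act: what the new law requires",
    "Fine-tuning a LoRA model for machine translation",
    "Best hiking trails in Norway",
]

blocked = [title for title in titles if is_blocked(title)]
for title in blocked:
    print("blocked:", title)
```

Under these assumptions, both the legal and the technical result are hidden even though neither is spam, because the match looks only at the words in the title and not at the site or the context.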

I also think we need to clarify that these blocklists only work for sites that are FOUND to be full of spam created by LLMs and image generation models. Uses of LLMs and image generation models for SEO or for posting on social media and forums are likely not covered, since such sites probably try to hide that they use these models and pass as legitimate. And because the models used for AI-related spam (GPT-3.5 Turbo, GPT-4, Gemini, MidJourney, DALL-E, ...) are so complex and their output so varied, uBlock filters cannot block that content, and any model or algorithm meant to filter these posts would need to be extremely complex.

jesse-tong, Mar 19 '24 09:03

I'd also like to add that sites like Hugging Face are included in the general block list too. While they are AI-related, they are certainly not limited to AI spam. I think they can be moved into the extra list.

walking-octopus, Apr 07 '24 10:04

@jesse-tong I've just removed the blanket rules you were talking about in favor of user-specified rules (aka DIY). Should someone want these blanket rules, they can just apply them themselves. Hopefully this is a good compromise!

laylavish, May 06 '24 05:05