Integrate adblocker functionality

Open jakubbalada opened this issue 4 years ago • 9 comments

Interesting tip from HN (for Dashblock): Maybe you already do it, but I think integrating adblocker functionality when loading JS sites would be desirable to reduce load time. And if ads are what the API user is interested in, perhaps add a flag for whether or not one wants ads to load. Recommendation: https://github.com/cliqz-oss/adblocker (should be the fastest adblocker library; used by Ghostery, Cliqz and Brave)

jakubbalada avatar Sep 18 '19 19:09 jakubbalada

This could be integrated into the Apify.launchPuppeteer() function as a useAdBlock: true option.

https://sdk.apify.com/docs/api/apify#module_Apify.launchPuppeteer

mtrunkat avatar Sep 26 '19 08:09 mtrunkat
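
A minimal sketch of how the proposed option might look from the caller's side. `useAdBlock` is only the name suggested above, not an existing SDK option, and `splitAdBlockOption`, `launchWithOptionalAdBlock`, `launch` and `enableBlocking` are hypothetical names. The launcher and the blocker are injected rather than imported, so the sketch makes no claim about any library's API.

```javascript
// Hypothetical helper: separates the proposed `useAdBlock` flag from the
// options that would be forwarded to the real launcher unchanged.
function splitAdBlockOption(options = {}) {
    const { useAdBlock = false, ...launchOptions } = options;
    return { useAdBlock: Boolean(useAdBlock), launchOptions };
}

// Hypothetical wrapper. `launch(options)` is assumed to resolve to a
// Puppeteer-like browser, and `enableBlocking(page)` is assumed to turn
// blocking on for a single page. Neither is part of the Apify SDK.
async function launchWithOptionalAdBlock(options, { launch, enableBlocking }) {
    const { useAdBlock, launchOptions } = splitAdBlockOption(options);
    const browser = await launch(launchOptions);

    if (useAdBlock) {
        // Enable blocking on each page as it is opened.
        browser.on('targetcreated', async (target) => {
            const page = await target.page(); // null for non-page targets
            if (page) await enableBlocking(page);
        });
    }
    return browser;
}
```

With `launch` set to `Apify.launchPuppeteer`, a call such as `launchWithOptionalAdBlock({ headless: true, useAdBlock: true }, deps)` would behave roughly as proposed, and omitting the flag leaves the launch untouched. Because the handler runs asynchronously, the first requests of a new page may slip through; a real integration would need to handle that.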

Greetings. So the idea would be to implement an ad blocker to increase the speed of the scrape/crawl? I could work on this 🙏

Darking360 avatar Oct 03 '19 21:10 Darking360

Yes, exactly. It could boost the speed, especially for websites that are heavy on ads (such as news sites). But it would be great to test this assumption first. Would you also be interested in trying this out, using the Apify SDK to run a scraper with and without an ad blocker against some websites?

mtrunkat avatar Oct 04 '19 09:10 mtrunkat
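
One way such a comparison could be set up is sketched below. All function names are hypothetical, and the crawl itself is left as an injected async `task` so that nothing here depends on a particular SDK version. The sketch times repeated runs and compares medians.

```javascript
// Runs an async task several times and returns the wall-clock duration of
// each run in milliseconds.
async function timeRuns(task, runs = 5) {
    const samples = [];
    for (let i = 0; i < runs; i++) {
        const start = Date.now();
        await task();
        samples.push(Date.now() - start);
    }
    return samples;
}

// The median is less sensitive than the mean to one unusually slow page load.
function median(samples) {
    const sorted = [...samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
}

// Summarizes two sample sets; a speedup above 1 means the ad-blocked runs
// were faster than the baseline.
function compareRuns(baselineMs, adBlockMs) {
    const baseline = median(baselineMs);
    const withAdBlock = median(adBlockMs);
    return { baseline, withAdBlock, speedup: baseline / withAdBlock };
}
```

The two sample sets would come from something like `await timeRuns(() => crawl(url, false))` and `await timeRuns(() => crawl(url, true))`, where `crawl` is a placeholder for the scraper under test. Running both variants against the same URLs, ideally interleaved, reduces the effect of network noise.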

Sure! I can set up a test with some timing debug output and run it to check this first, then attach the results here for you to see. Thank you 🚀

Darking360 avatar Oct 04 '19 14:10 Darking360

Interesting. I manually block all the common ad networks using blockRequests; this would offload the task to the extension.

pocesar avatar Oct 08 '19 16:10 pocesar
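
For comparison, manual blocking by domain can be sketched with plain Puppeteer request interception. This is not how `blockRequests` works internally; the domain list is purely illustrative, and `isAdRequest` and `blockAds` are hypothetical names.

```javascript
// Illustrative list only; a real setup would rely on a maintained filter list.
const AD_DOMAINS = ['doubleclick.net', 'googlesyndication.com', 'adservice.google.com'];

// Returns true when the URL's host is one of the listed domains or a
// subdomain of one of them.
function isAdRequest(url, domains = AD_DOMAINS) {
    let hostname;
    try {
        ({ hostname } = new URL(url));
    } catch (err) {
        return false; // not an absolute URL, so let it through
    }
    return domains.some((d) => hostname === d || hostname.endsWith(`.${d}`));
}

// Aborts matching requests on a Puppeteer page and lets the rest continue.
async function blockAds(page) {
    await page.setRequestInterception(true);
    page.on('request', (request) => {
        if (isAdRequest(request.url())) request.abort();
        else request.continue();
    });
}
```

A hand-kept list like this is cheap to evaluate but goes stale quickly, which is the maintenance burden an adblocker library with bundled filter lists would take over.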

Makes sense for a lot of users, I guess, but FYI it's an explicit anti-feature with a use-case-killing effect for me. I'd need this off, with zero side effects on current behavior.

deleted-user-1 avatar Jul 22 '20 23:07 deleted-user-1

> Makes sense for a lot of users, I guess, but FYI it's an explicit anti-feature with a use-case-killing effect for me. I'd need this off, with zero side effects on current behavior.

In the small POC I proposed a while ago (https://github.com/apify/apify-js/pull/600), the feature is completely disabled by default and only does any work when the user enables blocking.

remusao avatar Jul 23 '20 08:07 remusao
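
The "no work unless enabled" idea can be illustrated with a lazily initialized hook. This sketch is not taken from the PR; `createAdBlockHook` and `loadBlocker` are hypothetical names, and the blocker is only assumed to expose an `enableBlockingInPage(page)` method.

```javascript
// Returns an `attach(page)` function. When `enabled` is false, attach does
// nothing and `loadBlocker` is never called, so no filter lists are fetched
// or parsed. When enabled, the blocker is loaded once and then reused.
function createAdBlockHook({ enabled = false, loadBlocker }) {
    let blockerPromise = null;

    return async function attach(page) {
        if (!enabled) return false;
        if (!blockerPromise) blockerPromise = loadBlocker(); // first use only
        const blocker = await blockerPromise;
        await blocker.enableBlockingInPage(page);
        return true;
    };
}
```

With the adblocker library recommended at the top of the thread, `loadBlocker` could plausibly be something like `() => PuppeteerBlocker.fromPrebuiltAdsAndTracking(fetch)`, though that wiring is an assumption here rather than something the thread specifies.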

Yeah, sorry @remusao. We still have not figured out whether the performance will improve or not. I apologize.

mnmkng avatar Jul 23 '20 09:07 mnmkng

> Yeah, sorry @remusao. We still have not figured out whether the performance will improve or not. I apologize.

Of course, no worries at all. I just wanted to make clear to @matjaeck that there should be a way to integrate such a feature without any overhead when it's disabled.

remusao avatar Jul 23 '20 10:07 remusao