ferrum
Implement stealth mode
https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth
Would be great if it could pass those tests
undetected_chromedriver might also be a good reference.
Also, it would probably make sense to add intoli's checks to the specs. They are also on GitHub (here and here).
@route Any thoughts on adding this in? We've been using ferrum for a while now and started getting blocked on one of the sites.
I'm happy to take a cut at implementing this if you want to outline some of your thoughts on how you envision doing it. I studied the source code for about an hour tonight just thinking through some options here.
Hi @brettallred,
> I'm happy to take a cut at implementing
This would be so wonderful! :pray:
I'm not a maintainer here but I would like to see Stealth mode as an integrated extension.
My idea would be:
Specs
- The specs could get a new directory for extensions (e.g. `spec/extensions/stealth`).
- For the specs themselves it would probably make sense to add a static page (see `spec/support/views` for some examples) that shows various states (could be visually simpler than this, since we would only need to check the text output in the specs). There are nice reference pages out there with checks that could be integrated in this page:
Implementation of the extension itself
There are good references out there:
- These modules from `puppeteer-extra-plugin-stealth` (IMHO the most complete implementation with a lot of details). There's also a minified version of it available. So maybe we could have a `rake` task that simply fetches that JS from a CDN or uses `extract-stealth-evasions` itself to make our own build. This way it would be very easy to update the script (and we could also profit from patches to the other project). It seems that calling `npx extract-stealth-evasions` should be enough on a machine that has node installed?
- intoli not only showed how to detect but also how to circumvent these checks; check the sources mentioned above.
- Python's `undetected-chromedriver` is simple (but by far not enough yet for many cases!).
- Especially for Cloudflare: they show fewer CAPTCHAs if the Privacy Pass extension is used (see this Cloudflare post for more information on that). Maybe it should be documented how to integrate and set it up easily? This could also be another blog post. Or maybe it could even be integrated as another extension?
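To make the rake task idea a bit more concrete, here is a minimal sketch of the build step in plain Ruby. It assumes that `npx extract-stealth-evasions` writes a file named `stealth.min.js` into the directory it runs in; the `extensions/stealth.min.js` destination and the `update_stealth_script` helper are hypothetical names, not part of ferrum.

```ruby
require "fileutils"
require "open3"
require "tmpdir"

# Hypothetical location for the generated script inside the repository.
STEALTH_SCRIPT = File.join("extensions", "stealth.min.js")

# Builds the stealth script in a temporary directory and copies it to `dest`.
# `runner` receives the build directory and returns true on success; it is
# injectable so the npx call can be replaced (e.g. by a CDN download).
def update_stealth_script(dest: STEALTH_SCRIPT, runner: nil)
  runner ||= lambda do |dir|
    # Assumption: node/npx is installed and the tool writes stealth.min.js
    # into the current working directory.
    _output, status = Open3.capture2e("npx", "extract-stealth-evasions", chdir: dir)
    status.success?
  end

  Dir.mktmpdir do |dir|
    raise "extract-stealth-evasions failed" unless runner.call(dir)

    built = File.join(dir, "stealth.min.js") # assumed output file name
    raise "no stealth.min.js was produced" unless File.exist?(built)

    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp(built, dest)
  end
  dest
end

# In a Rakefile this could be wired up roughly as:
#   namespace(:update) { task(:stealth_extension) { update_stealth_script } }
```

Building in a temporary directory keeps a failed run from leaving a half-written script in the repository.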
Outside of the specs, you could also check the reCAPTCHA score to see how well the scripts work.
Summary of a possible solution — TL;DR;
- Create an HTML file in `spec/support/views` containing the checks mentioned above to have a reliable check available within the specs; maybe also a simple HTML table with a summary (e.g. `you are [not] a bot`).
- Write the spec so that it intentionally fails at first (since the extension is not used / ready yet), so that it's obvious that the specs work, e.g. `expect(browser.body).to include("you are not a bot")`.
- Write a rake task (e.g. `rake update:stealth_extension`) to fetch/build the minimized/compiled `puppeteer-extra-plugin-stealth` extension and put it in a nice `extensions` directory within the ferrum repository.
- Hopefully the spec will be green now if the extension was properly loaded (remember to add `Ferrum::Browser.new(extensions: %w(path/to/stealth/ext.js))` or even a shortcut like `stealth_mode: true` to that) :wink:
- Optional: document how to integrate Privacy Pass.
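As a rough illustration of the `stealth_mode: true` shortcut, the sketch below expands it into the existing `extensions:` option. Note that `stealth_mode` is only the proposal from this comment and not a ferrum option; `expand_stealth_mode` and the script path are hypothetical names as well.

```ruby
# Hypothetical path where the rake task would place the generated script.
STEALTH_EXTENSION = File.join("extensions", "stealth.min.js")

# Expands a hypothetical `stealth_mode: true` shortcut into the `extensions:`
# option and leaves all other options untouched.
def expand_stealth_mode(options, script: STEALTH_EXTENSION)
  options = options.dup
  # Remove the shortcut key either way so it is never passed on to the browser.
  return options unless options.delete(:stealth_mode)

  # Keep extensions the caller already configured and append the stealth script.
  options[:extensions] = Array(options[:extensions]) + [script]
  options
end

# Usage (needs the ferrum gem and a Chrome binary, so it is left as a comment):
#   options = expand_stealth_mode({ stealth_mode: true, headless: false })
#   browser = Ferrum::Browser.new(options)
```

Keeping the shortcut as a thin layer over `extensions:` would mean the stealth script is loaded the same way as any other preloaded script.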
Again, this is just an idea and I'm not the maintainer here. So please take it with a grain of salt. But I think this could work in a very maintainable manner.
PS: Updating the stealth extension could even be a GitHub action later on.
I just wanted to pass a small note that the move @alexanderadam proposed is absolutely feasible. Absurdly so. I've always been a bit intimidated wrangling the js/extension side of things so I kind of brushed that last comment off a bit, assuming additional wiring would need to happen. Tonight I stumbled back into it and noted in particular extract-stealth-evasions, and thought I'd just see where I could get with it. Woah.
First off, thank you @alexanderadam for your detailed note. I saw it this spring, but like I said... I didn't understand its proposed simplicity. Second, I wanted to report these findings just in case it inspires someone else.
According to these webpages:
- https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth--7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html
The tests at bot.sannysoft.com and www.nowsecure.nl pass with this browser configuration:
browser = Ferrum::Browser.new(browser_path: BROWSER_PATH, headless: false, browser_options: { "disable-blink-features": "AutomationControlled" })
I haven't yet found a way to pass them in headless mode.
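To compare configurations without visiting a test site each time, a small probe can read a few of the properties such pages commonly inspect. This is only a sketch: `automation_signals` is a hypothetical helper, the three checks are a tiny subset of what real detectors look at, and it only relies on the browser object responding to `evaluate`.

```ruby
# JavaScript expressions that detection pages commonly inspect.
PROBES = {
  webdriver: "navigator.webdriver",
  user_agent: "navigator.userAgent",
  plugins: "navigator.plugins.length"
}.freeze

# Returns a hash of booleans; true means the signal hints at automation.
# `browser` is anything that responds to `evaluate(js)`, e.g. a Ferrum::Browser.
def automation_signals(browser)
  values = PROBES.transform_values { |js| browser.evaluate(js) }
  {
    # Set to true by Chrome when it is driven by automation.
    webdriver_flag: values[:webdriver] == true,
    # The old headless mode advertises itself in the user agent string.
    headless_user_agent: values[:user_agent].to_s.include?("HeadlessChrome"),
    # The old headless mode reports an empty plugin list.
    no_plugins: values[:plugins].to_i.zero?
  }
end

# Usage (needs the ferrum gem and a Chrome binary, so it is left as a comment):
#   browser = Ferrum::Browser.new(
#     headless: false,
#     browser_options: { "disable-blink-features": "AutomationControlled" }
#   )
#   browser.go_to("about:blank")
#   p automation_signals(browser)
```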
Isn't this a problem better solved at the Chromium level?
I read this article recently, seems like there are improvements in an upcoming version of Chrome:
https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html
I'd close this issue, out of scope for Ferrum.
It is, but ferrum itself can still provide some guidance and scripts to make automation harder to detect from the start.
Is there documentation on how to get the new headless mode in Ferrum?
Have you found a way to pass them in headless mode?
You can enable the new headless mode in chromium by modifying the browser options:
Ferrum::Browser.new(browser_options: { "headless": "new" })
> You can enable the new headless mode in chromium by modifying the browser options:
> Ferrum::Browser.new(browser_options: { "headless": "new" })
It doesn't work, because there's a lot more work to be done: https://github.com/rubycdp/ferrum/pull/379