puppeteer-extra
(Umbrella) Anti-fingerprinting features
The current goal of the `stealth` plugin is to hide headless browser usage by mocking/spoofing missing functionality in headless to emulate its headful counterpart.
Another nice feature (either as part of `stealth` or as a dedicated plugin) would be anti-fingerprinting measures. That could mean emulating the most common data, mocking certain things in more detail, or shuffling data on each request (or when triggered by the user) to make fingerprinting harder.
Things to look into:
- [ ] Canvas/WebGL/Audio hashes (hardware dependent)
- [ ] Keyboard layout (#98)
- [ ] WebGL renderer (#97)
- [ ] More extensive permission emulation (#100)
We need to keep in mind the basic issue about anti-fingerprinting:
CanvasBlocker actually increases your track-ability because the consistent factor is now that you have a changing canvas fingerprint (which almost no one does). This is why Safari tries to give a universal canvas fingerprint so you can "blend in" with other users.
(https://news.ycombinator.com/item?id=20054831)
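One way to read the concern above: noise that changes on every read is itself a stable signal. A minimal sketch of the alternative, assuming the noise is derived from a per-session seed so that repeated reads of the same canvas agree with each other. `addSeededNoise` and the 10% flip rate are hypothetical choices for illustration, not something CanvasBlocker, Safari, or the stealth plugin is known to do.

```typescript
// Small seeded PRNG (mulberry32). Any deterministic generator would work here.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: returns a copy of RGBA pixel data in which the lowest
// bit of roughly 10% of the colour channels is flipped. The alpha channel is
// left untouched. The same seed always produces the same output.
function addSeededNoise(pixels: Uint8ClampedArray, seed: number): Uint8ClampedArray {
  const rand = mulberry32(seed);
  return pixels.map((value, i) => {
    const r = rand(); // drawn for every byte so the sequence stays aligned
    const isAlpha = i % 4 === 3;
    return !isAlpha && r < 0.1 ? value ^ 1 : value;
  });
}

// Two reads with the same session seed agree; a different seed diverges.
const original = new Uint8ClampedArray(16 * 4).fill(200);
const firstRead = addSeededNoise(original, 1234);
const secondRead = addSeededNoise(original, 1234);
console.log(firstRead.join() === secondRead.join());
```

Whether a stable-but-unique value blends in any better than a changing one is exactly the open question raised in the quote; the sketch only shows how to keep the value consistent within one session.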
Look into this FF addon for inspiration:
https://addons.mozilla.org/en-US/firefox/addon/canvasblocker/
The different block modes are:
- fake: Canvas Blocker's default setting, and my favorite! All websites not on the white list or black list can use the protected APIs. But values obtained by the APIs are altered so that a consistent fingerprinting is not possible
- ask for permission: If a website is not listed on the white list or black list, the user will be asked if the website should be allowed to use the protected APIs each time they are called.
- block everything: Ignore all lists and block the protected APIs on all websites.
- allow only white list: Only websites in the white list are allowed to use the protected APIs.
- block only black list: Block the protected APIs only for websites on the black list.
- allow everything: Ignore all lists and allow the protected APIs on all websites.
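The block modes above can be sketched as a single decision function. This is a reading of the descriptions, not CanvasBlocker's actual code: the type names, the `decide` function, and the hostname-based lists are all hypothetical. For the fake and ask modes, the sketch assumes that white-listed sites are allowed and black-listed sites are blocked outright, which the descriptions imply but do not state.

```typescript
type BlockMode =
  | "fake"
  | "ask"
  | "blockEverything"
  | "allowOnlyWhiteList"
  | "blockOnlyBlackList"
  | "allowEverything";

// What to do with a call to a protected API.
type Decision = "allow" | "block" | "fake" | "ask";

interface SiteLists {
  white: ReadonlySet<string>;
  black: ReadonlySet<string>;
}

function decide(mode: BlockMode, host: string, lists: SiteLists): Decision {
  switch (mode) {
    case "blockEverything":
      return "block"; // lists are ignored
    case "allowEverything":
      return "allow"; // lists are ignored
    case "allowOnlyWhiteList":
      return lists.white.has(host) ? "allow" : "block";
    case "blockOnlyBlackList":
      return lists.black.has(host) ? "block" : "allow";
    case "fake":
    case "ask":
      // Assumption: listed sites bypass the faking/asking step.
      if (lists.white.has(host)) return "allow";
      if (lists.black.has(host)) return "block";
      return mode; // unlisted sites get altered values or a prompt
  }
}

const lists: SiteLists = {
  white: new Set(["trusted.example"]),
  black: new Set(["tracker.example"]),
};
console.log(decide("fake", "unknown.example", lists));
```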
Protected "fingerprinting" APIs:
- canvas 2d
- webGL
- audio
- history
- window (disabled by default)
- DOMRect
- navigator (disabled by default)
Panopticlick's numbers are extremely confusing and borderline useless.
On my initial run, I got an overall entropy of 17.63. My two biggest identifiers were screen resolution (1000x595x24 which was approx 1/22000 browsers) and webgl hash (approx 1/3800 browsers). I fixed screen resolution to 1000x600x24 (approx 1/85 browsers) and disabled webgl hashing (approx 1/6 browsers) and the overall entropy did not change one iota, despite also closing browser, flushing cache and cookies, etc. I gave it another run with a deliberately weird resolution (1420x701 which was something like 1/105000 browsers) and once again, the overall entropy was exactly 17.63. So based on my experiment, it seems that screen resolution and webgl hash have no effect whatsoever on [Panopticlick's] overall entropy score.
An update on last night's experiment, if anyone cares. The next largest identifier was system fonts (approx 1/1300 browsers). I set `browser.display.use_document_fonts=0`, which hid the system fonts (now the same as approx 1/10 browsers), and my overall entropy dropped to just below 11 bits. At this point, none of the metrics were less common than 1/10 browsers, so I figured I wouldn't be able to do better than that.
As a side note, I ended up re-enabling system fonts because disabling them broke a large percentage of web sites' CSS.
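To put the "one in N browsers" figures from the experiment on the same scale as the bit counts, a minimal sketch that assumes each figure can be read as self-information, i.e. log2(N) bits. How Panopticlick combines per-metric values into its overall score is not stated in this thread, so `surprisalBits` is only a hypothetical helper for comparing individual metrics.

```typescript
// Self-information of a value shared by one in `oneInN` browsers.
function surprisalBits(oneInN: number): number {
  if (!(oneInN >= 1)) {
    throw new RangeError("oneInN must be a number >= 1");
  }
  return Math.log2(oneInN);
}

// Figures quoted in the comments above.
const reported: Array<[string, number]> = [
  ["screen 1000x595x24", 22000],
  ["screen 1000x600x24", 85],
  ["webgl hash", 3800],
  ["webgl hashing disabled", 6],
  ["system fonts", 1300],
  ["system fonts hidden", 10],
];

for (const [metric, oneInN] of reported) {
  console.log(`${metric}: ${surprisalBits(oneInN).toFixed(2)} bits`);
}
```

Under this reading, the original screen resolution and WebGL hash would each carry more than 10 bits on their own, and together more than the reported 17.63 bits. That suggests the overall score is not a plain sum of the per-metric values, which fits the observation that changing those two metrics did not move it.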
Would love to see this feature. I believe recaptcha v3 is somehow factoring in browser fingerprint when calculating your overall score and this could potentially mitigate that.
We need random fingerprint plugin :)
I don't mind working on such a plugin, but I would need help, as there are a lot of things that would need to be randomised for it to work effectively.
I volunteer to help you @brunogaspar! Instead of trying to make a "common" fingerprint, I think it would be a lot easier to make it possible for each browser instance to have a unique fingerprint (by adding different fonts, etc.)
@brunogaspar , @StevenVeshkini , I am happy to help out if needed. Do you guys have an idea of what data needs to be randomised, and how?
> @brunogaspar , @StevenVeshkini , I am happy to help out if needed. Do you guys have an idea of what data needs to be randomised, and how?
I think the user agent and webgl vendor and renderer should be selected from a list of random up-to-date values. Currently there is only a single value which may be flagged as suspicious by being the default.
It could be nice to separate this into two steps:
- A more generic browser "persona" or fingerprint data generator, similar in spirit to e.g. faker.js
  - The job of this library is to generate new, convincing browser data/fingerprints
  - Ideally with coherent data, e.g. the WebGL vendor matching the platform
  - There are lists of the most commonly used viewports/user agents that can serve as an initial data source
  - An even better data source would be a small fingerprint.js utility that sniffs realistic/full fingerprints from a website (can be hosted by a supporter with a bit of traffic)
- A plugin for puppeteer-extra which will apply this generated data
  - Could potentially be used to seed the stealth plugin as well, otherwise the defaults will be used
  - Devs need some control over when to refresh a fingerprint and which properties to skip (e.g. if the generated locale doesn't fit the proxy geo)
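The first step could be sketched roughly like this. Everything here is hypothetical: the `Persona` shape, `generatePersona`, and in particular the data, which uses placeholder strings rather than real user agents or GPU names. The sketch assumes coherence is achieved by drawing the WebGL vendor/renderer only from entries belonging to the chosen platform, and that the same seed should reproduce the same persona so a fingerprint is only refreshed when the developer changes the seed.

```typescript
interface Persona {
  platform: string;
  userAgent: string;
  webglVendor: string;
  webglRenderer: string;
  viewport: { width: number; height: number };
  locale: string;
}

interface Profile {
  platform: string;
  userAgent: string;
  gpus: ReadonlyArray<{ vendor: string; renderer: string }>;
}

// Placeholder data only. A real source would be a maintained list or
// fingerprints collected from actual traffic, as suggested above.
const PROFILES: ReadonlyArray<Profile> = [
  {
    platform: "Win32",
    userAgent: "UA-PLACEHOLDER-WINDOWS",
    gpus: [
      { vendor: "VENDOR-A", renderer: "RENDERER-A-WINDOWS" },
      { vendor: "VENDOR-B", renderer: "RENDERER-B-WINDOWS" },
    ],
  },
  {
    platform: "MacIntel",
    userAgent: "UA-PLACEHOLDER-MAC",
    gpus: [{ vendor: "VENDOR-C", renderer: "RENDERER-C-MAC" }],
  },
];
const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1366, height: 768 },
];
const LOCALES = ["en-US", "de-DE"];

// Seeded PRNG (mulberry32) so that a seed maps to exactly one persona.
function seededRng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function pick<T>(items: ReadonlyArray<T>, rand: () => number): T {
  return items[Math.floor(rand() * items.length)] as T;
}

// `skip` lists properties the caller wants to set itself, e.g. a locale
// that has to match the proxy's geo.
function generatePersona(
  seed: number,
  skip: ReadonlyArray<keyof Persona> = [],
): Partial<Persona> {
  const rand = seededRng(seed);
  const profile = pick(PROFILES, rand);
  const gpu = pick(profile.gpus, rand); // drawn from the same profile
  const persona: Partial<Persona> = {
    platform: profile.platform,
    userAgent: profile.userAgent,
    webglVendor: gpu.vendor,
    webglRenderer: gpu.renderer,
    viewport: pick(VIEWPORTS, rand),
    locale: pick(LOCALES, rand),
  };
  for (const key of skip) delete persona[key];
  return persona;
}

console.log(generatePersona(7, ["locale"]));
```

A plugin for the second step would then take such an object and apply it to each page; that part is left out here because it depends on how the plugin hooks into puppeteer-extra.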
This is one of the most common things being developed in-house by companies using `puppeteer-extra`, often with outdated or hardcoded lists of user agents, mixed with other fingerprint data that doesn't match.
It'd be worthwhile creating a higher-quality, reusable plugin here. This would also tie in neatly with a future proxy-manager (luminati/oxylabs) plugin (another thing I've seen being built in-house countless times). ;-)
Hey @berstend
> There are lists with most commonly used viewports/user-agent as an initial data source
I have a primitive version of this for a project that I am working on right now. I could easily abstract it into a separate project, and we could move to use it over here.
To build anti-fingerprinting measures, you first need to understand which fingerprinting strategies are being used. There are several nowadays; two examples are:
- https://blog.kasada.io/mastery-of-the-puppets-advanced-bot-detection/
- https://fingerprintjs.com/demo
Any update on the anti-fingerprint feature?
Wouldn't something like this already do the job? https://github.com/apify/fingerprint-injector
https://github.com/kaliiiiiiiiii/Selenium-Driverless/issues/207#issuecomment-2064328677 might be relevant/helpful for the keyboard layout implementation.