puppeteer-extra

(Umbrella) Anti-fingerprinting features

Open berstend opened this issue 4 years ago • 16 comments

The current goal of the stealth plugin is to hide headless browser usage by mocking/spoofing missing functionality in headless to emulate its headful counterpart.

Another nice feature (either as part of stealth or a dedicated plugin) would be to add anti-fingerprinting measures, which could mean that we emulate the most common data, mock certain things in more detail or shuffle data on each request (or triggered by user) to make fingerprinting harder.

Things to look into:

  • [ ] Canvas/WebGL/Audio hashes (hardware dependent)
  • [ ] Keyboard layout (#98)
  • [ ] WebGL renderer (#97)
  • [ ] More extensive permission emulation (#100)

berstend avatar Dec 10 '19 14:12 berstend

We need to keep in mind the basic issue about anti-fingerprinting:

CanvasBlocker actually increases your track-ability because the consistent factor is now that you have a changing canvas fingerprint (which almost no one does). This is why Safari tries to give a universal canvas fingerprint so you can "blend in" with other users.

(https://news.ycombinator.com/item?id=20054831)

berstend avatar Dec 10 '19 15:12 berstend

Look into this FF addon for inspiration:

https://addons.mozilla.org/en-US/firefox/addon/canvasblocker/

The different block modes are:

  • fake: Canvas Blocker's default setting, and my favorite! All websites not on the white list or black list can use the protected APIs. But values obtained by the APIs are altered so that a consistent fingerprinting is not possible
  • ask for permission: If a website is not listed on the white list or black list, the user will be asked if the website should be allowed to use the protected APIs each time they are called.
  • block everything: Ignore all lists and block the protected APIs on all websites.
  • allow only white list: Only websites in the white list are allowed to use the protected APIs.
  • block only black list: Block the protected APIs only for websites on the black list.
  • allow everything: Ignore all lists and allow the protected APIs on all websites.
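
The block modes listed above can be read as a small decision function. The following is only a sketch of that reading, not CanvasBlocker's actual code: the names `BlockMode`, `Decision`, `SiteLists` and `decide` are hypothetical, and it assumes that in the "fake" and "ask" modes a white-listed site is allowed outright, a black-listed site is blocked outright, and the white list wins if a host is on both.

```typescript
// Hypothetical names; this models the mode descriptions, not the add-on's real implementation.
type BlockMode =
  | "fake"
  | "ask"
  | "blockEverything"
  | "allowOnlyWhiteList"
  | "blockOnlyBlackList"
  | "allowEverything";

type Decision = "allow" | "block" | "fake" | "ask";

interface SiteLists {
  whiteList: ReadonlySet<string>;
  blackList: ReadonlySet<string>;
}

function decide(mode: BlockMode, host: string, lists: SiteLists): Decision {
  // These two modes ignore both lists.
  if (mode === "blockEverything") return "block";
  if (mode === "allowEverything") return "allow";

  const white = lists.whiteList.has(host);
  const black = lists.blackList.has(host);

  if (mode === "allowOnlyWhiteList") return white ? "allow" : "block";
  if (mode === "blockOnlyBlackList") return black ? "block" : "allow";

  // Remaining modes: "fake" and "ask".
  // Assumption: listed sites are handled by their list, unlisted sites get the mode's behaviour.
  if (white) return "allow";
  if (black) return "block";
  return mode;
}

const lists: SiteLists = {
  whiteList: new Set(["trusted.example"]),
  blackList: new Set(["tracker.example"]),
};

console.log(decide("fake", "shop.example", lists));
console.log(decide("fake", "trusted.example", lists));
console.log(decide("blockOnlyBlackList", "tracker.example", lists));
```

For a stealth or fingerprint plugin, the "fake" branch is the interesting one, since it is the only mode where the API keeps working but returns altered values.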

Protected "fingerprinting" APIs:

canvas 2d
webGL
audio
history
window (disabled by default)
DOMRect
navigator (disabled by default)

berstend avatar Dec 10 '19 15:12 berstend

Panopticlick's numbers are extremely confusing and borderline useless.

On my initial run, I got an overall entropy of 17.63. My two biggest identifiers were screen resolution (1000x595x24 which was approx 1/22000 browsers) and webgl hash (approx 1/3800 browsers). I fixed screen resolution to 1000x600x24 (approx 1/85 browsers) and disabled webgl hashing (approx 1/6 browsers) and the overall entropy did not change one iota, despite also closing browser, flushing cache and cookies, etc. I gave it another run with a deliberately weird resolution (1420x701 which was something like 1/105000 browsers) and once again, the overall entropy was exactly 17.63. So based on my experiment, it seems that screen resolution and webgl hash have no effect whatsoever on [Panopticlick's] overall entropy score.

An update on last night's experiment, if anyone cares. The next largest identifier was system fonts (approx 1/1300 browsers). I set browser.display.use_document_fonts=0 which hid the system fonts (now the same as approx 1/10 browsers) and my overall entropy dropped to just below 11 bits. At this point, none of the metrics were less common than 1/10 browsers, so I figured I wouldn't be able to do better than that.

As a side note, I ended up re-enabling system fonts because disabling them broke a large percentage of web sites' CSS.
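
One way to make sense of the "approx 1/N browsers" figures quoted above is to convert them to bits: a value shared by 1 in N browsers carries log2(N) bits of identifying information, so 1 in 22000 is roughly 14.4 bits. The sketch below only does that conversion for the numbers mentioned in the quote; the helper name `bitsFromOneIn` is hypothetical, and it makes no claim about how Panopticlick computes its overall score. Per-attribute bits can only be added up if the attributes are independent, which they usually are not.

```typescript
// Hypothetical helper: converts "1 in N browsers" into bits of identifying information.
function bitsFromOneIn(n: number): number {
  if (!(n >= 1)) throw new RangeError("n must be a number >= 1");
  return Math.log2(n);
}

// Figures taken from the quoted experiment above ("approx 1/N browsers").
const reported: Record<string, number> = {
  "screen resolution 1000x595x24": 22000,
  "screen resolution 1000x600x24": 85,
  "webgl hash": 3800,
  "system fonts": 1300,
};

for (const [label, oneIn] of Object.entries(reported)) {
  console.log(`${label}: ${bitsFromOneIn(oneIn).toFixed(2)} bits`);
}
```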

berstend avatar Dec 10 '19 15:12 berstend

Would love to see this feature. I believe reCAPTCHA v3 is somehow factoring in the browser fingerprint when calculating your overall score, and this could potentially mitigate that.

Vittitow avatar Jan 07 '20 03:01 Vittitow

We need a random fingerprint plugin :)

yalexx avatar Mar 27 '20 08:03 yalexx

I don't mind working on such a plugin, but I would need help, as there are a lot of things that would need to be randomised for it to work effectively.

brunogaspar avatar Apr 12 '20 14:04 brunogaspar

I volunteer to help you @brunogaspar! Instead of trying to make a "common" fingerprint, I think it would be a lot easier to make it possible for each browser instance to have a unique fingerprint (by adding different fonts, etc.).

StevenVeshkini avatar May 28 '20 08:05 StevenVeshkini

@brunogaspar, @StevenVeshkini, I am happy to help out if needed. Do you guys have an idea of what data needs to be randomised, and how?

itsdarrylnorris avatar Jul 01 '20 07:07 itsdarrylnorris

@brunogaspar, @StevenVeshkini, I am happy to help out if needed. Do you guys have an idea of what data needs to be randomised, and how?

I think the user agent and the WebGL vendor and renderer should be selected at random from a list of up-to-date values. Currently there is only a single value, which may be flagged as suspicious simply because it is the default.
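
A minimal sketch of that idea, assuming the values are kept together as matching profiles rather than as three independent lists. `BrowserProfile`, `PROFILES` and `pickProfile` are hypothetical names, and the strings are illustrative placeholders, not a maintained or up-to-date list.

```typescript
// Hypothetical shape: user agent and WebGL strings are stored together so they stay consistent.
interface BrowserProfile {
  userAgent: string;
  webglVendor: string;
  webglRenderer: string;
}

// Illustrative placeholder values only; a real list would need regular updates.
const PROFILES: readonly BrowserProfile[] = [
  {
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    webglVendor: "Google Inc.",
    webglRenderer: "ANGLE (Intel(R) HD Graphics 620 Direct3D11 vs_5_0 ps_5_0)",
  },
  {
    userAgent:
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    webglVendor: "Intel Inc.",
    webglRenderer: "Intel Iris OpenGL Engine",
  },
];

// `rand` is injectable so the choice can be made reproducible in tests.
function pickProfile(
  profiles: readonly BrowserProfile[],
  rand: () => number = Math.random,
): BrowserProfile {
  if (profiles.length === 0) throw new Error("profile list is empty");
  // Clamp in case rand() returns exactly 1.
  const index = Math.min(profiles.length - 1, Math.floor(rand() * profiles.length));
  return profiles[index] as BrowserProfile;
}

const chosen = pickProfile(PROFILES);
console.log(chosen.userAgent, "|", chosen.webglVendor, "|", chosen.webglRenderer);
```

The chosen user agent could then be applied with Puppeteer's `page.setUserAgent`; how the WebGL values get applied is left out of this sketch.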

evading-bot-detection avatar Jul 15 '20 07:07 evading-bot-detection

It could be nice to separate this into two steps:

  • A more generic browser "persona" or fingerprint data generator, similar in spirit to e.g. faker.js
    • The job of this library is to generate new convincing browser data/fingerprints
    • Ideally with coherent data, e.g. webgl vendor matching the platform
    • There are lists with most commonly used viewports/user-agent as an initial data source
    • An even better data source would be to write a small fingerprint.js utility which will sniff realistic/full fingerprints from a website (can be hosted by a supporter with a bit of traffic)
  • A plugin for puppeteer-extra which will apply this generated data
    • Could potentially be used to seed the stealth plugin as well, otherwise the defaults will be used
    • Devs need some control over when to refresh a fingerprint and which properties to skip (e.g. if the generated locale doesn't fit the proxy geo)
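
The first step (the generator) could look roughly like the sketch below. Everything in it is an assumption for illustration: the names `Persona`, `generatePersona` and `seededRandom`, the tiny hardcoded data tables, and the choice of a seed as the "refresh" control (same seed gives the same persona, a new seed gives a new one). It is not an existing library, and it deliberately stops short of the second step, applying the data in a puppeteer-extra plugin.

```typescript
// Hypothetical persona shape; a real generator would cover many more properties.
interface Persona {
  platform: string;
  userAgent: string;
  webglVendor: string;
  webglRenderer: string;
  viewport: { width: number; height: number };
  locale: string;
}
type PersonaKey = keyof Persona;

// Small seeded PRNG (mulberry32) so that a seed maps to a stable persona.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Illustrative placeholder data. Platform-dependent values are grouped to stay coherent.
const PLATFORMS = [
  {
    platform: "Win32",
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    webglVendor: "Google Inc.",
    webglRenderer: "ANGLE (Intel(R) HD Graphics 620 Direct3D11 vs_5_0 ps_5_0)",
  },
  {
    platform: "MacIntel",
    userAgent:
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    webglVendor: "Intel Inc.",
    webglRenderer: "Intel Iris OpenGL Engine",
  },
] as const;
const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1366, height: 768 },
  { width: 1536, height: 864 },
] as const;
const LOCALES = ["en-US", "en-GB", "de-DE"] as const;

// `skip` lists properties the caller wants to control itself (e.g. locale to match a proxy).
function generatePersona(seed: number, skip: readonly PersonaKey[] = []): Partial<Persona> {
  const rand = seededRandom(seed);
  const pick = <T>(items: readonly T[]): T =>
    items[Math.min(items.length - 1, Math.floor(rand() * items.length))] as T;

  const persona: Partial<Persona> = {
    ...pick(PLATFORMS),
    viewport: { ...pick(VIEWPORTS) },
    locale: pick(LOCALES),
  };
  for (const key of skip) delete persona[key];
  return persona;
}

console.log(generatePersona(42));
console.log(generatePersona(42, ["locale"]));
```

Keeping the seed outside the generator means the caller decides when a fingerprint is refreshed, and the `skip` list covers the case where a generated property would contradict something else, such as the proxy location.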

This is one of the things most commonly built in-house by companies using puppeteer-extra, often with outdated or hardcoded lists of user agents mixed with other fingerprint data that doesn't match.

It'd be worthwhile creating a higher-quality, reusable plugin here. This would also tie in neatly with a future proxy-manager (luminati/oxylabs) plugin (another thing I've seen being built in-house countless times). ;-)

berstend avatar Jul 15 '20 13:07 berstend

Hey @berstend

There are lists with most commonly used viewports/user-agent as an initial data source

I have a primitive version of this for a project that I am working on right now. I could easily abstract it into a separate project, and we could move to use it over here.

itsdarrylnorris avatar Jul 15 '20 16:07 itsdarrylnorris

To build anti-fingerprinting measures, you need to understand which fingerprinting strategies are being used. There are several nowadays; two of them are:

  • https://blog.kasada.io/mastery-of-the-puppets-advanced-bot-detection/
  • https://fingerprintjs.com/demo

andersonaguiar avatar Aug 26 '20 10:08 andersonaguiar

Any update on the anti-fingerprint feature?

Hypnos999 avatar Dec 16 '21 11:12 Hypnos999

Wouldn't something like this already do the job? https://github.com/apify/fingerprint-injector

JaneJeon avatar Jan 02 '22 19:01 JaneJeon

https://github.com/kaliiiiiiiiii/Selenium-Driverless/issues/207#issuecomment-2064328677 might be relevant/helpful regarding the implementation of the keyboard layout

kaliiiiiiiiii avatar Apr 18 '24 17:04 kaliiiiiiiiii