
How to detect selenium

Open Zen33 opened this issue 2 years ago • 27 comments

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
chrome = webdriver.Chrome(executable_path='./chromedriver.exe', chrome_options=chrome_options)
chrome.get('https://abrahamjuliot.github.io/creepjs')

When I launch Chrome through the Python selenium package as shown above, creepjs does not seem to detect it. Any suggestions? Thanks.

Zen33 avatar May 27 '22 04:05 Zen33

This is on my mind. I will look into this and see what I can find.

abrahamjuliot avatar Jun 06 '22 15:06 abrahamjuliot

Thanks for your reply. I found that the botD project can detect this case, but I guess that might depend on server-side analytics.

Zen33 avatar Jun 07 '22 01:06 Zen33

Nice. There might be some new tricks at botD. Here are some resources by Antoine Vastel:

  • Selenium signals: https://github.com/antoinevastel/fp-collect/blob/dev/src/fpCollect.js#L351
  • Test page: https://antoinevastel.com/bots/
  • Detecting modified Selenium Chrome driver: https://datadome.co/bot-management-protection/tracking-modified-selenium-chromedriver/

abrahamjuliot avatar Jun 07 '22 14:06 abrahamjuliot

I tried fp-collect months ago. Since the project has not been maintained since 2019, the flag above, chrome_options.add_argument('--disable-blink-features=AutomationControlled'), is not detected at https://antoinevastel.com/bots/

Zen33 avatar Jun 08 '22 08:06 Zen33

I finally got around to testing this more in depth, and we do detect Selenium in headless. Even with Web Driver and the User Agent hidden, there are many headless signals available.

Detection of non-headless Selenium is missed, but I think that it is an unnecessary detection. Automated patterns can be detected through event listeners, but that's not a focus yet. I might create a test page for that.

Similarly, Puppeteer and Playwright can run Google Chrome in non-headless and use automation without being detected. I think all that is fine, as long as the web traffic is producing good activity and okay fingerprints.

This is the script I used.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By


options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-blink-features=AutomationControlled')  # web driver off
options.add_argument('--headless')  # same effect as options.headless = True, a setter newer Selenium 4 releases removed
options.add_argument('--window-size=800,600')
# make sure you download the driver that supports the chrome.exe
options.binary_location = r"C:\Program Files\Google\Chrome Beta\Application\chrome.exe"  # raw string keeps the backslashes literal
driver = webdriver.Chrome(options=options)


def save_screenshot(driver: webdriver.Chrome, path: str = 'selen_screenshot.png') -> None:
  # Ref: https://stackoverflow.com/a/52572919/
  original_size = driver.get_window_size()
  required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
  required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
  driver.set_window_size(required_width, required_height)
  # driver.save_screenshot(path)  # has scrollbar
  # find_element_by_tag_name was removed in Selenium 4.3; find_element(By...) works in both old and new versions
  driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
  driver.set_window_size(original_size['width'], original_size['height'])

try:
  driver.get('https://abrahamjuliot.github.io/creepjs/')
  time.sleep(10)  # give the page time to finish fingerprinting
  save_screenshot(driver)
  input("press Enter to exit...")
finally:
  driver.quit()


abrahamjuliot avatar Oct 14 '22 05:10 abrahamjuliot

image

abrahamjuliot avatar Oct 14 '22 05:10 abrahamjuliot

Good job! I'll take a look at the latest version of creepjs during the rest of this month, thanks.

Zen33 avatar Oct 17 '22 12:10 Zen33

just fyi

  • https://news.ycombinator.com/item?id=34857087
  • https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html

Thorin-Oakenpants avatar Feb 19 '23 19:02 Thorin-Oakenpants

Nice. Just started researching this.

abrahamjuliot avatar Feb 20 '23 00:02 abrahamjuliot

Thanks.

Zen33 avatar Feb 23 '23 05:02 Zen33

@abrahamjuliot for selenium, and all chromedriver-driven browsers, check the two following values:

navigator.webdriver ==> remote debugging enabled resource

  • [ ] enabled?
  • [ ] lied?
    • [ ] spoofed with undetected-chromedriver script ? :
Object.defineProperty(window, 'navigator', {
    value: new Proxy(navigator, {
         has: (target, key) => (key === 'webdriver' ? false : key in target),
         get: (target, key) =>
              key === 'webdriver' ?
              false :
              typeof target[key] === 'function' ?
              target[key].bind(target) :
              target[key]
         })
    });                

cdc_adoQpoasnfa76pfcZLmcfl_Xxxxxxx

  • gets added to every new page with the following script:
(function () {
    window.cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Object = window.Object;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Promise = window.Promise;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Proxy = window.Proxy;
    window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol = window.Symbol;
}) ();
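  • the "random" strings seem to be hardcoded, but they can also be matched directly with a regex, as follows:
// Collect every property name on window and along its prototype chain
let objectToInspect = window,
    result = [];
while (objectToInspect !== null) {
    result = result.concat(Object.getOwnPropertyNames(objectToInspect));
    objectToInspect = Object.getPrototypeOf(objectToInspect);
}
// Keep names shaped like <prefix>_<id>_Array, _Promise or _Symbol
// (the original used a bare `return`, which only works inside a function body)
const matches = result.filter(i => /.+_.+_(Array|Promise|Symbol)/i.test(i));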
  • [ ] exist ?
  • [ ] lied?
    • [ ] spoofed with old (Version<=V3.2) undetected-chromedriver script ? :
let objectToInspect = window,
    result = [];
while (objectToInspect !== null) {
    result = result.concat(Object.getOwnPropertyNames(objectToInspect));
    objectToInspect = Object.getPrototypeOf(objectToInspect);
}
result.forEach(p => p.match(/.+_.+_(Array|Promise|Symbol)/ig)
    && delete window[p] && console.log('removed', p));

kaliiiiiiiiii avatar Mar 18 '23 14:03 kaliiiiiiiiii

Nice. These have been on my mind. There's also a way to get the cdc_... properties from the descriptors and bypass any random names added. I might add this at some point (a general detection).
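
One possible reading of the descriptor idea, as a rough sketch rather than what creepjs actually does: instead of matching names, look for global data properties whose value is identical to a built-in constructor but stored under a different key. The helper name `findAliasedBuiltins` and its return shape are hypothetical, and the list of built-ins is taken from the injected script quoted earlier in the thread.

```javascript
// Hypothetical helper: reports globals that alias a built-in constructor under another name.
function findAliasedBuiltins(win, builtinNames = ['Array', 'Object', 'Promise', 'Proxy', 'Symbol']) {
  const descriptors = Object.getOwnPropertyDescriptors(win);
  // Resolve each built-in through its own data descriptor so no getter is triggered.
  const builtins = builtinNames
    .map((name) => [name, descriptors[name]])
    .filter(([, descriptor]) => descriptor && 'value' in descriptor);
  const aliases = [];
  for (const [key, descriptor] of Object.entries(descriptors)) {
    if (!('value' in descriptor)) continue; // skip accessor properties
    for (const [name, builtin] of builtins) {
      if (key !== name && descriptor.value === builtin.value) {
        aliases.push({ key, aliasOf: name });
      }
    }
  }
  return aliases;
}
```

In a page this would be called as `findAliasedBuiltins(window)`. Because it compares values and not names, renaming the `cdc_` prefix would not hide the properties, although deleting them still would.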

undetected-chromedriver detection is excellent, but too specific for public concepts. The devs can fix the code and the detection becomes obsolete.

Good tips.

abrahamjuliot avatar Mar 18 '23 15:03 abrahamjuliot

Some additional flags for detecting Selenium and Selenium-adjacent software:

window["__nightmare"]
window["cdc_adoQpoasnfa76pfcZLmcfl_Array"]
window["cdc_adoQpoasnfa76pfcZLmcfl_Promise"]
window["cdc_adoQpoasnfa76pfcZLmcfl_Symbol"]
window["OSMJIF"]
window["_Selenium_IDE_Recorder"]
window["__$webdriverAsyncExecutor"]
window["__driver_evaluate"]
window["__driver_unwrapped"]
window["__fxdriver_evaluate"]
window["__fxdriver_unwrapped"]
window["__lastWatirAlert"]
window["__lastWatirConfirm"]
window["__lastWatirPrompt"]
window["__phantomas"]
window["__selenium_evaluate"]
window["__selenium_unwrapped"]
window["__webdriverFuncgeb"]
window["__webdriver__chr"]
window["__webdriver_evaluate"]
window["__webdriver_script_fn"]
window["__webdriver_script_func"]
window["__webdriver_script_function"]
window["__webdriver_unwrapped"]
window["awesomium"]
window["callSelenium"]
window["calledPhantom"]
window["calledSelenium"]
window["domAutomationController"]
window["watinExpressionError"]
window["watinExpressionResult"]
window["spynner_additional_js_loaded"]
document["$chrome_asyncScriptInfo"]
window["fmget_targets"]
window["geb"]
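
A minimal sketch of how the list above could be checked in one pass. The property names are copied from the list; the helper name `findAutomationFlags` and the idea of passing `window` and `document` in as arguments are assumptions made so the snippet can also run outside a browser.

```javascript
// Property names copied from the list above; the helper itself is a hypothetical sketch.
const WINDOW_FLAGS = [
  '__nightmare', 'cdc_adoQpoasnfa76pfcZLmcfl_Array', 'cdc_adoQpoasnfa76pfcZLmcfl_Promise',
  'cdc_adoQpoasnfa76pfcZLmcfl_Symbol', 'OSMJIF', '_Selenium_IDE_Recorder',
  '__$webdriverAsyncExecutor', '__driver_evaluate', '__driver_unwrapped',
  '__fxdriver_evaluate', '__fxdriver_unwrapped', '__lastWatirAlert',
  '__lastWatirConfirm', '__lastWatirPrompt', '__phantomas', '__selenium_evaluate',
  '__selenium_unwrapped', '__webdriverFuncgeb', '__webdriver__chr',
  '__webdriver_evaluate', '__webdriver_script_fn', '__webdriver_script_func',
  '__webdriver_script_function', '__webdriver_unwrapped', 'awesomium', 'callSelenium',
  'calledPhantom', 'calledSelenium', 'domAutomationController', 'watinExpressionError',
  'watinExpressionResult', 'spynner_additional_js_loaded', 'fmget_targets', 'geb',
];
const DOCUMENT_FLAGS = ['$chrome_asyncScriptInfo'];

// Returns the flags that exist on the given window-like and document-like objects.
function findAutomationFlags(win, doc) {
  const hits = [];
  for (const name of WINDOW_FLAGS) {
    if (name in win) hits.push(`window.${name}`); // `in` also catches properties set to undefined
  }
  for (const name of DOCUMENT_FLAGS) {
    if (name in doc) hits.push(`document.${name}`);
  }
  return hits;
}
```

In a page this would be called as `findAutomationFlags(window, document)`; an empty array only means none of these specific names are present.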

vxuv avatar Mar 18 '23 16:03 vxuv

Also worthwhile to check the types of navigator.plugins to ensure that it hasn't been tampered with.
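
A rough sketch of such a type check, assuming the usual browser behaviour where `navigator.plugins` is a `PluginArray`. The helper name `pluginsLookNative` is hypothetical, and the constructor is passed in as a parameter so the function does not depend on browser globals.

```javascript
// Hypothetical helper: true only when `plugins` has the expected tag, constructor and prototype.
function pluginsLookNative(plugins, PluginArrayCtor) {
  return (
    Object.prototype.toString.call(plugins) === '[object PluginArray]' &&
    plugins instanceof PluginArrayCtor &&
    Object.getPrototypeOf(plugins) === PluginArrayCtor.prototype
  );
}
```

In a page this would be called as `pluginsLookNative(navigator.plugins, PluginArray)`. A replacement built from a plain array or object literal fails at least one of the three checks, though a careful spoof can still pass all of them.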

vxuv avatar Mar 18 '23 16:03 vxuv

@vxuv @abrahamjuliot

Other relevant values can be found here: https://github.com/HMaker/HMaker.github.io/blob/master/selenium-detector/chromedriver.js

It detects objects created or used by chromedriver.

kaliiiiiiiiii avatar Apr 11 '23 18:04 kaliiiiiiiiii

@kaliiiiiiiiii I was trying to hide the "webdriver=true" navigator attribute and asked ChatGPT. Its answer is spookily similar to the navigator Proxy script in your comment above.


const originalNavigator = navigator;

const proxyNavigator = new Proxy(originalNavigator, {
  get(target, prop) {
    if (prop === 'webdriver') {
      return false;
    }
    return target[prop];
  },
  ownKeys(target) {
    const keys = Reflect.ownKeys(target);
    return keys.filter((key) => key !== 'webdriver');
  },
  getOwnPropertyDescriptor(target, prop) {
    if (prop === 'webdriver') {
      return undefined;
    }
    return Reflect.getOwnPropertyDescriptor(target, prop);
  },
});

// Replace the global navigator object with the proxy object
Object.defineProperty(window, 'navigator', {
  value: proxyNavigator,
  configurable: false,
  enumerable: false,
  writable: false,
});

I'm hoping against hope, but any way to see if an object is a Proxy?

I feel like chrome is teasing me in the console.

image

JWally avatar May 05 '23 18:05 JWally

That is funny. I asked Bing Chat (gpt4) about our detecting JS Proxies and what it thought about our methods here. It didn't like our code and insisted we try outdated techniques on stack overflow.

abrahamjuliot avatar May 05 '23 18:05 abrahamjuliot

That's pretty solid btw!!!!

Reflect.setPrototypeOf(navigator, Object.create(navigator)) seems to be one way to tell a proxied navigator object from the real deal.

It returns true if navigator is proxied and false if it's not.
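
A minimal sketch of that check wrapped in a reusable function. The name `looksLikeProxy` and the restore step are assumptions added here: when the call succeeds it really does change the prototype of the Proxy's target, so the sketch puts the original prototype back afterwards.

```javascript
// Hypothetical helper built on the Reflect.setPrototypeOf trick described above.
function looksLikeProxy(obj) {
  const originalProto = Object.getPrototypeOf(obj);
  // An ordinary object rejects a prototype that inherits from itself (cyclic chain): false.
  // A Proxy without a setPrototypeOf trap forwards to its target, and the cycle check
  // stops when it reaches the proxy in the chain, so the assignment is accepted: true.
  const accepted = Reflect.setPrototypeOf(obj, Object.create(obj));
  if (accepted) {
    Reflect.setPrototypeOf(obj, originalProto); // undo the change made to the target
  }
  return accepted;
}
```

In a page this would be called as `looksLikeProxy(navigator)`. A Proxy that defines its own `setPrototypeOf` trap could return false and evade the check, so it is best treated as one signal among several.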

Honest question - why doesn't webdriver = true set the bot-score at 100?

JWally avatar May 05 '23 19:05 JWally

The bot score has some game elements and includes tags like friend and stranger. By default, everyone is treated as a bot. From there, we just want to establish some level of trust. The more transparent and normal the player, the less they are perceived as untrustworthy. This allows use of web driver and headless UAs since these are designed for transparency.

abrahamjuliot avatar May 05 '23 19:05 abrahamjuliot

I'm sorry, I meant to say the 'headlessRating'.

Instead of just having 20% weight if true (I think it's 1 in 5 attributes), it feels like it should automatically trip the value to 100% when it's true.

I could be wrong, but I doubt you'd get too many false positives where "normal" users have webdriver set to true - seems like a really strong signal when present.
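
To make the suggestion concrete, here is a small sketch of a rating where every signal carries equal weight except `webdriver`, which trips the result to 100 on its own. This is not creepjs's actual scoring code; the function name `headlessRating` and the signal names `signalA` and `signalB` in the usage note are placeholders.

```javascript
// Hypothetical scoring sketch: equal weights, with webdriver acting as an override.
function headlessRating(signals) {
  const values = Object.values(signals);
  if (values.length === 0) return 0;
  if (signals.webdriver === true) return 100; // strong signal: skip the averaging
  const hits = values.filter(Boolean).length;
  return Math.round((hits / values.length) * 100);
}
```

With five signals, `headlessRating({ webdriver: false, noChrome: true, hasPermissionsBug: false, signalA: false, signalB: false })` gives 20, while any input with `webdriver: true` gives 100.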

Just a thought as I'm going through your library trying to distil heuristics I can steal :-) Outstanding work btw!

JWally avatar May 05 '23 21:05 JWally

Ah yes, that's a good idea. I might change that at some point. Attempted overrides of navigator.webdriver can maybe additionally give weight to the stealth rating.

abrahamjuliot avatar May 05 '23 22:05 abrahamjuliot

BTW, I need to remove these from the headless rating and move them to the like headless rating. These can appear in Android WebView, Smart TVs and other Chromium flavors.

noChrome
hasPermissionsBug

abrahamjuliot avatar May 05 '23 22:05 abrahamjuliot

That's a really good point. I totally forgot about the plethora of platforms that can legitimately ping a service, but look "weird".

I don't know if it's worth the investment, but it might be interesting to have somewhere an attribute called "framework" or "automated" or something.

I'm sure some clever AI could parse out good rules, but things like: you're using Windows, Chromium, and Webdriver is true? Minimum 99.8% chance you're automated. It's not good or bad, but it's definitely not normal and it'd be useful to flag.

I could be here for weeks MMQB'ing this thing into the ground; dorking out over what-if's and things I think would be useful :-P

Thanks again for maintaining this thing!!

JWally avatar May 06 '23 09:05 JWally

Regarding attempted overrides of navigator.webdriver adding weight to the stealth rating: it's possible to override the types for these objects, if I remember correctly. Probably still nice as an extra measure.

vxuv avatar May 06 '23 13:05 vxuv

The new headless mode (headless=new) cannot be detected.

NCLnclNCL avatar Jun 29 '23 16:06 NCLnclNCL

@NCLnclNCL you're 100% right.

Right now, I'm keeping a Bayesian score, and if the browser is Chromium and the OS isn't Linux, that's a big red flag that it's a bot. Not 100%, but definitely worth paying attention to. /shrug
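
As an illustration of what a Bayesian score can look like (the real scoring is not shown in this thread), the sketch below updates a bot probability in odds form. The function name `updateBotProbability` is hypothetical, and the prior and likelihood ratio in the usage note are made-up numbers, not measurements.

```javascript
// Hypothetical sketch of one Bayesian update step in odds form.
// prior: current probability that the visitor is a bot, strictly between 0 and 1.
// likelihoodRatio: P(signal | bot) / P(signal | human) for the observed signal.
function updateBotProbability(prior, likelihoodRatio) {
  const priorOdds = prior / (1 - prior);
  const posteriorOdds = priorOdds * likelihoodRatio;
  return posteriorOdds / (1 + posteriorOdds);
}
```

For example, starting from a prior of 0.5 and assuming that "Chromium on a non-Linux OS" is four times more likely for bots than for humans, `updateBotProbability(0.5, 4)` gives 0.8. Further signals can be chained by feeding the result back in as the next prior.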

JWally avatar Jun 30 '23 15:06 JWally

I think headless=new is very hard to detect, bro. It can be slower than the old headless mode, but it is perfect for anti-detection.

NCLnclNCL avatar Jun 30 '23 15:06 NCLnclNCL