AWS EC2 Linux Server Detected Despite Manipulated Headers Using Undetected Chromedriver

Open Daves17 opened this issue 10 months ago • 9 comments

My AWS EC2 Linux server is still identified as Linux by web services, even though I try to mask its identity with Undetected Chromedriver. I set a custom User-Agent header and use other techniques to mimic a Windows client, but the server is still recognized as Linux, which affects my testing and automation tasks.

Expected behavior: The web service should not be able to detect that the browser is being run from a Linux server on AWS EC2. It should identify it as a Windows client based on the manipulated headers.

Actual behavior: Despite the header manipulation, the server's Linux OS is detected. Inspection of the network requests reveals that some headers, particularly sec-ch-ua-platform, still explicitly mention Linux, which might be contributing to the detection:

"headers": {
    "Referer": "https://22bets.me/",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Linux\""
}

Questions:

  • Are there known limitations or issues with header manipulation in Undetected Chromedriver that could cause this behavior on an AWS EC2 instance?
  • Any suggestions on how to effectively mask the Linux server's identity against sophisticated browser fingerprinting techniques?

Daves17 avatar Apr 21 '24 18:04 Daves17

The user-agent isn't the only way to detect the browser's actual platform, so updating it alone isn't enough.

Consider utilising 'sec' overrides.

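from selenium.webdriver.chrome.options import Options

# Example client-hint brand list; replace with the value you want to present.
sec_ch_ua = '"Exemplary Browser"; v="73", ";Not?A.Brand"; v="27"'

options = Options()
options.add_argument(f'--sec-ch-ua={sec_ch_ua}')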
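Another route worth trying, offered as a sketch rather than a confirmed fix: the Chrome DevTools Protocol command `Network.setUserAgentOverride` accepts a `userAgentMetadata` object, which is what feeds the `sec-ch-ua-*` client hints. The helper names `build_ua_override` and `apply_ua_override` are hypothetical, the version numbers are placeholders taken from the headers in this issue, and `platformVersion` is an assumed value. The driver is assumed to expose Selenium's `execute_cdp_cmd`.

```python
def build_ua_override(chrome_major="123"):
    """Build parameters for the CDP command Network.setUserAgentOverride.

    All values are illustrative placeholders, not a verified fingerprint.
    """
    user_agent = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{chrome_major}.0.0.0 Safari/537.36"
    )
    return {
        "userAgent": user_agent,
        "platform": "Win32",  # what navigator.platform should report
        "userAgentMetadata": {
            "brands": [
                {"brand": "Google Chrome", "version": chrome_major},
                {"brand": "Not:A-Brand", "version": "8"},
                {"brand": "Chromium", "version": chrome_major},
            ],
            "fullVersion": f"{chrome_major}.0.0.0",
            "platform": "Windows",        # drives sec-ch-ua-platform
            "platformVersion": "10.0.0",  # assumed placeholder
            "architecture": "x86",
            "model": "",
            "mobile": False,
        },
    }


def apply_ua_override(driver, params=None):
    """Send the override to a driver that exposes execute_cdp_cmd."""
    params = params or build_ua_override()
    driver.execute_cdp_cmd("Network.setUserAgentOverride", params)
    return params
```

Calling `apply_ua_override(driver)` before the first navigation should make the User-Agent and the client hints agree with each other. It only changes what the browser reports about itself; it does nothing about IP reputation or TLS-level signals.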

b-nnett avatar Apr 22 '24 02:04 b-nnett

In a datacenter (which is what AWS is), you are detected by definition.
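To illustrate why: cloud providers such as AWS publish their address ranges, so a site can classify a visitor's IP with a simple range lookup. A minimal sketch of that kind of check follows. The function name `is_datacenter_ip` is hypothetical, and the CIDR blocks are reserved documentation ranges used as stand-ins, not real AWS ranges.

```python
import ipaddress

# Placeholder blocks from the reserved documentation ranges (RFC 5737).
# A real check would load a provider's published range list instead.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip, ranges=DATACENTER_RANGES):
    """Return True if the address falls inside any known datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


print(is_datacenter_ip("203.0.113.7"))  # inside a listed block
print(is_datacenter_ip("192.0.2.1"))    # outside every listed block
```

When traffic goes through a proxy, the address that matters for this lookup is the proxy's exit IP, not the server's own.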

ultrafunkamsterdam avatar Apr 22 '24 14:04 ultrafunkamsterdam

Thank you for your answers! I am still wondering what possibilities there are for the server not to be detected. I already use proxies.

Daves17 avatar Apr 22 '24 17:04 Daves17

Who is detecting you? They may be detecting your IP. They may also be doing TLS fingerprinting, which is happening more and more.

"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",

If this is your UA, then of course they are going to detect you: you are telling them you are using headless Chrome.
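A quick way to sanity-check a UA string before sending it is sketched below. The function name `ua_red_flags` and the two heuristics are assumptions for illustration, not rules any particular site is known to apply.

```python
import re


def ua_red_flags(user_agent):
    """List obvious giveaways in a User-Agent string (illustrative heuristics)."""
    flags = []
    # Headless Chrome advertises itself with this product token.
    if "HeadlessChrome" in user_agent:
        flags.append("headless token")
    # The first parenthesised group carries the platform details.
    match = re.search(r"\(([^)]*)\)", user_agent)
    platform = match.group(1) if match else ""
    if "Linux" in platform and "Android" not in platform:
        flags.append("desktop Linux platform")
    return flags


ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36"
)
print(ua_red_flags(ua))
```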

bluemangofunk avatar Apr 23 '24 18:04 bluemangofunk

In general, you are right. But what surprises me is that I'm not detected on my own computer, even though I use the same configs and proxies. Here are the headers from my computer for comparison:

"request": {
    "headers": {
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
        "sec-ch-ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\""
    }
}

My assumption is therefore that detecting Linux is the problem.
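To make the comparison concrete, here is a small sketch that diffs the two header sets posted in this issue. The helper name `diff_shared_headers` is hypothetical, and only headers present in both captures are compared.

```python
BRANDS = '"Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"'

# Headers captured on the EC2 Linux server (from the original report).
EC2_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": BRANDS,
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Linux"',
}

# Headers captured on the local Windows machine.
LOCAL_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": BRANDS,
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
}


def diff_shared_headers(a, b):
    """Return {name: (a_value, b_value)} for shared headers whose values differ."""
    shared = sorted(set(a) & set(b))
    return {k: (a[k], b[k]) for k in shared if a[k] != b[k]}


for name, (ec2, local) in diff_shared_headers(EC2_HEADERS, LOCAL_HEADERS).items():
    print(f"{name}:\n  EC2:   {ec2}\n  local: {local}")
```

Both captures contain the `HeadlessChrome` token, so that token alone does not explain the difference in behavior. At the header level, only the platform information differs.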

Daves17 avatar Apr 24 '24 12:04 Daves17

There are a thousand things they could be flagging you for besides just Linux.

As @usr741852 said, there's probably some level of fingerprinting at play, and if you're using a typical base EC2 machine, that's where you'd most easily run into blocks.

It's easy to test, though: just spin up an instance running Windows.

b-nnett avatar Apr 24 '24 12:04 b-nnett

Sounds like a good idea. I'll let you know as soon as I have the results

Daves17 avatar Apr 24 '24 13:04 Daves17

I have now run the bot on an AWS Windows server. Unfortunately, I got the same result, both with and without headless mode. So it probably has nothing to do with the operating system, but with the fact that these are EC2 instances. Is there a way to bypass this?

Daves17 avatar Apr 24 '24 19:04 Daves17

To better understand the problem, here is some context: the bot scrapes the data for particular matches. When I run the bot on my computer, it finds all of them. When I run it on the server, the website returns only a limited subset of the matches. So I'm not blocked outright, only limited, because the bot is detected.

Daves17 avatar Apr 24 '24 20:04 Daves17