
Cloudflare detecting puppeteer

Open joeledwardson opened this issue 2 years ago • 10 comments

I have not queried or clicked anything using puppeteer; simply connecting to the browser seems to be enough for Cloudflare to block access to a site.

I have used the simplest possible puppeteer example with a real browser (not headless) and no automation scripts:

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
puppeteer.use(StealthPlugin())

;(async () => {
  console.log('launching...')
  const browser = await puppeteer.launch({
    executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe',
    headless: false,
    defaultViewport: null
  })
  console.log('connected')
  const page = await browser.newPage()
  await page.goto('https://nowsecure.nl')
  console.log('waiting for 1 min...')
  await new Promise((r) => setTimeout(r, 60_000))
  console.log('closing...')
  await browser.close()
})()

I have replicated this without puppeteer: clicking on the Cloudflare verification button lets me pass through to the website, which makes me suspect that they are somehow able to detect Puppeteer.

The video below shows me clicking the verification manually, but Cloudflare still refuses access:

https://github.com/berstend/puppeteer-extra/assets/25906558/973501b3-25e5-40a4-98ad-888315930b4b

I have also replicated this on Android by forwarding the Chrome DevTools port via ADB and connecting to the debugging port, with the same result.

For mobile, I:

  • Use ADB to forward the Chrome DevTools port: adb forward tcp:9000 localabstract:chrome_devtools_remote
  • Run the following script to connect with puppeteer:
import { Browser, connect } from 'puppeteer-core'

let browser: Browser | null = null

const timer = (ms: number) => new Promise<null>((res) => setTimeout(() => res(null), ms))

export async function puppeteerConnect({
  port,
  queryTimeoutMs
}: {
  port: string
  queryTimeoutMs: number
}): Promise<Browser> {
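  // Chrome DevTools exposes the WebSocket debugger URL at /json/version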
  const debuggerUrl = 'http://127.0.0.1:' + port + '/json/version'

  const fetcher = async () => {
    const result = await fetch(debuggerUrl)
    return await result.text()
  }

  const result = await Promise.race([timer(queryTimeoutMs), fetcher()])
  if (result === null) {
    throw new Error('get debugger URL timed out')
  }

  const data = JSON.parse(result) as { webSocketDebuggerUrl?: unknown }

  const wsUrl = data?.webSocketDebuggerUrl
  if (typeof wsUrl !== 'string') {
    throw new Error('get debugger url from response failed, `wsUrl` is not string')
  }

  // use socket url to connect to with puppeteer
  const browser = await Promise.race([
    connect({
      browserWSEndpoint: wsUrl,
      defaultViewport: null
    }),
    timer(queryTimeoutMs)
  ])
  if (browser === null) {
    throw new Error('puppeteer connect timed out')
  }
  return browser
}

async function retryConnect() {
  let lastErr: unknown = null
  let i = 0
  while (i < 20) {
    console.log('connection attempt #', i)
    try {
      return await puppeteerConnect({ port: '9000', queryTimeoutMs: 500 })
    } catch (err) {
      lastErr = err
    }
    await new Promise((r) => setTimeout(r, 1000))
    i += 1
  }
  throw lastErr
}

;(async () => {
  console.log('connecting...')
  const _browser = await retryConnect()
  console.log('connected!')
  browser = _browser
  const pages = await browser.pages()
  const firstPage = pages[0]
  if (!firstPage) {
    throw new Error('NO PAGE')
  }
  await firstPage.goto('https://nowsecure.nl')

  await new Promise((r) => setTimeout(r, 60_000))
})().finally(() => {
  console.log('browser disconnecting')
  browser?.disconnect()
  console.log('should be done?')
})

joeledwardson avatar Sep 27 '23 17:09 joeledwardson

Try using the start-up tab and see if it works. We have more info on this problem here: https://github.com/berstend/puppeteer-extra/issues/832
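
If "start-up tab" means reusing the tab Chrome opens at launch instead of creating a fresh one with newPage() (as discussed in #832), a minimal sketch, assuming the same stealth setup and test site as above, would be:

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
puppeteer.use(StealthPlugin())

;(async () => {
  const browser = await puppeteer.launch({ headless: false, defaultViewport: null })
  // Reuse the tab Chrome opened at startup instead of browser.newPage();
  // freshly created pages are reportedly easier for Cloudflare to flag.
  const [startupPage] = await browser.pages()
  const page = startupPage ?? (await browser.newPage())
  await page.goto('https://nowsecure.nl')
})()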

NodePuppeteer avatar Sep 28 '23 00:09 NodePuppeteer

I am now recently (within last two weeks) seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.

krkeegan avatar Dec 12 '23 01:12 krkeegan

@krkeegan @joeledwardson @NodePuppeteer @peterblazejewicz @bclougherty #832

mdervisaygan avatar Dec 23 '23 12:12 mdervisaygan

> I am now recently (within last two weeks) seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.

I had luck up until now. Now, anything protected by Cloudflare simply doesn't let me do anything... even if I solve the captcha myself, it keeps spinning or reports that I've failed to pass the test as a human being.

Is there anyone that had luck resolving this issue?

bajgit98 avatar Jun 07 '24 15:06 bajgit98

> > I am now recently (within last two weeks) seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.
>
> I had luck up until now. Now, anything protected by Cloudflare simply doesn't let me do anything... even if I solve the captcha myself, it keeps spinning or reports that I've failed to pass the test as a human being.
>
> Is there anyone that had luck resolving this issue?

https://medium.com/@zfcsoftware/how-to-bypass-cloudflare-with-node-js-869fa6e21dd5

mdervisaygan avatar Jun 07 '24 17:06 mdervisaygan

Friend, your article is absolutely wrong... You completely do not understand the cause of this issue. Please stop spamming these threads.

vladtreny avatar Jun 08 '24 08:06 vladtreny

> Friend, your article is absolutely wrong... You completely do not understand the cause of this issue. Please stop spamming these threads.

The article is about passing Cloudflare. Two pieces of code are given, and both can easily pass, including sites on the corporate plan. Which part is wrong? I am sharing a source because people keep saying we cannot get past Cloudflare. Explain the wrong part and let's learn together. Also, I'm not spamming: my first message was a link to a GitHub discussion. It has nothing to do with me, and there are dozens of people in that discussion. I am waiting for you to explain what is wrong.

mdervisaygan avatar Jun 08 '24 08:06 mdervisaygan

I had this issue; some websites have more advanced scraper detection. The solution was to use a residential proxy service like brightdata and pass the proxy args to puppeteer.

import puppeteer, { PuppeteerLaunchOptions } from 'puppeteer';

const BROWSER_CONFIG: PuppeteerLaunchOptions = {
  headless: 'new',
  defaultViewport: null,
  ignoreHTTPSErrors: true,
  // route all browser traffic through the (residential) proxy
  args: ['--proxy-server=xxxx:xxxx'],
};

const browser = await puppeteer.launch(BROWSER_CONFIG);
const page = (await browser.pages())[0];

// answer the proxy's authentication challenge with the account credentials
await page.authenticate({
  username: 'xxxxx',
  password: 'xxxxxx',
});
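
Note: --proxy-server only sets the proxy address; credentials for an authenticated proxy have to be supplied separately, which is what the page.authenticate() call does.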

Kosmoon avatar Jun 08 '24 08:06 Kosmoon

zfcsoftware

bruh, the method this blog introduced doesn't work for me

monsterlady avatar Jul 27 '24 07:07 monsterlady

> zfcsoftware
>
> bruh, the method this blog introduced doesn't work for me

You can test puppeteer-real-browser with the latest version; you should not have any problems, it has just been updated. If you are using Linux, I recommend running it with Docker.
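
For reference, a minimal usage sketch of puppeteer-real-browser, assuming the connect() entry point and the turnstile option described in its README (check the current docs, since the option names may have changed):

import { connect } from 'puppeteer-real-browser'

;(async () => {
  // connect() launches a real Chrome instance and hands back a browser/page pair;
  // turnstile: true is meant to auto-handle Cloudflare Turnstile challenges.
  const { browser, page } = await connect({
    headless: false,
    turnstile: true
  })
  await page.goto('https://nowsecure.nl')
  await new Promise((r) => setTimeout(r, 60_000))
  await browser.close()
})()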

Windows Server Test: https://github.com/user-attachments/assets/b1c4dca1-db48-4692-ac67-fc399d11e009

Ubuntu 24 test: https://github.com/user-attachments/assets/b1040e6a-9d8d-4fed-910a-52cabbd82130

mdervisaygan avatar Jul 27 '24 08:07 mdervisaygan