
[Bug]: await browser.pages() hanging when used with remote connection (puppeteer.connect)

Open nemzyx opened this issue 1 year ago • 7 comments

Minimal, reproducible example

import puppeteer from 'puppeteer'

;(async()=>{
  const browser = await puppeteer.connect({
    browserURL: 'http://localhost:9222',
    defaultViewport: null,
  })
  const pages = await browser.pages()
  console.log(pages) // STUCK 😭
})()

Error string

no error

Background

Launch chrome with --remote-debugging-port=9222

chrome --remote-debugging-port=9222

On Mac:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

OTHER PEOPLE ARE EXPERIENCING THIS ISSUE: https://stackoverflow.com/questions/77540484/browser-pages-does-not-resolve-in-puppeteer-script

Expectation

Return the pages array for all browser contexts. However, I also attempted doing it only for browser.defaultBrowserContext(), but it was the same bug in that case.

Reality

Hanging, stuck, no error messages. browser.pages() works when using puppeteer.launch()
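Because the call never settles and reports no error, one way to make the failure visible is to bound the wait. The following is a minimal sketch, not part of Puppeteer: `withTimeout` and `listPageUrls` are hypothetical helper names, the 10 second limit is an arbitrary assumption, and the port is the one from the report.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
// `withTimeout` is a hypothetical helper, not a Puppeteer API.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer so Node can exit.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch (CommonJS), assuming Chrome listens on port 9222 as in the report.
async function listPageUrls() {
  const puppeteer = require('puppeteer'); // loaded lazily; only needed here
  const browser = await puppeteer.connect({
    browserURL: 'http://localhost:9222',
    defaultViewport: null,
  });
  try {
    const pages = await withTimeout(browser.pages(), 10_000, 'browser.pages()');
    return pages.map((page) => page.url());
  } finally {
    // disconnect() detaches Puppeteer but leaves the remote Chrome running.
    await browser.disconnect();
  }
}
```

Calling `listPageUrls()` then either returns the URLs or rejects with a readable message instead of hanging silently. It does not fix the underlying cause.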

Puppeteer version

21.7.0

Node version

21.5.0

Package manager

npm

Package manager version

10.2.5

Operating system

macOS

nemzyx avatar Jan 05 '24 17:01 nemzyx

The issue has been labeled as confirmed by the automatic analyser. Someone from the Puppeteer team will take a look soon!


github-actions[bot] avatar Jan 05 '24 17:01 github-actions[bot]

What is the remote browser version?

OrKoN avatar Jan 08 '24 08:01 OrKoN

I am unable to reproduce with the version matching the version supported by Puppeteer.

OrKoN avatar Jan 08 '24 09:01 OrKoN

Google Chrome Version 120.0.6099.199. This is just my daily driver Chrome, not the officially supported version. I was hoping I could somehow use my existing login sessions etc. on websites 😊
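For anyone else asked for the remote browser version: a browser started with --remote-debugging-port also serves a small HTTP API, and its /json/version endpoint reports the product string. The following is a minimal sketch assuming Node 18+ (for the global `fetch`) and the port used in this issue; `parseMajorVersion` and `remoteBrowserInfo` are hypothetical names.

```javascript
// Extract the major version from a product string such as "Chrome/120.0.6099.199".
function parseMajorVersion(product) {
  const match = /\/(\d+)\./.exec(product);
  return match ? Number(match[1]) : null;
}

// Query the DevTools HTTP endpoint exposed by --remote-debugging-port.
// Assumes Node 18+ so that fetch() is available globally.
async function remoteBrowserInfo(browserURL = 'http://localhost:9222') {
  const response = await fetch(new URL('/json/version', browserURL));
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from ${browserURL}`);
  }
  const info = await response.json();
  return {
    product: info.Browser,
    major: parseMajorVersion(info.Browser),
    wsEndpoint: info.webSocketDebuggerUrl,
  };
}
```

`remoteBrowserInfo().then(console.log)` prints the version to paste into a bug report, without going through Puppeteer at all.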

nemzyx avatar Jan 13 '24 04:01 nemzyx

Although we do not support that version with Puppeteer yet, I think it should work. Could you reproduce it by launching the browser with a fresh profile? Could you also provide a CDP log by running the script that connects to the browser with the env var DEBUG=puppeteer:*?

OrKoN avatar Jan 13 '24 08:01 OrKoN

same problem here.

chrome.exe is started with the following command line:

"C:/Program Files/Google/Chrome/Application/chrome.exe" --remote-debugging-port=9290

let browser = await puppeteer.connect({
    browserURL: 'http://127.0.0.1:9290',
    defaultViewport: null,
    protocolTimeout: 5000
})

await browser.pages()  // throws "ProtocolError: Network.enable timed out." after 5 seconds

ProtocolError: Network.enable timed out. Increase the 'protocolTimeout' setting in launch/connect calls for a higher timeout if needed.
    at <instance_members_initializer> (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:92:14)
    at new Callback (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:96:16)
    at CallbackRegistry.create (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:19:26)
    at Connection._rawSend (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:77:26)
    at CdpCDPSession.send (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:63:33)
    at NetworkManager.addClient (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/NetworkManager.js:67:20)
    at FrameManager.initialize (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/FrameManager.js:176:38)
    at #initialize (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Page.js:300:36)
    at CdpPage._create (file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Page.js:86:31)
    at file:///D:/0/node_modules/.pnpm/[email protected][email protected]/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Target.js:194:32
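With `protocolTimeout` set, the hang becomes a rejection that can be caught and reported. The following is a minimal sketch that pulls the stalled CDP method name out of the message shown above. `timedOutCdpMethod` and `pagesOrExplain` are hypothetical helpers, and the message format is an assumption based on this log, so it may differ between Puppeteer versions.

```javascript
// Return the CDP method named in a "<Domain>.<method> timed out" message,
// or null if the error does not look like a protocol timeout.
// The message format is assumed from the log in this thread.
function timedOutCdpMethod(error) {
  if (!(error instanceof Error)) return null;
  const match = /^([A-Za-z]+\.[A-Za-z]+) timed out/.exec(error.message);
  return match ? match[1] : null;
}

// Wrap browser.pages() so a protocol timeout produces an actionable message.
// `browser` is an already connected Puppeteer Browser instance.
async function pagesOrExplain(browser) {
  try {
    return await browser.pages();
  } catch (error) {
    const method = timedOutCdpMethod(error);
    if (method === null) throw error; // unrelated failure, pass it through
    throw new Error(
      `CDP call ${method} got no reply; a tab may be unresponsive. ` +
        'Try closing tabs or restarting the browser.',
      { cause: error }
    );
  }
}
```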

error log

puppeteer-core.stderr.log

version & env

"puppeteer-core": "^21.10.0",

windows 11 23h2 (22631.3007)

chrome 121.0.6167.140

ShenHongFei avatar Feb 02 '24 09:02 ShenHongFei

I then closed some tabs, restarted chrome, and tried again, and the problem disappeared.

ShenHongFei avatar Feb 02 '24 09:02 ShenHongFei

Same issue here. I need to reuse a chromium instance already opened and browser.pages() doesn't resolve. Any solutions?

jeangutemberg avatar May 12 '24 07:05 jeangutemberg

The issue was not reproducible. If you see it, you are likely using an incompatible Chrome version, or some of the pages are blocked by alert dialogs (there is an issue for that).

OrKoN avatar May 16 '24 17:05 OrKoN

Same issue here. puppeteer 22.10.0

alxpereira avatar Jun 01 '24 06:06 alxpereira

Same issue with puppeteer-core 22.11.2.

thomascoding avatar Jun 19 '24 09:06 thomascoding

I am also experiencing this. The bug does go away when all Chrome instances are closed and reopened, as @ShenHongFei stated.

I have found a way to reproduce this issue. I was applying to some jobs on Indeed.com and I noticed that after submitting an application, Puppeteer would hang when you tried to await either browser.pages() or browser.newPage(), regardless of whether or not you restart the application.

Some important notes about applying to jobs on Indeed: The submission page for the job application process states that it is "Protected by reCAPTCHA" and the submit button can only be clicked after the CloudFlare turnstile captcha has verified that you are human. Visiting this page and having the captcha successfully verify that you are human seems to cause await browser.pages() and await browser.newPage() to hang indefinitely, regardless of whether an instance of Puppeteer was ever running on the browser.

I believe the issue is caused by whatever reCAPTCHA / CloudFlare turnstile does to the browser.

Gabriel-Bigelow avatar Jul 05 '24 21:07 Gabriel-Bigelow

I am able to reliably reproduce the error from the original issue using a Docker image that renders a PDF on Alpine 3.20, but not on Debian.

Here's my Dockerfile:

FROM alpine:3.20

WORKDIR /root

RUN apk add chromium npm

RUN npm install puppeteer

COPY <<EOF /root/input.html
<p>Hello, world!</p>
EOF

COPY <<EOF /root/script.js
const puppeteer = require('puppeteer')
 
async function printPDF() {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      "--no-sandbox",
      "--single-process",
      "--no-zygote",
      "--disable-gpu"
    ],
  });
  const page = await browser.newPage();
  await page.goto('file:///root/input.html', {waitUntil: 'networkidle0'});
  const pdf = await page.pdf({ path: "/root/output.pdf", format: 'A4' });
 
  await browser.close();
  return pdf
}

printPDF()
EOF

ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

Then:

sudo docker build --tag docker-chromium-issue .
sudo docker run -it --rm docker-chromium-issue node /root/script.js

The docker run command will hang for 2-3 minutes, then finish with an exception:

/root/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:93
    #error = new Errors_js_1.ProtocolError();
             ^

ProtocolError: Network.enable timed out. Increase the 'protocolTimeout' setting in launch/connect calls for a higher timeout if needed.
    at <instance_members_initializer> (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:93:14)
    at new Callback (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:97:16)
    at CallbackRegistry.create (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:22:26)
    at Connection._rawSend (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Connection.js:89:26)
    at CdpCDPSession.send (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/CDPSession.js:66:33)
    at NetworkManager.addClient (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/NetworkManager.js:62:20)
    at FrameManager.initialize (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/FrameManager.js:170:38)
    at #initialize (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Page.js:329:36)
    at CdpPage._create (/root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Page.js:95:31)
    at /root/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Target.js:206:42

Node.js v21.7.3

While the command is running, I observe unusually high CPU usage by the chromium process.

Updating the Dockerfile to be debian-based fixes the issue:

diff --git a/Dockerfile b/Dockerfile
index f025886..f706b48 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,8 +1,8 @@
-FROM alpine:3.20
+FROM debian:12

 WORKDIR /root

-RUN apk add chromium npm
+RUN apt update; apt install chromium npm --yes

 RUN npm install puppeteer

gmile avatar Jul 22 '24 15:07 gmile

In my case I was able to resolve the issue with the Alpine image by removing the --single-process flag from the launch function. The fix was suggested in this comment: https://github.com/puppeteer/puppeteer/issues/12189#issuecomment-2032593334
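As a minimal sketch of that change, the launch options from the script in the Dockerfile can be rebuilt without the flag. `dropFlags` is a hypothetical helper; the argument list is the one from the script above.

```javascript
// Launch arguments from the reproduction script above.
const ORIGINAL_ARGS = [
  '--no-sandbox',
  '--single-process',
  '--no-zygote',
  '--disable-gpu',
];

// Remove the named switches, also matching the "--flag=value" form.
// `dropFlags` is a hypothetical helper, not a Puppeteer API.
function dropFlags(args, unwanted) {
  const blocked = new Set(unwanted);
  return args.filter((arg) => !blocked.has(arg.split('=')[0]));
}

// Options to pass to puppeteer.launch(); only --single-process is removed.
const launchOptions = {
  headless: true,
  args: dropFlags(ORIGINAL_ARGS, ['--single-process']),
};
```

Passing `launchOptions` to `puppeteer.launch()` in `printPDF()` is the only change needed in the script. Whether it helps on other setups is untested here.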

gmile avatar Jul 22 '24 16:07 gmile

We just started seeing this exact error today, but only on EC2 instances in AWS. We have a dockerized API server that's pulling in some assets from an S3 bucket. I've verified that a curl command DOES work, so it's not a network problem. And I've verified that the EC2 instance AND the docker container on that instance can see the stuff on the S3 bucket. This has been working for years, and just started throwing this error and not producing the expected PDF files last night.

We are not using --single-process, so that wasn't a fix for us. We ARE in the Alpine docker container, but I don't have a good way to test switching to Debian as this app is very extensive. I see this issue is closed, but we are still experiencing it, but again, not on local development environments, only on the test/stage/prod environments we have in AWS.

Any ideas?? We've tried so many things, even going back to the previous sprint release of our own code and database, which DID work as of yesterday, but today, no dice.

Any help is appreciated.

froggman2k avatar Jul 30 '24 20:07 froggman2k

Additional context: We've narrowed it down to the newPage() method being what's hanging, so we haven't even tried to load our content when it hangs. Here's the code, just in case:

const browser = await puppeteer.launch({
    args: ['--no-sandbox'],
    timeout: 10000,
})

const page = await browser.newPage() // this is where it breaks

await page.setContent(content.replace(/>,</g, '><'))
const pdfOptions = {
    path: `generated_reports/${fileName}`,
    format: 'Letter'
}

await page.pdf(pdfOptions)

await browser.close()

froggman2k avatar Jul 30 '24 20:07 froggman2k

@froggman2k

We also faced this issue today. Our code is very similar to yours, setting up HTML and generating PDFs. We are also using the Alpine Node image.

We deployed a new Docker image today, which caused this issue, while the Docker image we deployed 10 days ago was working fine. Comparing the two images, we found that both use Alpine v3.20, but the version of Chromium in the community repository changed from chromium-126.0.6478.182-r0 to chromium-127.0.6533.72-r0. (You can see the related history at https://git.alpinelinux.org/aports/log/?h=3.20-stable&qt=grep&q=chromium)

We suspect that the change in the Chromium binary is causing this issue. We are planning to extract the APK file from the old Docker image and test it. Since we don't have much experience with APK packages, we are facing difficulties with offline installation due to the dependencies of chromium. I think you might want to try this approach as well. If this resolves the issue, it could indicate that the Chromium update in Alpine v3.20 is faulty.

stevejkang avatar Jul 31 '24 13:07 stevejkang

Note that technically Alpine Linux is not supported by Chrome (for Testing), which we provide with Puppeteer: https://support.google.com/chrome/a/answer/7100626?hl=en You would need 64-bit Ubuntu 18.04+, Debian 10+, openSUSE 15.5+, or Fedora Linux 38+. We provide a Docker image based on Debian, https://github.com/puppeteer/puppeteer/pkgs/container/puppeteer, which is covered by our testing. To debug issues with Alpine Linux and chromium specifically, you can set dumpio: true in the launch options to see whether some system dependencies are missing, and then install them. Here you can find the system dependencies for supported distros: https://source.chromium.org/chromium/chromium/src/+/main:chrome/installer/linux/debian/dist_package_versions.json
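A minimal sketch of the dumpio suggestion follows. The `DEBUG_CHROME` environment variable and the function names are hypothetical, and the 30 second `protocolTimeout` is an arbitrary choice so that failures surface sooner.

```javascript
// Build launch options; dumpio pipes the browser's stdout and stderr into
// this process's streams, which shows errors such as missing libraries.
// DEBUG_CHROME is a hypothetical toggle, not a Puppeteer setting.
function debugLaunchOptions(env = process.env) {
  return {
    args: ['--no-sandbox'],
    dumpio: env.DEBUG_CHROME === '1',
    protocolTimeout: 30_000, // arbitrary; fail faster than the default
  };
}

// Usage sketch (CommonJS): launch, open a page, and always close the browser.
async function launchAndProbe() {
  const puppeteer = require('puppeteer'); // loaded lazily; only needed here
  const browser = await puppeteer.launch(debugLaunchOptions());
  try {
    const page = await browser.newPage();
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Running the script with `DEBUG_CHROME=1` set then prints the browser's own log output next to the Node output.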

OrKoN avatar Jul 31 '24 13:07 OrKoN

By the time yesterday ended, Puppeteer was causing this Network.enable error every 1-3 minutes, restarting NodeJS, with NOTHING calling it, just trying to load the project. Last night was a nightmare retrofitting and removing a bunch of stuff temporarily.

We've had so many of these difficult-to-find-and-resolve issues over the past 6 months with Puppeteer that we've decided to just scrap it and go a different direction. I'm sure it has its uses in other situations, but for us we just can't tolerate being dependent on something that feels so flaky to our process and that has so few options for real debugging with how we are using it, when everything else is "owned" by us and is stable.

Cheers, all.

froggman2k avatar Jul 31 '24 18:07 froggman2k

Confirmed this is happening with current Alpine release

Downgrading from Alpine 3.20 to 3.19 fixed the issue.

JosephTico avatar Aug 01 '24 17:08 JosephTico

We have seen success in preventing this on alpine 3.20 by adding a --disable-gpu flag into the launch args, e.g.


(async () => {
  const puppeteer = require("puppeteer");
  // Launch the browser and open a new blank page
  const browser = await puppeteer.launch({args:['--no-sandbox']});
  const page = await browser.newPage();
  const version = await page.browser().version();
  console.log("page browser version: " + version);
  await page.goto('https://bbc.co.uk/news');

  console.log("Page title: " + await page.title());

  await browser.close();
})();

Will time out and fail with ProtocolError: Network.enable timed out

Whereas, adding the flag, e.g.


(async () => {
  const puppeteer = require("puppeteer");
  // Launch the browser and open a new blank page
  const browser = await puppeteer.launch({args:['--no-sandbox', '--disable-gpu']});
  const page = await browser.newPage();
  const version = await page.browser().version();
  console.log("page browser version: " + version);
  await page.goto('https://bbc.co.uk/news');

  console.log("Page title: " + await page.title());

  const element = await page.waitForSelector('h1')
  let value = await element.evaluate(el => el.textContent)
  console.log('h1 = ' + value);

  await browser.close();
})();

This outputs the expected results. We are currently assuming that chromium updated something GPU-related in 127+, but are too busy trying to test the fix!
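One detail worth checking when adding the flag: Puppeteer hands each element of `args` to the browser as a separate command-line argument, so flags are safest written one per element. The following is a minimal sketch of a hypothetical `normalizeArgs` helper that splits strings which accidentally combine several flags. It assumes no flag value contains whitespace.

```javascript
// Split entries that contain several whitespace-separated flags so that
// every switch becomes its own array element.
// Assumption: no flag value contains spaces (e.g. a quoted user agent).
function normalizeArgs(args) {
  return args
    .flatMap((arg) => arg.trim().split(/\s+/))
    .filter((arg) => arg.length > 0);
}

// Example: a combined string becomes two separate switches.
const args = normalizeArgs(['--no-sandbox --disable-gpu']);
// `args` can then be passed as puppeteer.launch({ args }).
```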

CeeBeeUK avatar Aug 02 '24 08:08 CeeBeeUK

related issue from alpine aports

stevejkang avatar Aug 05 '24 06:08 stevejkang

We have seen success in preventing this on alpine 3.20 by adding a --disable-gpu flag into the launch args, e.g.

Thanks for mentioning this workaround. While locally and with kvm there were no problems with chrome 127.x (Ubuntu based), k8s pods did show this error (debian based image) - "disable-gpu" did help here too.

tkrah avatar Aug 07 '24 11:08 tkrah