puppeteer
Intercept target creation
Users often want to intercept targets as they are created, so they can attach to them and set them up before they start loading.
Use cases:
- when a popup is opened, attach to it and enable request interception
- when a link click opens a new page, set proper device emulation before the website starts loading
We might be able to do this with CDP using Target.setAutoAttach and its waitForDebuggerOnStart option. I'd like this to be scoped to a browser context, though, for better flexibility.
The API might look like this:
await browserContext.setTargetInterception(true);
browserContext.on('targetcreated', async target => {
  if (target.type() !== 'page') {
    await target.resumeLoading();
    return;
  }
  const page = await target.page();
  await page.setViewport({width: 400, height: 400});
  await target.resumeLoading();
});
Related issues: #1378, #3648
Is there any workaround for this problem in the meantime? I tried to apply stealth using an extension instead of puppeteer-extra-plugin-stealth, but it does not work for new tabs that are opened implicitly. Right now I am testing an approach that intercepts the request that creates the new tab, blocks it, and saves the URL and header data, then recreates the tab, applies the concealment, and navigates with goto using the saved data.
I am sorry, this might be a silly question, but can you show how browserContext is obtained? I did something like this:
const browserContext = await browser.defaultBrowserContext()
//Error: browserContext.setTargetInterception is not a function
await browserContext.setTargetInterception(true);
What is the right way of defining browserContext?
That isn't working code; the method is a feature request that has not been implemented yet.
Note the OP's wording:
The API might look like this:
@swang2019 hahaha. Thanks. I get it. I can't wait for this feature to be out.
Is there an ETA on a fix for this? Browsing pages I do not control, I am watching for the targetcreated event in order to inspect popups. When a popup occurs, there doesn't seem to be any way for me to inspect the network requests. On the main page, I can add a handler for request and requestfinished prior to a page.goto, which means every network request for that page (even the initial one) can be inspected. However, if I attach request or requestfinished event handlers on the popup page, it's already too late; the page has already started loading and the events never fire (maybe I'll see some if the page takes a long time to load a bunch of resources, but never the initial one).
After the fact, I can get the URLs of things that loaded by inspecting the trace file, but it's difficult to pinpoint only the ones loaded by the popup. In addition, I don't have access to all the other goodies provided by the Request object (headers, content, etc.).
A fix or any workaround suggestion would be greatly appreciated!
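To make the timing problem concrete, here is a minimal sketch of the pattern described above. It only relies on the public targetcreated event, target.type(), target.page() and page.on('request'); the helper name watchPopupRequests and the onRequest callback are made up for illustration. Because the request handler is attached only after target.page() resolves, the popup's initial document request has usually been sent already, which would explain why it never shows up.

```javascript
// Hypothetical helper (not part of Puppeteer): reports requests made by
// pages that are opened after this function has been called.
// `browser` is expected to be a Puppeteer Browser; `onRequest` is a
// caller-supplied callback that receives each request URL.
function watchPopupRequests(browser, onRequest) {
  browser.on('targetcreated', async (target) => {
    if (target.type() !== 'page') {
      return;
    }
    // By the time this promise resolves the popup is already navigating,
    // so its first (document) request has typically been issued already.
    const page = await target.page();
    if (!page) {
      return;
    }
    // Only requests that start after this point are observed.
    page.on('request', (request) => onRequest(request.url()));
  });
}
```

Calling watchPopupRequests(browser, console.log) before triggering a popup should print late subresource requests at best, which matches the behaviour reported in this comment.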
Hey. Are there plans to add this functionality?
I'm playing with the CDP protocol a little bit in this regard and got a rough prototype working (intercept implicitly created targets, attach scripts or modify headers and continue loading the page).
My main issue is the "continue loading" bit: after sending Runtime.runIfWaitingForDebugger, the popup (in my case) is still stuck in "Waiting..", and I need to send a Page.reload command to make the popup load after the modifications (with waitForDebuggerOnStart: true turned on).
The result is empty, so I'm not sure that it succeeded (I tried adding a targetId param, to no avail):
puppeteer:protocol SEND ► {"sessionId":"B03359C06B0C7ABD2FFD363B70C8531A","method":"Runtime.runIfWaitingForDebugger","params":{},"id":29} +0ms
puppeteer:protocol ◀ RECV {"id":29,"result":{},"sessionId":"B03359C06B0C7ABD2FFD363B70C8531A"} +1ms
I'm probably using it incorrectly but unfortunately I can't really find documentation or code snippets for this.
If anyone knows if I'm missing a crucial step here to make the page load please tell me :) I'll continue playing with this and see if I can make a plugin for puppeteer-extra out of this (and also to use it in the stealth plugin).
Here are the main bits (very much WIP):
const puppeteer = require("puppeteer-extra")
const { CDPSession } = require("puppeteer/lib/Connection")
puppeteer
  .launch({
    headless: false
  })
  .then(async browser => {
    const page = await browser.newPage()
    await page._client.send("Target.setAutoAttach", {
      autoAttach: true,
      flatten: true,
      windowOpen: true,
      waitForDebuggerOnStart: true // is set to false in pptr
    })
    page._client.on(
      "Target.attachedToTarget",
      async event => {
        if (event.waitingForDebugger) {
          console.log("continue")
          const newConn = await browser._connection.createSession(
            event.targetInfo
          )
          await newConn.send("Network.setExtraHTTPHeaders", {
            headers: { "x-foo": "hi-there" }
          })
          await newConn.send("Runtime.enable")
          await newConn.send("Page.setLifecycleEventsEnabled", {
            enabled: true
          })
          await newConn.send("Network.enable")
          await newConn.send("Page.enable")
          await newConn.send("Runtime.runIfWaitingForDebugger")
          console.log("all sent")
          await newConn.send("Page.reload") // Will not load without
        }
      }
    )
    await page.goto(
      "file:///temp/popup.html" // HTML page that will open a popup
    )
  })
Custom x-foo header is present in the popup created through window.open:
That looks like good progress @berstend. I checked, and if you call await newConn.send("Page.stopLoading") first thing in attachedToTarget, it doesn't seem to send out requests twice, so it's not too bad.
By the way the relevant issue in the Chrome Devtools issue tracker is: https://github.com/ChromeDevTools/devtools-protocol/issues/77
That's good to know. Still haven't found a "proper" way to make the target continue loading after we've intercepted it though.
I'll be looking into testing the various .continue CDP triggers to see if they might do the trick here. :)
Playwright seems to have worked around this/provided a fix: https://github.com/microsoft/playwright/pull/1226
@cris691 Playwright comes with a Chromium nightly build, so it's hard to say if they fixed it in the Playwright code or if Chromium had a patch. But it seems the bug is still present when testing in Puppeteer with the latest Chromium.
Overwriting window.open is no solution; the same issue happens with regular links with target=_blank. @berstend's solution seems like the way to go to get it fixed without depending on JS hacks.
Have you used Playwright? If not, it's a big stretch to write that the issue is fixed. I wasn't saying you were wrong; it just seemed like you'd read the issue but hadn't used Playwright to see whether it still has the problem. Use it to prove it, or it's no solution.
Not sure what you're trying to say, but yes - I tried in Playwright and it works there. Here's a simple test case I submitted to the Chromium bug tracker: https://bugs.chromium.org/p/chromium/issues/attachmentText?aid=447951 (works directly on CDP protocol).
Is this still not achieved?
It is ridiculous that such an obvious thing simply cannot be done. Device emulation with multiple tabs is unusable without this :(
Now, is there a better solution? I have a rough idea: maybe you can use a proxy, args: [--proxy-server]. Every requested URL must carry a pageId parameter. If a request has no pageId, the proxy holds it as pending. After "targetcreated" is triggered, notify the proxy, which releases the pending request and answers it with a 302 redirect. Subsequent requests can then carry the pageId parameter.
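A rough sketch of the bookkeeping such a proxy would need, to make the idea easier to discuss. Everything here is hypothetical: PendingRequestGate, handle, release and the shape of the response objects are invented for illustration, and the sketch is not wired to a real HTTP proxy or to Puppeteer.

```javascript
// Hypothetical bookkeeping for the proxy idea: requests without a pageId
// are parked until the automation side announces which page they belong to.
class PendingRequestGate {
  constructor() {
    // Maps a requested URL to the callbacks waiting for a decision on it.
    this.pending = new Map();
  }

  // Called by the proxy for every incoming request. `respond` is a callback
  // the proxy would use to either forward the request or answer it itself.
  handle(rawUrl, respond) {
    const url = new URL(rawUrl);
    if (url.searchParams.has('pageId')) {
      respond({ action: 'forward' });
      return;
    }
    const waiting = this.pending.get(rawUrl) ?? [];
    waiting.push(respond);
    this.pending.set(rawUrl, waiting);
  }

  // Called once "targetcreated" has fired and the new page has been set up.
  // Answers every parked request for that URL with a 302 to the same URL
  // plus the pageId parameter, and returns how many requests were released.
  release(rawUrl, pageId) {
    const url = new URL(rawUrl);
    url.searchParams.set('pageId', pageId);
    const waiting = this.pending.get(rawUrl) ?? [];
    this.pending.delete(rawUrl);
    for (const respond of waiting) {
      respond({ action: 'redirect', status: 302, location: url.toString() });
    }
    return waiting.length;
  }
}
```

Keying the pending list by URL is the simplest choice, but it cannot tell apart two popups that open the same URL at the same time; a real proxy would probably need a stronger correlation key.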
Everyone who's interested in seeing this fixed: Please star this chromium bug reported by Niek, so the chromium devs know there's strong interest in getting this resolved - thanks!
https://bugs.chromium.org/p/chromium/issues/detail?id=1070568
👋 The situation is a bit unclear with regard to the Chromium bug, but I think it has been solved and that the fix done by Playwright could be backported to Puppeteer. Can this be considered?
In the meantime, the following code was successful, but it makes things more complex and you cannot operate directly on the Page object; you have to work with manual devtools operations:
const browser = await puppeteer.launch()
const mainConnection = browser['_connection']
await mainConnection.send('Target.setDiscoverTargets', {discover: true})
await mainConnection.send('Target.setAutoAttach', {autoAttach: true, waitForDebuggerOnStart: true, flatten: true})
mainConnection.on('Target.attachedToTarget', async (event: puppeteer.Protocol.Target.AttachedToTargetEvent) => {
  if (!event.waitingForDebugger) {
    return
  }
  const targetSession = mainConnection._sessions.get(event.sessionId)
  if (!targetSession) {
    return
  }
  // Have to be in a `Promise.all` otherwise anything sent before `runIfWaitingForDebugger` will hang.
  await Promise.all([
    // Put the custom instructions you want for this new target here, for instance injecting a script first.
    targetSession.send('Page.addScriptToEvaluateOnNewDocument', {source: "your_script"}),
    targetSession.send('Page.enable'),
    // Always need to send that.
    targetSession.send('Runtime.runIfWaitingForDebugger'),
  ])
})
Inspired by sdeprez, I found a hack that works.
const puppeteer = require('puppeteer');
const assert = require('assert');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  await hackBrowserTargets(browser);
  const page1 = (await browser.pages())[0];
  await setInterception(page1);
  await page1.goto('http://127.0.0.1/page');
  const $linkOfPage1 = await page1.$('#link')
  assert.notEqual($linkOfPage1, null);
  await $linkOfPage1.click();
  // Wait for the target that was opened by another page (the popup).
  const newTarget = await browser.waitForTarget((target) => {
    return Boolean(target._targetInfo.openerId);
  });
  const page2 = await newTarget.page();
  await setInterception(page2);
  await page2.reload();
  const $linkOfPage2 = await page2.$('#link')
  assert.notEqual($linkOfPage2, null);
  await browser.close();
})();
async function setInterception(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (request.url().includes('/page')) {
      request.respond({
        status: 200,
        headers: {
          'Content-Type': 'text/html',
        },
        body: `
          <!DOCTYPE html>
          <html lang="en">
          <head>
            <meta charset="UTF-8">
            <meta http-equiv="X-UA-Compatible" content="IE=edge">
            <title>Page</title>
          </head>
          <body>
            <a id="link" href="/page" target="_blank">new page</a>
          </body>
          </html>
        `,
      });
    } else {
      request.continue();
    }
  });
}
async function hackBrowserTargets(browser) {
  await browser._connection.send('Target.setAutoAttach', {
    autoAttach: true,
    waitForDebuggerOnStart: true,
    flatten: true,
  });
  browser._connection.on('Target.attachedToTarget', async (event) => {
    const session = browser._connection._sessions.get(event.sessionId);
    if (!event.waitingForDebugger || !session) {
      return
    }
    if (event.targetInfo.type === 'page' && event.targetInfo.openerId) {
      const handleRequestPaused = async (event) => {
        await Promise.all([
          session.send('Fetch.fulfillRequest', {
            requestId: event.requestId,
            responseCode: 200,
            responsePhrase: 'OK',
            responseHeaders: [
              {
                name: 'content-type',
                value: 'text/html'
              },
              {
                name: 'content-length',
                value: '0'
              }
            ],
            body: ''
          }),
          session.send('Fetch.disable')
        ]);
        session.off('Fetch.requestPaused', handleRequestPaused);
      };
      session.on('Fetch.requestPaused', handleRequestPaused);
      await session.send('Fetch.enable', {
        handleAuthRequests: true,
        patterns: [{ urlPattern: '*' }],
      });
    }
    await session.send('Runtime.runIfWaitingForDebugger');
  });
}
A temporary workaround until this issue is fixed, which I believe might be useful for some of you.
To customize the user-agent in headless mode (and apply it on any new target), set Puppeteer's headless property to false and instead launch the browser with the --headless flag together with your anonymized UA using the --user-agent flag.
import * as puppeteer from 'puppeteer';
const browser = await puppeteer.launch({
  headless: false,
  args: [
    '--headless',
    '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
  ],
  defaultViewport: { width: 1920, height: 900 },
});
Hi @Junyan, it seems that newer versions of Puppeteer now hide the browser._connection field, as it is private.
Do you know a workaround?
I could hack it with:
const { CDPBrowser } = require('../node_modules/puppeteer/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Browser.js');
.
.
.
let browserConnection;
const oldCreate = CDPBrowser._create;
CDPBrowser._create = (product, connection, ...args) => {
  browserConnection = connection;
  return oldCreate(product, connection, ...args);
}
it would really be great if this could be done properly.
@tripodsan That looks like a solution. I don't have any new ideas because we have not upgraded Puppeteer.
The proposal in this issue should now be possible if anyone wants to try to implement it. In ChromeTargetManager it is possible to safely await until the target is configured by the user code.
Does Puppeteer have any way to detect which script triggered the opening of a new tab, for catching popups?
I have found a solution for intercepting popups using CDP in the latest version of puppeteer for my extremely specific use case, but hopefully this is helpful for others. Essentially, what you need to do is:
- Start a CDP session for the browser target
- Call Target.setAutoAttach with the filter { type: 'page', exclude: false }
- On Target.attachedToTarget, get the puppeteer Connection object from the CDP session for the browser target, and then get the CDP session of the popup target using the sessionId field from the attachedToTarget event
- Do whatever you need to do here to set up the tab, using the popup's CDP session
- Call Runtime.runIfWaitingForDebugger in the popup's CDP session to resume the tab
Here's some example code for my use case - I wanted to see what URL a popup is trying to go to, without letting it actually load the popup:
import puppeteer from 'puppeteer';
let browser = await puppeteer.launch({
  headless: false
});
const page = (await browser.pages())[0];
// Set this to a URL where you can click on something that opens a popup
await page.goto('https://www.example.com');
const cdp = await browser.target().createCDPSession();
// Set up auto-attaching, set the filter to pages only
await cdp.send('Target.setAutoAttach', {
  waitForDebuggerOnStart: true,
  autoAttach: true,
  flatten: true,
  filter: [
    { type: 'page', exclude: false },
  ]
})
cdp.on('Target.attachedToTarget', async ({sessionId, targetInfo, waitingForDebugger}) => {
  // Get the puppeteer Connection object from the existing CDP connection
  let connection = cdp.connection();
  if (!connection) {
    console.log('Could not get connection');
    return;
  }
  // Get the CDP session that has attached to the popup
  let popupCdp = connection.session(sessionId);
  if (!popupCdp) {
    console.log('Could not get popup CDP session');
    return;
  }
  // Do whatever setup you want to do in the popup here
  await popupCdp.send('Fetch.enable');
  popupCdp.on('Fetch.requestPaused', async ({requestId, request}) => {
    console.log('Intercepted popup - attempted to open URL:', request.url);
    popupCdp?.send('Fetch.failRequest', {requestId, errorReason: 'Aborted'});
  });
  // Allow the popup to continue executing
  await popupCdp.send('Runtime.runIfWaitingForDebugger');
});
console.log('ready');
// Click on something that opens a popup
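The steps above could also be wrapped in a small reusable helper. This is only a sketch under assumptions: interceptPopups and its configure callback are hypothetical names, and cdp is assumed to be a CDP session for the browser target that offers the same connection() and session(sessionId) calls used in the example. The one design point it adds is a try/finally, so that the popup is resumed even if the setup code throws; otherwise a failed setup would leave the tab paused.

```javascript
// Hypothetical helper, not part of Puppeteer. `cdp` is assumed to be a CDP
// session for the browser target; `configure` is a caller-supplied async
// function that receives the popup's CDP session.
async function interceptPopups(cdp, configure) {
  // Register the listener first so no attach event is missed.
  cdp.on('Target.attachedToTarget', async ({ sessionId, waitingForDebugger }) => {
    const popupCdp = cdp.connection()?.session(sessionId);
    if (!popupCdp) {
      return;
    }
    try {
      // Caller-supplied setup, e.g. enabling Fetch or setting headers.
      await configure(popupCdp);
    } catch (error) {
      console.error('Popup setup failed:', error);
    } finally {
      if (waitingForDebugger) {
        // Always resume, otherwise a failed setup leaves the popup paused.
        await popupCdp.send('Runtime.runIfWaitingForDebugger');
      }
    }
  });
  await cdp.send('Target.setAutoAttach', {
    waitForDebuggerOnStart: true,
    autoAttach: true,
    flatten: true,
    filter: [{ type: 'page', exclude: false }],
  });
}
```

With this in place, the setup from the example would shrink to something like await interceptPopups(cdp, (popupCdp) => popupCdp.send('Fetch.enable')), with the Fetch.requestPaused handler registered inside the callback.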
It seems like Target.attachedToTarget events only get fired when Target.setDiscoverTargets is enabled.
However, I get <code=-32000 message=Not allowed> at Target.setDiscoverTargets, {"discover":true}