Intercept target creation

Open aslushnikov opened this issue 6 years ago • 26 comments

In many cases, users want to intercept targets as they are created so that they can attach to them and set them up.

Use cases:

  • when a popup is opened, attach to it and enable request interception
  • when a link click opens a new page, set proper device emulation before the website starts loading

We might be able to do this with CDP using Target.setAutoAttach and its waitForDebuggerOnStart option. I'd like this to be scoped to a browser context, though, so that there's better flexibility.

The API might look like this:

await browserContext.setTargetInterception(true);
browserContext.on('targetcreated', async target => {
  if (target.type() !== 'page') {
    await target.resumeLoading();
    return;
  }
  const page = await target.page();
  await page.setViewport({width: 400, height: 400});
  await target.resumeLoading();
});
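
For illustration only, here is a rough sketch of how the CDP side of such an API might be wired up. It assumes a session object exposing send(method, params) and on(event, handler), similar to puppeteer's CDPSession. The names interceptTargets, getSession and createFakeSession are hypothetical and not part of puppeteer; the fake session only stands in for a browser so the flow can be followed without launching one.

```javascript
// Hypothetical helper (not a puppeteer API): pause each new target, let user
// code configure it, then resume it.
// `browserSession` needs send(method, params) and on(event, handler).
// `getSession(sessionId)` is assumed to return the session attached to the new target.
async function interceptTargets(browserSession, getSession, setup) {
  browserSession.on('Target.attachedToTarget', async (event) => {
    if (!event.waitingForDebugger) return;
    const session = getSession(event.sessionId);
    if (!session) return;
    try {
      await setup(event.targetInfo, session);
    } finally {
      // Always resume, otherwise the target stays paused.
      await session.send('Runtime.runIfWaitingForDebugger');
    }
  });
  await browserSession.send('Target.setAutoAttach', {
    autoAttach: true,
    waitForDebuggerOnStart: true,
    flatten: true,
  });
}

// Stand-in for a CDP session that only records what was sent.
function createFakeSession() {
  const handlers = {};
  const sent = [];
  return {
    sent,
    on(event, handler) { handlers[event] = handler; },
    async send(method, params = {}) { sent.push({ method, params }); return {}; },
    // Illustration only: deliver an event and wait for its handler to finish.
    emit(event, payload) { return handlers[event](payload); },
  };
}
```

Against a real browser, commands awaited before the resume call may behave differently from this fake, so treat the ordering here as an assumption rather than confirmed behaviour.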

Related issues: #1378, #3648

aslushnikov avatar Dec 13 '18 21:12 aslushnikov

Is there any workaround for this problem in the meantime? I tried to apply stealth using an extension instead of puppeteer-extra-plugin-stealth, but it does not work for new tabs that are opened implicitly. Right now I am testing a different approach: intercept the request that creates the new tab, block it and save the URL and header data, then create a new tab myself, apply the concealment, and goto the saved URL using that data.

nemesisqp avatar Jan 11 '19 09:01 nemesisqp

I am sorry, this might be a silly question, but can you show how browserContext is called? I did something like this:

const browserContext = await browser.defaultBrowserContext()
//Error: browserContext.setTargetInterception is not a function
await browserContext.setTargetInterception(true);

What is the right way of defining browserContext?

rotimi-best avatar Feb 28 '19 17:02 rotimi-best

That isn't working code; the method is a feature request that has not been implemented yet.

Note the OP's wording:

The API might look like this:

swang2019 avatar Feb 28 '19 17:02 swang2019

@swang2019 hahaha. Thanks. I get it. I can't wait for this feature to be out.

rotimi-best avatar Feb 28 '19 17:02 rotimi-best

Is there an ETA on a fix for this? Browsing pages I do not control, I am watching for the targetcreated event in order to inspect popups. When a popup occurs, there doesn't seem to be any way for me to inspect the network requests. On the main page, I can add handlers for request and requestfinished prior to a page.goto, which means every network request for that page (even the initial one) can be inspected. However, if I attach request or requestfinished event handlers on the popup page, it's already too late; the page has already started loading and the events never fire (maybe I'll see some if the page takes a long time to load a bunch of resources, but never the initial one).

After the fact, I can get the URLs of things that loaded by inspecting the trace file, but it's difficult to pinpoint only the ones loaded by the popup. In addition, I don't have access to all the other goodies provided by the Request object (headers, content, etc.).
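
To make the timing problem concrete, here is a small simulation using only Node's EventEmitter. It is a model of the race, not puppeteer code: simulatePopup and the popup.example URLs are made up for illustration.

```javascript
const { EventEmitter } = require('events');

// Simulates a popup whose first request is emitted either after or before
// the 'request' handler has been attached.
function simulatePopup({ attachBeforeLoad }) {
  const popup = new EventEmitter();
  const seen = [];
  const handler = (url) => seen.push(url);

  if (attachBeforeLoad) popup.on('request', handler);
  popup.emit('request', 'https://popup.example/');        // initial document request
  if (!attachBeforeLoad) popup.on('request', handler);    // too late for the first request
  popup.emit('request', 'https://popup.example/late.js'); // a slow subresource
  return seen;
}
```

With attachBeforeLoad set to false, only the late subresource is observed, which matches the behaviour described above; pausing the target before it starts loading corresponds to the attachBeforeLoad: true case.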

A fix or any workaround suggestion would be greatly appreciated!

saberworks avatar Jun 18 '19 18:06 saberworks

Hey. Planning to add this functionality?

alex2844 avatar Aug 29 '19 05:08 alex2844

I'm playing with the CDP protocol a little bit in this regard and got a rough prototype working (intercept implicitly created targets, attach scripts or modify headers and continue loading the page).

My main issue is the "continue loading" bit: When emitting Runtime.runIfWaitingForDebugger the (in my case) Popup is still stuck in "Waiting.." and I need to send a Page.reload command to make the popup load after the modifications (when turning on waitForDebuggerOnStart: true).

The result is empty, so I'm not sure that's successful (I tried adding targetId param to no avail):

  puppeteer:protocol SEND ► {"sessionId":"B03359C06B0C7ABD2FFD363B70C8531A","method":"Runtime.runIfWaitingForDebugger","params":{},"id":29} +0ms
  puppeteer:protocol ◀ RECV {"id":29,"result":{},"sessionId":"B03359C06B0C7ABD2FFD363B70C8531A"} +1ms

I'm probably using it incorrectly but unfortunately I can't really find documentation or code snippets for this.

If anyone knows if I'm missing a crucial step here to make the page load please tell me :) I'll continue playing with this and see if I can make a plugin for puppeteer-extra out of this (and also to use it in the stealth plugin).

Here are the main bits (very much WIP):

const puppeteer = require("puppeteer-extra")
const { CDPSession } = require("puppeteer/lib/Connection")

puppeteer
  .launch({
    headless: false
  })
  .then(async browser => {
    const page = await browser.newPage()

    await page._client.send("Target.setAutoAttach", {
      autoAttach: true,
      flatten: true,
      windowOpen: true,
      waitForDebuggerOnStart: true // is set to false in pptr
    })

    page._client.on(
      "Target.attachedToTarget",
      async event => {

        if (event.waitingForDebugger) {
          console.log("continue")

          const newConn = await browser._connection.createSession(
            event.targetInfo
          )

          await newConn.send("Network.setExtraHTTPHeaders", {
            headers: { "x-foo": "hi-there" }
          })
          await newConn.send("Runtime.enable")
          await newConn.send("Page.setLifecycleEventsEnabled", {
            enabled: true
          })
          await newConn.send("Network.enable")
          await newConn.send("Page.enable")

          await newConn.send("Runtime.runIfWaitingForDebugger")
          console.log("all sent")
          await newConn.send("Page.reload") // Will not load without
        }
      }
    )

    await page.goto(
      "file:///temp/popup.html" // HTML page that will open a popup
    )
  })

The custom x-foo header is present in the popup created through window.open:

[screenshot]

berstend avatar Dec 05 '19 19:12 berstend

That looks like good progress @berstend. I checked, and if you call await newConn.send("Page.stopLoading") first thing in attachedToTarget, it doesn't seem to send duplicate requests, so it's not too bad.
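
As a rough sketch, the resulting command order could look like the following. The helper names configureThenReload and createRecordingSession are hypothetical; the recording session only logs method names so the order can be checked without a browser, and whether this order avoids duplicate requests in every Chromium version is an assumption based on the observation above.

```javascript
// Records every command so the order can be inspected without a browser.
function createRecordingSession() {
  const sent = [];
  return {
    sent,
    async send(method, params = {}) {
      sent.push(method);
      return {};
    },
  };
}

// Hypothetical helper: stop the initial load, configure the paused target,
// resume it, then reload so the page loads with the new settings applied.
async function configureThenReload(session, headers) {
  await session.send('Page.stopLoading');
  await session.send('Network.enable');
  await session.send('Network.setExtraHTTPHeaders', { headers });
  await session.send('Runtime.runIfWaitingForDebugger');
  await session.send('Page.reload');
}
```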

By the way the relevant issue in the Chrome Devtools issue tracker is: https://github.com/ChromeDevTools/devtools-protocol/issues/77

Niek avatar Dec 12 '19 11:12 Niek

I checked and if you call await newConn.send("Page.stopLoading") first thing in attachedToTarget it doesn't seem to be sending out requests double, so it's not too bad.

That's good to know. Still haven't found a "proper" way to make the target continue loading after we've intercepted it though.

I'll be looking into testing the various .continue CDP triggers to see if they might do the trick here. :)

berstend avatar Dec 14 '19 17:12 berstend

Playwright seems to have worked around this/provided a fix: https://github.com/microsoft/playwright/pull/1226

Niek avatar Apr 09 '20 11:04 Niek

@cris691 Playwright comes with a Chromium nightly build, so it's hard to say if they fixed it in the Playwright code or if Chromium had a patch. But it seems the bug is still present when testing in Puppeteer with the latest Chromium.

Overwriting window.open is no solution, the same issue happens with regular links with target=_blank. @berstend's solution seems like the way to go to get it fixed without depending on JS hacks.

Niek avatar Jul 11 '20 09:07 Niek

Have you used Playwright? It's huge hacks to write the issue is fixed if not. I wasn't saying you were wrong, just seemed like you'd read the issue, hadn't used Playwright to see if still has the issue. Use it to prove it or it's no solution.

Not sure what you're trying to say, but yes - I tried in Playwright and it works there. Here's a simple test case I submitted to the Chromium bug tracker: https://bugs.chromium.org/p/chromium/issues/attachmentText?aid=447951 (works directly on CDP protocol).

Niek avatar Jul 13 '20 08:07 Niek

Has this still not been implemented?

rookiefront avatar Jul 17 '20 10:07 rookiefront

It is ridiculous that such an obvious thing simply cannot be done. Device emulation with multiple tabs is just unusable without this :(

terion-name avatar Jul 17 '20 19:07 terion-name

Regarding @berstend's prototype above: is there a better solution? I have a rough idea. Maybe you can use a proxy, args:[--proxy-server]. Every requested URL must carry a pageId parameter. If there is no pageId, the proxy holds the request as pending. After "targetcreated" is triggered, notify the proxy, which releases the pending request and responds with a 302 redirect. Subsequent requests can then carry the pageId parameter.
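
A minimal sketch of the gating logic such a proxy might use, with no actual proxy server involved. PendingRequestGate is a hypothetical name, and matching the oldest waiting request to the next created target (FIFO) is an assumption made only to keep the example small.

```javascript
// Hypothetical gate: requests without a pageId wait until a page is registered.
class PendingRequestGate {
  constructor() {
    this.waiting = [];
  }

  // Called for every request URL. Resolves to a description of what the proxy should do.
  handle(rawUrl) {
    const url = new URL(rawUrl);
    if (url.searchParams.has('pageId')) {
      return Promise.resolve({ action: 'forward', url: url.href });
    }
    return new Promise((resolve) => this.waiting.push({ url, resolve }));
  }

  // Called once "targetcreated" has fired and the new page has been set up.
  // Releases the oldest waiting request with a 302 redirect that carries the pageId.
  release(pageId) {
    const pending = this.waiting.shift();
    if (!pending) return false;
    pending.url.searchParams.set('pageId', pageId);
    pending.resolve({ action: 'redirect', status: 302, location: pending.url.href });
    return true;
  }
}
```

A real proxy would also need timeouts for requests that are never released and a more reliable way to pair requests with targets.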

huanshiwushuang avatar Sep 16 '20 08:09 huanshiwushuang

Everyone who's interested in seeing this fixed: Please star this chromium bug reported by Niek, so the chromium devs know there's strong interest in getting this resolved - thanks!

https://bugs.chromium.org/p/chromium/issues/detail?id=1070568

berstend avatar Oct 15 '20 14:10 berstend

👋 The situation is a bit unclear with regard to the Chromium bug, but I think it has been solved and that the fix made by Playwright could be backported to Puppeteer. Can this be considered?

In the meantime, the following code was successful, but it makes things more complex: you cannot operate on the Page object directly and have to work with manual DevTools protocol operations instead:

const browser = await puppeteer.launch()

const mainConnection = browser['_connection']
await mainConnection.send('Target.setDiscoverTargets', {discover: true})
await mainConnection.send('Target.setAutoAttach', {autoAttach: true, waitForDebuggerOnStart: true, flatten: true})

mainConnection.on('Target.attachedToTarget', async (event: puppeteer.Protocol.Target.AttachedToTargetEvent) => {
  if (!event.waitingForDebugger) {
    return
  }

  const targetSession = mainConnection._sessions.get(event.sessionId)
  if (!targetSession) {
    return
  }

  // These must be sent inside a `Promise.all`, otherwise anything sent before `runIfWaitingForDebugger` will hang.
  await Promise.all([
    // Put the custom instructions you want for this new target here, for instance injecting a script first.
    targetSession.send('Page.addScriptToEvaluateOnNewDocument', {source: "your_script"}),
    targetSession.send('Page.enable'),
    // Always need to send that.
    targetSession.send('Runtime.runIfWaitingForDebugger'),
  ])
})
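
The reason for the Promise.all can be illustrated with a toy model. This sketch assumes, as described in the comment above, that a paused target does not answer any command until Runtime.runIfWaitingForDebugger has been sent; createPausedSession and the other names are hypothetical and no browser is involved.

```javascript
// Fake session modelling a paused target: every command stays pending
// until Runtime.runIfWaitingForDebugger has been sent.
function createPausedSession() {
  let resume;
  const resumed = new Promise((resolve) => { resume = resolve; });
  return {
    send(method) {
      if (method === 'Runtime.runIfWaitingForDebugger') resume();
      return resumed.then(() => method);
    },
  };
}

// Resolves to 'timeout' if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Sequential: the first await never settles, so the resume command is never sent.
async function configureSequentially(session) {
  await session.send('Page.enable');
  await session.send('Runtime.runIfWaitingForDebugger');
  return 'done';
}

// Concurrent: all commands are sent up front, so the resume command unblocks the rest.
async function configureConcurrently(session) {
  await Promise.all([
    session.send('Page.enable'),
    session.send('Runtime.runIfWaitingForDebugger'),
  ]);
  return 'done';
}
```

In this model the sequential variant times out while the concurrent one completes, which is the hang the code comment warns about.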

sdeprez avatar Nov 10 '21 09:11 sdeprez

Inspired by sdeprez, I found a hack that works.

const puppeteer = require('puppeteer');
const assert = require('assert');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  await hackBrowserTargets(browser);

  const page1 = (await browser.pages())[0];
  await setInterception(page1);
  await page1.goto('http://127.0.0.1/page');
  const $linkOfPage1 = await page1.$('#link')
  assert.notEqual($linkOfPage1, null);
  await $linkOfPage1.click();

  const newTarget = await browser.waitForTarget((target) => {
    return Boolean(target._targetInfo.openerId);
  });
  const page2 = await newTarget.page();
  await setInterception(page2);
  await page2.reload();
  const $linkOfPage2 = await page2.$('#link')
  assert.notEqual($linkOfPage2, null);

  await browser.close();
})();

async function setInterception(page) {
  await page.setRequestInterception(true);

  page.on('request', (request) => {
    if (request.url().includes('/page')) {
      request.respond({
        status: 200,
        headers: {
          'Content-Type': 'text/html',
        },
        body: `
          <!DOCTYPE html>
          <html lang="en">
          <head>
            <meta charset="UTF-8">
            <meta http-equiv="X-UA-Compatible" content="IE=edge">
            <title>Page</title>
          </head>
          <body>
            <a id="link" href="/page" target="_blank">new page</a>
          </body>
          </html>
        `,
      });
    } else {
      request.continue();
    }
  });
}

async function hackBrowserTargets(browser) {
  await browser._connection.send('Target.setAutoAttach', {
    autoAttach: true,
    waitForDebuggerOnStart: true,
    flatten: true,
  });
  browser._connection.on('Target.attachedToTarget', async (event) => {
    const session = browser._connection._sessions.get(event.sessionId);

    if (!event.waitingForDebugger || !session) {
      return
    }

    if (event.targetInfo.type === 'page' && event.targetInfo.openerId) {
      const handleRequestPaused = async (event) => {
        await Promise.all([
          session.send('Fetch.fulfillRequest', {
            requestId: event.requestId,
            responseCode: 200,
            responsePhrase: 'OK',
            responseHeaders: [
              {
                name: 'content-type',
                value: 'text/html'
              },
              {
                name: 'content-length',
                value: '0'
              }
            ],
            body: ''
          }),
          session.send('Fetch.disable')
        ]);
  
        session.off('Fetch.requestPaused', handleRequestPaused);
      };
      session.on('Fetch.requestPaused', handleRequestPaused);
  
      await session.send('Fetch.enable', {
        handleAuthRequests: true,
        patterns: [{ urlPattern: '*' }],
      });
    }

    await session.send('Runtime.runIfWaitingForDebugger');
  });
}

Junyan avatar Jun 28 '22 04:06 Junyan

A temporary workaround until this issue is fixed, which I believe might be useful for some of you. To customize the user-agent in headless mode (and apply it on any new target), set Puppeteer's headless property to false and instead launch the browser with the --headless flag together with your anonymized UA using the --user-agent flag.

import * as puppeteer from 'puppeteer';

const browser = await puppeteer.launch({
  headless: false,
  args: [
    '--headless',
    '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
  ],
  defaultViewport: { width: 1920, height: 900 },
});

almog-geva avatar Aug 03 '22 09:08 almog-geva

Hi @Junyan, it seems that newer versions of puppeteer now hide the browser._connection field, as it is private. Do you know a workaround?

I could hack it with:

const { CDPBrowser } = require('../node_modules/puppeteer/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Browser.js');
.
.
.
  let browserConnection;
  const oldCreate = CDPBrowser._create;
  CDPBrowser._create = (product, connection, ...args) => {
    browserConnection = connection;
    return oldCreate(product, connection, ...args);
  }

it would really be great if this could be done properly.
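
For reference, the same wrapping trick can be written as a small generic helper that preserves `this` and can be undone. This is only a sketch of the technique: FakeBrowser is a stand-in class, not puppeteer's CDPBrowser, and captureFactoryArgument is a hypothetical name.

```javascript
// Stand-in for a library class whose static factory receives an object we want to capture.
class FakeBrowser {
  static _create(product, connection) {
    return new FakeBrowser(product, connection);
  }

  constructor(product, connection) {
    this.product = product;
    // The connection is deliberately not exposed, mirroring a private field.
  }
}

// Wraps a static factory so the argument at `argIndex` is passed to `onCapture`.
// Returns a function that restores the original method.
function captureFactoryArgument(cls, methodName, argIndex, onCapture) {
  const original = cls[methodName];
  cls[methodName] = function (...args) {
    onCapture(args[argIndex]);
    return original.apply(this, args);
  };
  return () => {
    cls[methodName] = original;
  };
}
```

Like the hack above, this depends on internal names that can change between releases, so it is best treated as a stopgap.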

tripodsan avatar Nov 07 '22 10:11 tripodsan

@tripodsan That looks like a solution. I don't have any new ideas because we have not upgraded puppeteer.

Junyan avatar Dec 03 '22 05:12 Junyan

The proposal in this issue should now be possible if anyone wants to try to implement it. In ChromeTargetManager, it is possible to safely await until the target is configured by the user code.

OrKoN avatar Dec 06 '22 08:12 OrKoN

Does puppeteer have any way to detect which script triggers the opening of a new tab, in order to catch popups?

useman avatar Dec 07 '22 08:12 useman

I have found a solution for intercepting popups using CDP in the latest version of puppeteer for my extremely specific use case, but hopefully this is helpful for others. Essentially, what you need to do is:

  1. Start a CDP session for the browser target
  2. Call Target.setAutoAttach with the filter { type: 'page', exclude: false }
  3. On Target.attachedToTarget, get the puppeteer Connection object from the CDP session for the browser target, and then get the CDP session of the popup target using the sessionId field from the attachedToTarget event
  4. Do whatever you need to do here to set up the tab, using the popup's CDP session
  5. Call Runtime.runIfWaitingForDebugger in the popup's CDP session to resume the tab

Here's some example code for my use case - I wanted to see what URL a popup is trying to go to, without letting it actually load the popup:

import puppeteer from 'puppeteer';

let browser = await puppeteer.launch({
  headless: false
});

const page = (await browser.pages())[0];

// Set this to a URL where you can click on something that opens a popup
await page.goto('https://www.example.com');

const cdp = await browser.target().createCDPSession();

// Set up auto-attaching, set the filter to pages only
await cdp.send('Target.setAutoAttach', {
  waitForDebuggerOnStart: true,
  autoAttach: true,
  flatten: true,
  filter: [
    { type: 'page', exclude: false },
  ]
})

cdp.on('Target.attachedToTarget', async ({sessionId, targetInfo, waitingForDebugger}) => {
  // Get the puppeteer Connection object from the existing CDP connection
  let connection = cdp.connection();
  if (!connection) {
    console.log('Could not get connection');
    return;
  }
  // Get the CDP session that has attached to the popup
  let popupCdp = connection.session(sessionId);
  if (!popupCdp) {
    console.log('Could not get popup CDP session');
    return;
  }
  
  // Do whatever setup you want to do in the popup here
  await popupCdp.send('Fetch.enable');
  popupCdp.on('Fetch.requestPaused', async ({requestId, request}) => {
    console.log('Intercepted popup - attempted to open URL:', request.url);
    await popupCdp.send('Fetch.failRequest', {requestId, errorReason: 'Aborted'});
  });
  
  // Allow the popup to continue executing
  await popupCdp.send('Runtime.runIfWaitingForDebugger');
});

console.log('ready');
// Click on something that opens a popup

eric-zeng avatar Jun 13 '23 06:06 eric-zeng

Regarding @berstend's prototype above: it seems that Target.attachedToTarget events are only fired when Target.setDiscoverTargets is enabled.

However, I get <code=-32000 message=Not allowed> at Target.setDiscoverTargets, {"discover":true}

kaliiiiiiiiii avatar Jul 17 '23 12:07 kaliiiiiiiiii