
Run all Puppeteer commands in an Isolated World

Open prescience-data opened this issue 4 years ago • 13 comments

Attempting to address this issue below and looking for feedback and ideas on any potential ways to achieve this more cleanly: https://github.com/berstend/puppeteer-extra/issues/209

The goal is to have Puppeteer run every command in an Isolated World so that detection scripts cannot monitor execution.

The concept write up is here (sorry, needed too much detail to include within the issue text): https://github.com/prescience-data/harden-puppeteer

The only way I can figure out how to achieve this is by modifying the vanilla Puppeteer files directly in the node_modules folder, so I'm hoping someone with experience writing puppeteer-extra plugins can advise a way to do this with a plugin instead.

Thanks!

prescience-data avatar Jun 11 '20 06:06 prescience-data

Have you looked/tried https://github.com/ds300/patch-package ?

brunogaspar avatar Jun 11 '20 08:06 brunogaspar

@brunogaspar thanks! Just re-did the concept as a patch, much much easier to follow compared to the previous way!

prescience-data avatar Jun 12 '20 00:06 prescience-data

My main concern is whether this will somehow mess up plugins like Extra-Stealth by accidentally running all of their modifications within the isolated world (i.e. effectively disabling them).

prescience-data avatar Jun 12 '20 03:06 prescience-data

Just a matter of trying, have you tried with the latest Puppeteer?

brunogaspar avatar Jun 15 '20 22:06 brunogaspar

Yes, it does work (on 1.19.0, will be updating it to 2.1.0) - I'm just trying to think of unknown unknowns etc. Do you know of any ways to test Extra-Stealth features to make sure they are all active?

prescience-data avatar Jun 15 '20 23:06 prescience-data

Hmm, you mean to determine whether the stuff that the stealth plugin does is still being applied? If that's it, I suppose you can try to mimic what the unit tests for the stealth plugin do.

If that's not it, please elaborate a bit more and i'll try to help you out.

brunogaspar avatar Jun 16 '20 07:06 brunogaspar

My main concern is whether this will somehow mess up plugins like Extra-Stealth by accidentally running all of their modifications within the isolated world (i.e. effectively disabling them)

This should become apparent immediately when using your patches and running yarn test in the stealth plugin repo. :-)

Haven't looked more closely at isolated worlds so far but is it similar to what happens in Chrome Extensions and Content Scripts? If so then this would have an effect, as the Puppeteer scripts couldn't access the site's local window object (only DOM) without injecting another script in the site.

berstend avatar Jul 10 '20 04:07 berstend

Haven't looked more closely at isolated worlds so far but is it similar to what happens in Chrome Extensions and Content Scripts? If so then this would have an effect, as the Puppeteer scripts couldn't access the site's local window object (only DOM) without injecting another script in the site.

Yes it's the same as how the Content Scripts work from memory.

What I've tried to do is to isolate only the commands sent by the user, meaning the rest of Puppeteer should run normally, but any detection scripts will be unable to monitor your commands, other than to see the outcome in the DOM.

The trade-off is that any global libraries you might expect to have access to must be included directly in the script rather than looked up on window._____. Naturally, that also means that if you need to interact with the site's custom scripts directly, you might not be able to (I have not tested this though).

prescience-data avatar Jul 11 '20 03:07 prescience-data

Ok so running the tests in puppeteer-extra-plugin-stealth dumps a bunch of these errors with the patch applied:

 Rejected promise returned by test. Reason:

  Error {
    message: `Evaluation failed: ReferenceError: fpCollect is not defined
        at jquery.js:1:18`,
  }

Which would be expected if fpCollect is defined outside the isolated world, but the test seems to be testing from "inside" Puppeteer, whereas I think a more accurate test would be inspecting it from "outside", as a detection script would?
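A rough way to picture why the test fails is shown below. This is only an analogy: two Node `vm` contexts stand in for the page's main world and an isolated world (Chrome creates real isolated worlds itself, not via `vm`), and the body of `fpCollect` is made up for illustration.

```javascript
const vm = require('vm');

// Analogy only: two Node vm contexts stand in for the page's main world and
// an isolated world. Each context has its own set of globals.
const mainWorld = vm.createContext({});
const isolatedWorld = vm.createContext({});

// The site's own script defines a global in the main world.
// (fpCollect is the name from the failing test; its body here is hypothetical.)
vm.runInContext(
  'var fpCollect = { generateFingerprint() { return "fp"; } };',
  mainWorld
);

// Code evaluated in the main world can see the global...
const seenInMain = vm.runInContext('typeof fpCollect', mainWorld);

// ...but the same expression evaluated in the isolated world cannot.
const seenInIsolated = vm.runInContext('typeof fpCollect', isolatedWorld);

// Using the global directly from the isolated world throws a ReferenceError,
// the same kind of error reported in the test output above.
let errorName = null;
try {
  vm.runInContext('fpCollect.generateFingerprint()', isolatedWorld);
} catch (err) {
  errorName = err.name;
}

console.log({ seenInMain, seenInIsolated, errorName });
```

So a test that calls page.evaluate() and expects to find the site's globals will fail once evaluate() is routed to the isolated world, even if nothing is wrong from the site's point of view.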

prescience-data avatar Jul 11 '20 03:07 prescience-data

But don't we intentionally want to run in the same context as the site's JS in order to be able to access and modify it?

Let's make a simpler test case to help understand this:

await page.evaluateOnNewDocument(() => {
  delete Object.getPrototypeOf(navigator).webdriver
})

edit, and then navigating to https://bot.sannysoft.com/ and see if Webdriver is missing

Would this work with your patched files? If not (similar to Content Script isolation in Chrome) we'd need to inject another JS script into the site/DOM with the actual payload, which is trivial to detect (MutationObservers, Content Security Policies).

berstend avatar Jul 11 '20 08:07 berstend

I don't believe so because this is the page.evaluateOnNewDocument function:

  /**
   * @param {Function|string} pageFunction
   * @param {!Array<*>} args
   */
  async evaluateOnNewDocument(pageFunction, ...args) {
    const source = helper.evaluationString(pageFunction, ...args);
    await this._client.send('Page.addScriptToEvaluateOnNewDocument', { source });
  }

You can see it is sending the command directly to the _client rather than passing through FrameManager (which is where the isolated world exists).

The isolated world is set up to catch things like page.evaluate() eg:

  /**
   * @param {Function|string} pageFunction
   * @param {!Array<*>} args
   * @return {!Promise<*>}
   */
  async evaluate(pageFunction, ...args) {
    return this._frameManager.mainFrame().evaluate(pageFunction, ...args);
  }

Which has been overridden here https://github.com/prescience-data/harden-puppeteer/blob/ba202cc0a422b257c26f023fbaafd41f7ae48157/patches/puppeteer%2B1.19.0.patch#L86 to:

return this._frameManager.isolatedWorld().evaluate(pageFunction, ...args);

The mainFrame() is the dangerous one, since that is where the detection scripts live. Running the interaction commands like evaluate(), type(), etc. inside the isolated world means the detection scripts cannot monitor them.
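For anyone who would rather not patch node_modules, the same idea can probably be sketched against the DevTools protocol directly. This is not what the patch does, just a minimal sketch using the CDP methods Page.getFrameTree, Page.createIsolatedWorld and Runtime.evaluate. The function name, the `worldName` default and the error message are my own assumptions; `client` is anything with a CDP-style send(method, params), such as the session from page.target().createCDPSession().

```javascript
// Sketch only: evaluate an expression in a fresh isolated world via raw CDP.
// `evaluateInIsolatedWorld` and the 'hardened' world name are hypothetical.
async function evaluateInIsolatedWorld(client, expression, worldName = 'hardened') {
  // Look up the id of the page's main frame.
  const { frameTree } = await client.send('Page.getFrameTree');

  // Ask Chrome to create a new isolated world attached to that frame.
  const { executionContextId } = await client.send('Page.createIsolatedWorld', {
    frameId: frameTree.frame.id,
    worldName,
  });

  // Run the expression in that world instead of the page's main world.
  const { result, exceptionDetails } = await client.send('Runtime.evaluate', {
    expression,
    contextId: executionContextId,
    returnByValue: true,
    awaitPromise: true,
  });

  if (exceptionDetails) {
    throw new Error(`Evaluation failed: ${exceptionDetails.text}`);
  }
  return result.value;
}
```

Usage would look something like `await evaluateInIsolatedWorld(await page.target().createCDPSession(), 'document.title')`. Note that this creates a new world on every call and only covers string expressions, so it is far less complete than routing evaluate(), type(), etc. through the isolated world as the patch does.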

prescience-data avatar Jul 11 '20 08:07 prescience-data

edit, and then navigating to https://bot.sannysoft.com/ and see if Webdriver is missing

Passed: Pass

Is this what you were expecting for webdriver?

(edit: FYI that is on the 1.19 patch, 2.1.0 is not working properly) (edit 2: Just updated the 2.x patch to 2.1.1 and is now working)

Also passes:

FingerprintJS

https://fingerprintjs.com/demo


Are You Headless?

https://arh.antoinevastel.com/bots/areyouheadless


SocialNetsDefender

http://anonymity.space/hellobot.php


Distil Networks

http://promos.rtm.com


prescience-data avatar Jul 12 '20 08:07 prescience-data

Ok so I can confirm that the patch works as intended.

I've made a test for it here that uses Vastel's execution monitoring technique to figure out if the host site has any visibility into the patched context:

Puppeteer Test: https://github.com/prescience-data/puppeteer-botcheck/blob/b6848845b8b5887608784caa2fe7a078db866e9b/Botcheck.js#L45

Host Monitoring Execution: https://github.com/prescience-data/prescience-data.github.io/blob/master/execution-monitor.html

URL of the live test: https://prescience-data.github.io/execution-monitor.html

Here are the differences between unpatched and patched:

Unpatched: [screenshot]

Patched: [screenshot]

You can see that the patched version only detects the inserted elements, which were deliberately left unisolated to allow the user to inject scripts into the main context (i.e. all the extra-stealth modifications).

However, anything other than that is running isolated and outside the security scope of any bot detection script.

Naturally they would be able to observe changes you make to the DOM, but only the outcome, not how the execution is occurring.
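To make the "outcome but not execution" point concrete, here is a toy model of it. It is an analogy under loud assumptions: a plain object plays the shared DOM, each Node `vm` context gets its own `document` wrapper around it (loosely like per-world DOM bindings), and `setTitle`/`getTitle` are made-up stand-ins for real DOM APIs. It is not the code from the execution-monitor page.

```javascript
const vm = require('vm');

// Analogy only: one plain object plays the DOM shared by both worlds.
const sharedDom = { title: 'original' };

// Each world gets its own `document` wrapper over the shared DOM.
// setTitle/getTitle are hypothetical stand-ins for real DOM methods.
function makeWorld() {
  const document = {
    setTitle(value) { sharedDom.title = value; },
    getTitle() { return sharedDom.title; },
  };
  return vm.createContext({ document, calls: [] });
}

const mainWorld = makeWorld();
const isolatedWorld = makeWorld();

// The detection script hooks the DOM API in the main world to log every call.
vm.runInContext(`
  const original = document.setTitle;
  document.setTitle = function (value) {
    calls.push('setTitle:' + value);
    return original.call(this, value);
  };
`, mainWorld);

// An unpatched command runs in the main world and is logged by the hook.
vm.runInContext('document.setTitle("from main")', mainWorld);

// A patched command runs in the isolated world and never touches the hook.
vm.runInContext('document.setTitle("from isolated")', isolatedWorld);

const loggedCalls = mainWorld.calls;
const titleSeenBySite = vm.runInContext('document.getTitle()', mainWorld);
console.log({ loggedCalls, titleSeenBySite });
```

The site's hook only records the call made from the main world, yet the site can still read the title written from the isolated world, which matches what the patched run shows.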

prescience-data avatar Jul 25 '20 03:07 prescience-data