puppeteer-extra

Doesn't play nice with Webpack

Open njlr opened this issue 5 years ago • 60 comments

Edit by @berstend:

See here for the workaround: https://github.com/webpack/webpack/issues/4175#issuecomment-450746682 and another one specific to the stealth plugin: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-712364816

Original issue:

When bundling I get this error:

WARNING in ./node_modules/puppeteer-extra/dist/index.esm.js 294:22-35 Critical dependency: the request of a dependency is an expression @ ./src/processor.js @ ./src/index.js @ multi @babel/polyfill ./src/index.js

And then at run-time:

A plugin listed 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime' as dependency, which is currently missing. Please install it:

      yarn add puppeteer-extra-plugin-stealth

      Note: You don't need to require the plugin yourself,
      unless you want to modify it's default settings.
      

Error: Cannot find module 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime'

Of course, puppeteer-extra-plugin-stealth is already in the package.json.
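
For readers unfamiliar with the warning: webpack resolves require() calls at build time, so it can only follow calls whose argument is a string literal. The following is a minimal sketch of the difference, using Node's built-in path module as a stand-in. The variable names are illustrative and not taken from puppeteer-extra.

```javascript
// Static require: the argument is a string literal, so a bundler can see
// at build time which module is needed and include it in the bundle.
const staticPath = require('path');

// Dynamic require: the argument is only known at run time. A bundler sees
// "require(<expression>)" and cannot tell which file to include, which is
// what the "request of a dependency is an expression" warning refers to.
const moduleName = ['pa', 'th'].join('');
const dynamicPath = require(moduleName);

// In plain Node.js both calls return the same cached module object.
console.log(staticPath === dynamicPath);
```

In plain Node.js the dynamic call works fine. Inside a bundle, the module behind the expression may simply not be there, which would match the run-time "Cannot find module" error above.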

njlr avatar Dec 06 '19 18:12 njlr

Work-around is to import and apply the plugins manually:

import puppeteerVanilla from 'puppeteer';
import { addExtra } from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import AcceptLanguagePlugin from 'puppeteer-extra-plugin-stealth/evasions/accept-language';
import ChromeRuntimePlugin from 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime';
import ConsoleDebugPlugin from 'puppeteer-extra-plugin-stealth/evasions/console.debug';
import IFrameContentWindowPlugin from 'puppeteer-extra-plugin-stealth/evasions/iframe.contentWindow';
import MediaCodecsPlugin from 'puppeteer-extra-plugin-stealth/evasions/media.codecs';
import NavigatorLanguagesPlugin from 'puppeteer-extra-plugin-stealth/evasions/navigator.languages';
import NavigatorPermissionsPlugin from 'puppeteer-extra-plugin-stealth/evasions/navigator.permissions';
import NavigatorPlugins from 'puppeteer-extra-plugin-stealth/evasions/navigator.plugins';
import WebdriverPlugin from 'puppeteer-extra-plugin-stealth/evasions/navigator.webdriver';
import UserAgentPlugin from 'puppeteer-extra-plugin-stealth/evasions/user-agent';
import WebglVendorPlugin from 'puppeteer-extra-plugin-stealth/evasions/webgl.vendor';
import WindowOuterDimensionsPlugin from 'puppeteer-extra-plugin-stealth/evasions/window.outerdimensions';

// Wrap the example in an immediately invoked async function so that it runs
(async () => {
  // Augment the vanilla puppeteer instance with plugin support
  const puppeteer = addExtra(puppeteerVanilla);

  const plugins = [
    StealthPlugin(),
    AcceptLanguagePlugin(),
    ChromeRuntimePlugin(),
    ConsoleDebugPlugin(),
    IFrameContentWindowPlugin(),
    MediaCodecsPlugin(),
    NavigatorLanguagesPlugin(),
    NavigatorPermissionsPlugin(),
    NavigatorPlugins(),
    WebdriverPlugin(),
    UserAgentPlugin(),
    WebglVendorPlugin(),
    WindowOuterDimensionsPlugin(),
  ];

  const browser = await puppeteer.launch();

  // Call the browser-level hook of every plugin by hand
  for (const plugin of plugins) {
    await plugin.onBrowser(browser);
  }

  const [ page ] = await browser.pages();

  // Call the page-level hook of every plugin by hand
  for (const plugin of plugins) {
    await plugin.onPageCreated(page);
  }

  // ...
})();

njlr avatar Dec 09 '19 10:12 njlr

Hmm, interesting. May I ask why you opted to use Webpack for a NodeJS project?

If you could provide a minimal webpack based project with that issue that'd be great, as I then can take a quick look and see how to best fix this. :)

berstend avatar Dec 10 '19 14:12 berstend

I use Webpack with Node because it's a simpler way to use Babel, bundle node_modules and minify code.

webpack.config.js:

const path = require('path');
const TerserPlugin = require('terser-webpack-plugin');

const { env } = process;

const isProduction = env['NODE_ENV'] === 'production';

const mode = isProduction ?
  'production' :
  'development';

console.log({ mode });

module.exports = {
  entry: [
    '@babel/polyfill',
    './src/index.js',
  ],
  target: 'node',
  devtool: isProduction ? false : 'source-map',
  mode,
  output: {
    path: path.join(__dirname, 'build'),
    filename: 'index.js'
  },
  module: {
    rules: [
      {
        test: /\.m?js$/,
        exclude: /(node_modules|bower_components)/,
        use: {
          loader: 'babel-loader',
          options: {
            babelrc: true,
          },
        },
      },
      {
        test: /\.js$/,
        loader: 'unlazy-loader'
      }
    ],
  },
  optimization: {
    minimize: isProduction,
    minimizer: [
       new TerserPlugin(),
    ],
  },
  resolve: {
    alias: {
      'pg-native': path.resolve(__dirname, 'aliases/pg-native'),
    },
  },
};

njlr avatar Dec 10 '19 15:12 njlr

Thanks for providing the config, I'll look into it when I find time.

> I use Webpack with Node because it's a simpler way to use Babel, bundle node_modules and minify code.

So it's to use new ES language features? Bundling and minifying shouldn't matter on backend code (hence my question). I personally used Babel with NodeJS back in the day to be able to use ES6 imports, but eventually stopped doing that: I noticed issues with it and figured it caused more problems than it solved :)

berstend avatar Dec 10 '19 15:12 berstend

Yes I use the latest ES6 features.

I also use loaders for other languages (such as Fable) and data (such as JSON, CSS).

Bundling everything can also be useful for targets like AWS lambda.

njlr avatar Dec 10 '19 15:12 njlr

I think that, as a rule of thumb, using only ES6 import / export will give a library that works with Webpack, Rollup, etc.

njlr avatar Dec 10 '19 15:12 njlr

Got it :) Anyway, puppeteer-extra should still be able to work with Webpack. It might be that the internal dependency system is just not aware of the bundler already taking care of the dependencies and a flag to disable the internal dependency resolution is sufficient.

berstend avatar Dec 10 '19 15:12 berstend

Couple of avenues I'll explore to fix this:

  • Add options to puppeteer-extra with something like disableInternalDependencyResolution: true
  • Add support for an ENV variable to detect the presence of bundlers
  • Modify the esm rollup build to disable the internal dependency management (not sure about that, as regular NodeJS will soon(?) use esm exports as well?)
  • Make the dependency thing a warning rather than an error (would still spam log output during webpack build but that's ok I guess)
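
A rough sketch of the "detect the presence of bundlers" idea. Note the swap: instead of an ENV variable, it relies on the free variables that webpack documents for bundled code (__webpack_require__ and __non_webpack_require__). The function names are hypothetical and nothing here is part of puppeteer-extra.

```javascript
// Returns true when this code runs inside a webpack bundle.
// `typeof` on an undeclared identifier is safe and yields 'undefined'.
function isBundledByWebpack() {
  return typeof __webpack_require__ === 'function';
}

// Picks a require function that resolves modules at run time:
// webpack's escape hatch when bundled, the regular require otherwise.
function pickRuntimeRequire() {
  return typeof __non_webpack_require__ === 'function'
    ? __non_webpack_require__
    : require;
}

// In plain Node.js this logs false.
console.log(isBundledByWebpack());
```

A library could use such a check to decide whether to skip its own dependency resolution, at the cost of coupling itself to one specific bundler.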

berstend avatar Dec 10 '19 15:12 berstend

What is the purpose of the internal dependency management?

Could a simpler design work like this?

import puppeteerExtra from 'puppeteer-extra';
import MyPlugin from 'my-plugin';

const puppeteer = puppeteerExtra({
  plugins: [
    MyPlugin(),
  ],
});

async () => {
  const browser = await puppeteer.launch();

  // etc...
};

njlr avatar Dec 10 '19 15:12 njlr

The above already exists (with puppeteer.use()) :)

The idea behind the dependency plugin system was to make it easy to re-use plugins within plugins. E.g. the stealth plugin needs to anonymize the UA and instead of copy pasting that code we just load the anonymize-ua plugin internally as a dependency.

The code for that is here: https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra/src/index.ts#L326-L334

Thinking of it now (I wrote this in the very first version) I think we don't need to handle that within puppeteer-extra, but can just use the native package.json methods to declare them.

What I currently don't understand: This should still work with Webpack regardless, as they should transform the require() statements to point at bundled resources instead. I need to take a closer look to see what's going on there.
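
To make the dependency idea concrete, here is a toy model of plugins that name other plugins as dependencies. This is not the code behind the link above. The function resolveDependencies, the plugin shapes and the fakeModules table are all made up for illustration, and load stands in for a dynamic require().

```javascript
// Toy model: each plugin lists the names of the plugins it depends on,
// and the host loads any that are not registered yet.
function resolveDependencies(plugins, load) {
  const byName = new Map(plugins.map((p) => [p.name, p]));
  const queue = [...plugins];
  while (queue.length > 0) {
    const plugin = queue.shift();
    for (const depName of plugin.dependencies || []) {
      if (byName.has(depName)) continue; // already registered
      // A real implementation would call require(depName) here with a
      // computed name, which is the call a bundler cannot analyse.
      const dep = load(depName);
      byName.set(depName, dep);
      queue.push(dep); // dependencies may have dependencies of their own
    }
  }
  return [...byName.values()];
}

// Stand-in for require(): a lookup table of fake plugins.
const fakeModules = {
  'anonymize-ua': { name: 'anonymize-ua', dependencies: [] },
};
const stealth = { name: 'stealth', dependencies: ['anonymize-ua'] };

const resolved = resolveDependencies([stealth], (name) => fakeModules[name]);
console.log(resolved.map((p) => p.name));
```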

berstend avatar Dec 10 '19 17:12 berstend

So yeah, it's just webpack being bad at dynamic imports.

This fix should work: https://github.com/webpack/webpack/issues/4175#issuecomment-450746682

berstend avatar Dec 10 '19 19:12 berstend

So basically adding this to the webpack module.rules (untested):

{
  // regex for the files that are problematic
  test: /node_modules\/puppeteer-extra\/dist\/index\.esm\.js/,
  loader: 'string-replace-loader',
  options: {
    // match a require function call where the argument isn't a string
    // also capture the first character of the args so we can ignore it later
    search: 'require[(]([^\'"])',
    // replace the 'require(' with a '__non_webpack_require__(', meaning it will require the files at runtime
    // $1 grabs the first capture group from the regex, the one character we matched and don't want to lose
    replace: '__non_webpack_require__($1',
    flags: 'g'
  }
}

You will also need to install string-replace-loader: https://github.com/Va1/string-replace-loader
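
To see what the search/replace pair does, the same pattern can be run in plain JavaScript against a made-up source string. This assumes the loader applies it the way String.prototype.replace does with a global RegExp. The sample input is illustrative only.

```javascript
// Same pattern and replacement as in the loader options above.
const search = new RegExp('require[(]([^\'"])', 'g');
const replacement = '__non_webpack_require__($1';

// Made-up input: two static requires and one dynamic require.
const source = [
  "const fs = require('fs');",
  'const os = require("os");',
  'const plugin = require(name);',
].join('\n');

// Only the call whose argument does not start with a quote is rewritten.
const patched = source.replace(search, replacement);
console.log(patched);
```

Static requires keep going through the bundler, while the dynamic one is left for Node.js to resolve at run time. The required module therefore has to exist on disk next to the bundle.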

berstend avatar Dec 10 '19 19:12 berstend

> The above already exists (with puppeteer.use()) :)

I meant this usage but with internal dependency management removed entirely.

> The idea behind the dependency plugin system was to make it easy to re-use plugins within plugins. E.g. the stealth plugin needs to anonymize the UA and instead of copy pasting that code we just load the anonymize-ua plugin internally as a dependency.
>
> The code for that is here: https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra/src/index.ts#L326-L334
>
> Thinking of it now (I wrote this in the very first version) I think we don't need to handle that within puppeteer-extra, but can just use the native package.json methods to declare them.

So the plugins would declare the other plugins that they depend on using package.json dependencies? Sounds like the best solution to me.

> What I currently don't understand: This should still work with Webpack regardless, as they should transform the require() statements to point at bundled resources instead. I need to take a closer look to see what's going on there.

string-replace-loader looks like a good work-around. Thanks!

njlr avatar Dec 11 '19 10:12 njlr

> string-replace-loader looks like a good work-around. Thanks!

Would be great to hear if it fixes the webpack issue, then I can add it to the documentation :)

> So the plugins would declare the other plugins that they depend on using package.json dependencies? Sounds like the best solution to me.

Unfortunately that won't fix the issue with dynamic imports + webpack, we still need to do dynamic require() under the hood (which works fine, just not with webpack). :)

berstend avatar Dec 11 '19 15:12 berstend

> we still need to do dynamic require() under the hood

Why is this?

njlr avatar Dec 11 '19 22:12 njlr

> we still need to do dynamic require() under the hood
>
> Why is this?

It's always good to question assumptions, I'm a big believer in that :)

Let's take the stealth plugin as an example: it comes with a set of "evasions" (which are just regular plugins) and acts as an "umbrella" plugin, so the user doesn't need to add the specific evasions one-by-one.

One feature is that the user can add or remove evasions on that list, before puppeteer-extra will require these files (and thereby code mods).

I'm not aware of a way to accomplish that without dynamic require(). :)
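
A toy model of the behaviour described above, under the assumption that the umbrella plugin keeps a set of evasion names and only turns them into modules later. The function createUmbrellaPlugin, the evasions/ path prefix and the recording requireFn are illustrative and not the stealth plugin's real internals.

```javascript
// Toy "umbrella" plugin: holds names, not modules.
function createUmbrellaPlugin(defaultEvasions) {
  return {
    enabledEvasions: new Set(defaultEvasions),
    // Names are resolved to modules only when load() runs, so the user
    // can still edit enabledEvasions before that point.
    load(requireFn) {
      return [...this.enabledEvasions].map((name) => requireFn(`evasions/${name}`));
    },
  };
}

const umbrella = createUmbrellaPlugin([
  'chrome.runtime',
  'navigator.webdriver',
  'user-agent',
]);

// The user opts out of one evasion before anything is loaded.
umbrella.enabledEvasions.delete('user-agent');

// Stand-in for require(): records which paths were requested.
const requested = [];
umbrella.load((path) => {
  requested.push(path);
  return { path };
});
console.log(requested);
```

Because the path is built from a name at run time, the loading step is a dynamic require by construction.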

berstend avatar Dec 14 '19 17:12 berstend

Assuming the fix mentioned in https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-564185702 works I'm gonna demote this issue to "nice to have in a future version". :)

berstend avatar Dec 14 '19 17:12 berstend

> we still need to do dynamic require() under the hood
>
> Why is this?
>
> It's always good to question assumptions, I'm a big believer in that :)
>
> Let's take the stealth plugin as an example: it comes with a set of "evasions" (which are just regular plugins) and acts as an "umbrella" plugin, so the user doesn't need to add the specific evasions one-by-one.
>
> One feature is that the user can add or remove evasions on that list, before puppeteer-extra will require these files (and thereby code mods).
>
> I'm not aware of a way to accomplish that without dynamic require(). :)

The stealth plugin could require all evasions statically. This would cover most use-cases.

Users who need something more custom could manually import the ones that they need. This wouldn't be much code and the full import list could be copied from the stealth plugin then edited.
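
A sketch of what this static alternative could look like. The factories below are stand-ins: in a real version each entry would be a require() or import with a literal path (for example 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime'), so a bundler can see every file. createStealth and its exclude option are hypothetical names.

```javascript
// Every evasion is referenced statically, up front.
// The arrow functions stand in for statically imported plugin factories.
const evasionFactories = {
  'chrome.runtime': () => ({ name: 'chrome.runtime' }),
  'navigator.webdriver': () => ({ name: 'navigator.webdriver' }),
  'user-agent': () => ({ name: 'user-agent' }),
};

// Customisation happens by filtering the static table, not by
// requiring files by name at run time.
function createStealth({ exclude = [] } = {}) {
  const skipped = new Set(exclude);
  return Object.entries(evasionFactories)
    .filter(([name]) => !skipped.has(name))
    .map(([, factory]) => factory());
}

const plugins = createStealth({ exclude: ['user-agent'] });
console.log(plugins.map((p) => p.name));
```

The trade-off is that all evasions end up in the bundle, including the ones a user excludes.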

njlr avatar Dec 15 '19 19:12 njlr

I got this working locally but when I used webpack to bundle it and send it over to aws lambda, this line

StealthPlugin();

results in the error below. Adding "kind-of": "^6.0.2" to the project's package.json does not resolve the problem.

Error: Cannot find module 'kind-of'
    at a (/var/task/index.js:145:1835)
    at Function.o [as typeOf] (/var/task/index.js:145:1440)
    at i (/var/task/index.js:145:854)
    at e.exports (/var/task/index.js:145:371)
    at new n (/var/task/index.js:139:191)
    at new i (/var/task/index.js:133:12241)
    at e.exports (/var/task/index.js:133:12910)
    at Object.startBrowser (/var/task/index.js:127:99655)
    at Runtime.t.handler (/var/task/index.js:127:86330)
    at processTicksAndRejections (internal/process/task_queues.js:93:5) {
  code: 'MODULE_NOT_FOUND'
}

ioannist avatar Dec 21 '19 21:12 ioannist

> I got this working locally but when I used webpack to bundle it and send it over to aws lambda, this line
>
> StealthPlugin();
>
> results in the error below. Adding "kind-of": "^6.0.2" to the project's package.json does not resolve the problem.
>
> Error: Cannot find module 'kind-of'
>     at a (/var/task/index.js:145:1835)
>     at Function.o [as typeOf] (/var/task/index.js:145:1440)
>     at i (/var/task/index.js:145:854)
>     at e.exports (/var/task/index.js:145:371)
>     at new n (/var/task/index.js:139:191)
>     at new i (/var/task/index.js:133:12241)
>     at e.exports (/var/task/index.js:133:12910)
>     at Object.startBrowser (/var/task/index.js:127:99655)
>     at Runtime.t.handler (/var/task/index.js:127:86330)
>     at processTicksAndRejections (internal/process/task_queues.js:93:5) {
>   code: 'MODULE_NOT_FOUND'
> }

I had the same issue. unlazy-loader solved this for me...

levz0r avatar Dec 21 '19 21:12 levz0r

@levz0r Interesting. Would you mind providing a full example using unlazy-loader for others? :)

berstend avatar Jan 06 '20 16:01 berstend

> @levz0r Interesting. Would you mind providing a full example using unlazy-loader for others? :)

Hey, sorry for the long delay...

Nothing special actually... Just add

rules: [
      {
        test: /\.js$/,
        use: "unlazy-loader"
      }
    ]

To webpack.config.js. That's it.

Hope it helps.

levz0r avatar Jan 17 '20 12:01 levz0r

I'm getting a similar problem to this with serverless-bundle. Any ideas how to work around this with serverless-bundle?

SaajidJoosab avatar May 01 '20 00:05 SaajidJoosab

Hello,

Any update on this issue? I tried all the workaround combinations here (painfully). One did work but seems very hacky:

Import the files as described here: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-563159177

But I had to comment out AcceptLanguagePlugin and UserAgentPlugin (maybe they don't exist anymore?)

// import AcceptLanguagePlugin from 'puppeteer-extra-plugin-stealth/evasions/accept-language';
// import UserAgentPlugin from 'puppeteer-extra-plugin-stealth/evasions/user-agent';

I also had to use the solution described here: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-575600329

When building I'm getting these warnings:

 warning  in ./node_modules/puppeteer-extra/dist/index.esm.js
Critical dependency: the request of a dependency is an expression
 warning  in ./node_modules/puppeteer-extra/dist/index.esm.js
Module not found: Error: Can't resolve 'puppeteer-core' in '...\node_modules\puppeteer-extra\dist'

But puppeteer opens and seems to work.


Unfortunately this more elegant solution didn't work: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-564185702

It would be nice to clarify the best way to make the stealth plugin work with webpack, and maybe document it.

Thanks

sydney-d avatar May 23 '20 18:05 sydney-d

Same issue over here.

celicoo avatar Jun 04 '20 13:06 celicoo

@celicoo did you try this? https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-564185702

berstend avatar Aug 06 '20 04:08 berstend

@celicoo I had to add both solutions mentioned here. In the second rule I had to update the regex in test:

            {
                test: /\.js$/,
                use: "unlazy-loader"
            },
            {
                // regex for the files that are problematic
                test: /node_modules\/puppeteer-extra\/dist\/index\.esm\.js/,
                loader: 'string-replace-loader',
                options: {
                    // match a require function call where the argument isn't a string
                    // also capture the first character of the args so we can ignore it later
                    search: 'require[(]([^\'"])',
                    // replace the 'require(' with a '__non_webpack_require__(', meaning it will require the files at runtime
                    // $1 grabs the first capture group from the regex, the one character we matched and don't want to lose
                    replace: '__non_webpack_require__($1',
                    flags: 'g'
                }
            },

psalkowski avatar Aug 14 '20 20:08 psalkowski

> I got this working locally but when I used webpack to bundle it and send it over to aws lambda, this line
>
> StealthPlugin();
>
> results in the error below. Adding "kind-of": "^6.0.2" to the project's package.json does not resolve the problem.
>
> Error: Cannot find module 'kind-of'
>     at a (/var/task/index.js:145:1835)
>     at Function.o [as typeOf] (/var/task/index.js:145:1440)
>     at i (/var/task/index.js:145:854)
>     at e.exports (/var/task/index.js:145:371)
>     at new n (/var/task/index.js:139:191)
>     at new i (/var/task/index.js:133:12241)
>     at e.exports (/var/task/index.js:133:12910)
>     at Object.startBrowser (/var/task/index.js:127:99655)
>     at Runtime.t.handler (/var/task/index.js:127:86330)
>     at processTicksAndRejections (internal/process/task_queues.js:93:5) {
>   code: 'MODULE_NOT_FOUND'
> }
>
> I had the same issue. unlazy-loader solved this for me...

Doesn't work for me.

Nisthar avatar Oct 08 '20 11:10 Nisthar

Any fix for this?

Nisthar avatar Oct 09 '20 18:10 Nisthar

On a more general note:

  • please post your configs/what you already tried to do
  • please describe what didn't work (with exact error messages)
  • please ask nicely for help/input from others in case you experience issues

berstend avatar Oct 10 '20 09:10 berstend