puppeteer-extra
Doesn't play nice with Webpack
Edit by @berstend:
For the workaround, see https://github.com/webpack/webpack/issues/4175#issuecomment-450746682, and for one specific to the stealth plugin, see https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-712364816
Original issue:
When bundling I get this error:
```
WARNING in ./node_modules/puppeteer-extra/dist/index.esm.js 294:22-35
Critical dependency: the request of a dependency is an expression
 @ ./src/processor.js
 @ ./src/index.js
 @ multi @babel/polyfill ./src/index.js
```
And then at runtime:
```
A plugin listed 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime' as dependency, which is currently missing. Please install it:

yarn add puppeteer-extra-plugin-stealth

Note: You don't need to require the plugin yourself, unless you want to modify it's default settings.

Error: Cannot find module 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime'
```
Of course, `puppeteer-extra-plugin-stealth` is already in the `package.json`.
The workaround is to import and apply the plugins manually:
```javascript
import puppeteerVanilla from 'puppeteer';
import { addExtra } from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import AcceptLanguagePlugin from 'puppeteer-extra-plugin-stealth/evasions/accept-language';
import ChromeRuntimePlugin from 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime';
import ConsoleDebugPlugin from 'puppeteer-extra-plugin-stealth/evasions/console.debug';
import IFrameContentWindowPlugin from 'puppeteer-extra-plugin-stealth/evasions/iframe.contentWindow';
import MediaCodecsPlugin from 'puppeteer-extra-plugin-stealth/evasions/media.codecs';
import NavigatorLanguagesPlugin from 'puppeteer-extra-plugin-stealth/evasions/navigator.languages';
import NavigatorPermissionsPlugin from 'puppeteer-extra-plugin-stealth/evasions/navigator.permissions';
import NavigatorPlugins from 'puppeteer-extra-plugin-stealth/evasions/navigator.plugins';
import WebdriverPlugin from 'puppeteer-extra-plugin-stealth/evasions/navigator.webdriver';
import UserAgentPlugin from 'puppeteer-extra-plugin-stealth/evasions/user-agent';
import WebglVendorPlugin from 'puppeteer-extra-plugin-stealth/evasions/webgl.vendor';
import WindowOuterDimensionsPlugin from 'puppeteer-extra-plugin-stealth/evasions/window.outerdimensions';

// Immediately invoked, so the async body actually runs.
(async () => {
  const puppeteer = addExtra(puppeteerVanilla);
  // Every evasion is imported statically above, so webpack can bundle it.
  const plugins = [
    StealthPlugin(),
    AcceptLanguagePlugin(),
    ChromeRuntimePlugin(),
    ConsoleDebugPlugin(),
    IFrameContentWindowPlugin(),
    MediaCodecsPlugin(),
    NavigatorLanguagesPlugin(),
    NavigatorPermissionsPlugin(),
    NavigatorPlugins(),
    WebdriverPlugin(),
    UserAgentPlugin(),
    WebglVendorPlugin(),
    WindowOuterDimensionsPlugin(),
  ];

  const browser = await puppeteer.launch();
  // Apply the browser-level hooks by hand...
  for (const plugin of plugins) {
    await plugin.onBrowser(browser);
  }

  // ...then the page-level hooks on the initial page.
  const [page] = await browser.pages();
  for (const plugin of plugins) {
    await plugin.onPageCreated(page);
  }
  // ...
})();
```
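Stripped of the puppeteer specifics, the workaround above just calls each plugin's `onBrowser` and `onPageCreated` hooks by hand. Below is a minimal sketch of that loop that runs without Chrome: the browser, page and plugins are hypothetical stubs, and `applyPluginHooks` is an illustrative helper, not part of puppeteer-extra. As a defensive assumption, it skips any hook a plugin doesn't define.

```javascript
// Hypothetical helper: applies the same two hooks the manual workaround calls.
// Not a puppeteer-extra API; hooks missing on a plugin are skipped (assumption).
async function applyPluginHooks(plugins, browser) {
  // Browser-level hooks first, once per plugin.
  for (const plugin of plugins) {
    if (typeof plugin.onBrowser === "function") {
      await plugin.onBrowser(browser);
    }
  }
  // Then page-level hooks for every page that is already open.
  const pages = await browser.pages();
  for (const page of pages) {
    for (const plugin of plugins) {
      if (typeof plugin.onPageCreated === "function") {
        await plugin.onPageCreated(page);
      }
    }
  }
  return pages.length;
}

// Stub browser and plugins that only record which hooks ran (no Chrome needed).
const calls = [];
const stubBrowser = { pages: async () => [{ id: "page-1" }] };
const stubPlugins = [
  {
    onBrowser: async () => calls.push("a:onBrowser"),
    onPageCreated: async (page) => calls.push(`a:onPageCreated:${page.id}`),
  },
  // This stub has no onBrowser hook, so only onPageCreated runs for it.
  { onPageCreated: async (page) => calls.push(`b:onPageCreated:${page.id}`) },
];

applyPluginHooks(stubPlugins, stubBrowser).then(() => console.log(calls));
```

With real objects, `plugins` would be the list from the workaround and `browser` the result of `puppeteer.launch()`.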
Hmm, interesting. May I ask why you opted to use Webpack for a NodeJS project?
If you could provide a minimal webpack-based project that reproduces the issue, that'd be great, so I can take a quick look and see how best to fix this. :)
I use Webpack with Node because it's a simpler way to use Babel, bundle node_modules and minify code.

webpack.config.js:
```javascript
const path = require('path');
const TerserPlugin = require('terser-webpack-plugin');

const { env } = process;
const isProduction = env['NODE_ENV'] === 'production';
const mode = isProduction ? 'production' : 'development';

console.log({ mode });

module.exports = {
  entry: [
    '@babel/polyfill',
    './src/index.js',
  ],
  target: 'node',
  devtool: isProduction ? false : 'source-map',
  mode,
  output: {
    path: path.join(__dirname, 'build'),
    filename: 'index.js',
  },
  module: {
    rules: [
      {
        test: /\.m?js$/,
        exclude: /(node_modules|bower_components)/,
        use: {
          loader: 'babel-loader',
          options: {
            babelrc: true,
          },
        },
      },
      {
        test: /\.js$/,
        loader: 'unlazy-loader',
      },
    ],
  },
  optimization: {
    minimize: isProduction,
    minimizer: [
      new TerserPlugin(),
    ],
  },
  resolve: {
    alias: {
      'pg-native': path.resolve(__dirname, 'aliases/pg-native'),
    },
  },
};
```
Thanks for providing the config, I'll look into it when I find time.
> I use Webpack with Node because it's a simpler way to use Babel, bundle node_modules and minify code.
So it's to use new ES language features? Bundling and minifying shouldn't matter for backend code (hence my question). I personally used Babel with NodeJS back in the day to be able to use ES6 imports, but eventually stopped, as I ran into issues with it and figured it's not worth it and causes more problems than it solves :)
Yes, I use the latest ES features.
I also use loaders for other languages (such as Fable) and data (such as JSON, CSS).
Bundling everything can also be useful for targets like AWS Lambda.
As a rule of thumb, I think a library that only uses ES6 `import`/`export` will work with Webpack, Rollup, etc.
Got it :) Anyway, `puppeteer-extra` should still be able to work with Webpack. It might be that the internal dependency system is just not aware that the bundler already takes care of the dependencies, in which case a flag to disable the internal dependency resolution would be sufficient.
A couple of avenues I'll explore to fix this:
- Add an option to `puppeteer-extra`, something like `disableInternalDependencyResolution: true`
- Add support for an ENV variable to detect the presence of bundlers
- Modify the `esm` rollup build to disable the internal dependency management (not sure about that, as regular NodeJS will soon(?) use esm exports as well)
- Make the dependency thing a warning rather than an error (would still spam log output during the webpack build, but that's ok I guess)
What is the purpose of the internal dependency management?
Could a simpler design work like this?
```javascript
import puppeteerExtra from 'puppeteer-extra';
import MyPlugin from 'my-plugin';

const puppeteer = puppeteerExtra({
  plugins: [
    MyPlugin(),
  ],
});

(async () => {
  const browser = await puppeteer.launch();
  // etc...
})();
```
The above already exists (with `puppeteer.use()`) :)
The idea behind the dependency plugin system was to make it easy to re-use plugins within plugins. E.g. the `stealth` plugin needs to anonymize the UA, and instead of copy-pasting that code we just load the `anonymize-ua` plugin internally as a dependency.
The code for that is here: https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra/src/index.ts#L326-L334
Thinking about it now (I wrote this in the very first version), I don't think we need to handle that within `puppeteer-extra`; we can just use the native `package.json` methods to declare them.
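To make the idea concrete, here is a simplified model of plugin-within-plugin dependencies. It is not the puppeteer-extra code linked above: `resolveDependencies`, the `dependencies` Set on each plugin, and the `puppeteer-extra-plugin-` name prefix are assumptions for illustration, and the injected `load` function stands in for Node's `require()` so the sketch runs without the real packages. The point it shows is that the module ID handed to `load` is assembled from a string at runtime.

```javascript
// Simplified, hypothetical model of the internal dependency resolution.
// `load` plays the role of require(); in a real setup it would receive the
// computed module ID, which is exactly what webpack cannot resolve statically.
function resolveDependencies(plugins, load, prefix = "puppeteer-extra-plugin-") {
  const registered = new Map(plugins.map((p) => [p.name, p]));
  const queue = [...plugins];
  while (queue.length > 0) {
    const plugin = queue.shift();
    for (const dep of plugin.dependencies || []) {
      if (registered.has(dep)) continue; // already present, don't load twice
      const factory = load(prefix + dep); // dynamic module ID, built at runtime
      const instance = factory();
      registered.set(dep, instance);
      queue.push(instance); // a dependency may declare its own dependencies
    }
  }
  return [...registered.keys()];
}

// Fake module table standing in for node_modules (hypothetical stubs).
const fakeModules = {
  "puppeteer-extra-plugin-anonymize-ua": () => ({ name: "anonymize-ua" }),
};
const fakeLoad = (id) => {
  if (!(id in fakeModules)) throw new Error(`Cannot find module '${id}'`);
  return fakeModules[id];
};

const stealth = { name: "stealth", dependencies: new Set(["anonymize-ua"]) };
console.log(resolveDependencies([stealth], fakeLoad)); // [ 'stealth', 'anonymize-ua' ]
```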
What I currently don't understand: this should still work with Webpack regardless, as it should transform the `require()` statements to point at bundled resources instead. I need to take a closer look to see what's going on there.
So yeah, it's just webpack being bad at dynamic imports.
This fix should work: https://github.com/webpack/webpack/issues/4175#issuecomment-450746682
So basically, add this to the webpack `module.rules` (untested):
```javascript
{
  // regex for the files that are problematic
  // (matched against the absolute resource path, so no leading './')
  test: /node_modules\/puppeteer-extra\/dist\/index\.esm\.js/,
  loader: 'string-replace-loader',
  options: {
    // match a require function call where the argument isn't a string
    // also capture the first character of the args so we can ignore it later
    search: 'require[(]([^\'"])',
    // replace the 'require(' with a '__non_webpack_require__(', meaning it will require the files at runtime
    // $1 grabs the first capture group from the regex, the one character we matched and don't want to lose
    replace: '__non_webpack_require__($1',
    flags: 'g'
  }
}
```
You will also need to install string-replace-loader: https://github.com/Va1/string-replace-loader
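To see what that rule does to the bundled file, the same search pattern and replacement can be applied with plain `String.prototype.replace`. This is only a sketch of the transformation (string-replace-loader itself isn't used here, and the sample input is made up): calls whose argument starts with a quote are left alone, and everything else is rewritten to webpack's `__non_webpack_require__`, so Node resolves it at runtime instead of webpack at build time.

```javascript
// Same pattern/replacement as the string-replace-loader options above,
// applied directly to a string so the effect is easy to inspect.
const DYNAMIC_REQUIRE = /require[(]([^'"])/g;

function rewriteDynamicRequires(source) {
  // $1 re-inserts the first character of the argument that the regex consumed.
  return source.replace(DYNAMIC_REQUIRE, "__non_webpack_require__($1");
}

const sample = [
  "const puppeteer = require('puppeteer');", // string literal: left for webpack
  "const plugin = require(moduleName);", // expression: deferred to runtime
].join("\n");

console.log(rewriteDynamicRequires(sample));
// const puppeteer = require('puppeteer');
// const plugin = __non_webpack_require__(moduleName);
```

One caveat: the pattern matches any text ending in `require(`, including template literals (a backtick isn't excluded) and longer identifiers such as `_require(x`, so the rewritten file is worth a quick look.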
> The above already exists (with puppeteer.use()) :)
I meant this usage but with internal dependency management removed entirely.
> The idea behind the dependency plugin system was to make it easy to re-use plugins within plugins. E.g. the stealth plugin needs to anonymize the UA and instead of copy pasting that code we just load the anonymize-ua plugin internally as a dependency.
>
> The code for that is here: https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra/src/index.ts#L326-L334
>
> Thinking of it now (I wrote this in the very first version) I think we don't need to handle that within puppeteer-extra, but can just use the native package.json methods to declare them.
So the plugins would declare the other plugins that they depend on using `package.json` dependencies? Sounds like the best solution to me.
> What I currently don't understand: This should still work with Webpack regardless, as they should transform the require() statements to point at bundled resources instead. I need to take a closer look to see what's going on there.
`string-replace-loader` looks like a good work-around. Thanks!
> string-replace-loader looks like a good work-around. Thanks!

It would be great to hear whether it fixes the webpack issue; then I can add it to the documentation :)
> So the plugins would declare the other plugins that they depend on using package.json dependencies? Sounds like the best solution to me.

Unfortunately that won't fix the issue with dynamic imports + webpack; we still need to do a dynamic `require()` under the hood (which works fine, just not with webpack). :)
> we still need to do dynamic require() under the hood
Why is this?
> Why is this?
It's always good to question assumptions, I'm a big believer in that :)
Let's take the `stealth` plugin as an example: it comes with a set of "evasions" (which are just regular plugins) and acts as an "umbrella" plugin, so the user doesn't need to add the specific evasions one by one.
One feature is that the user can add or remove evasions from that list before `puppeteer-extra` requires these files (and thereby the code mods).
I'm not aware of a way to accomplish that without a dynamic `require()`. :)
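A simplified sketch of why that list turns into runtime strings: the enabled evasions live in a mutable `Set`, the user edits it before launch, and the module IDs are derived from whatever is left. `createUmbrellaPlugin` and `evasionModuleIds` are hypothetical names for illustration and the real stealth plugin's internals may differ; the evasion names are taken from the imports in the workaround at the top.

```javascript
// Hypothetical umbrella plugin: the set of evasions can be edited after
// construction, so the final module IDs only exist at runtime.
function createUmbrellaPlugin(name, defaultEvasions) {
  return {
    name,
    enabledEvasions: new Set(defaultEvasions),
    // Built from the current Set contents each time it is called.
    evasionModuleIds() {
      return [...this.enabledEvasions].map(
        (evasion) => `puppeteer-extra-plugin-${this.name}/evasions/${evasion}`
      );
    },
  };
}

const stealth = createUmbrellaPlugin("stealth", [
  "chrome.runtime",
  "navigator.webdriver",
  "user-agent",
]);
stealth.enabledEvasions.delete("user-agent"); // user opts out before launch

// Each ID would then go to require(id): an expression, not a string literal.
console.log(stealth.evasionModuleIds());
```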
Assuming the fix mentioned in https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-564185702 works, I'm going to demote this issue to "nice to have in a future version". :)
The stealth plugin could require all evasions statically. This would cover most use-cases.
Users who need something more custom could manually import the ones that they need. This wouldn't be much code and the full import list could be copied from the stealth plugin then edited.
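A sketch of that suggestion: every evasion is imported statically, so a bundler can see it, and the user's choice only filters which ones are instantiated. The stub factories below stand in for the real `import ... from 'puppeteer-extra-plugin-stealth/evasions/...'` lines from the workaround at the top, and `staticEvasions`/`buildEvasions` are hypothetical names, not part of the stealth plugin.

```javascript
// In a real bundle these would be static imports, e.g.
//   import ChromeRuntimePlugin from 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime';
// Stub factories are used here so the sketch runs on its own.
const ChromeRuntimePlugin = () => ({ name: "chrome.runtime" });
const WebdriverPlugin = () => ({ name: "navigator.webdriver" });
const WebglVendorPlugin = () => ({ name: "webgl.vendor" });

// Lookup table with keys known at build time: nothing is require()d dynamically.
const staticEvasions = new Map([
  ["chrome.runtime", ChromeRuntimePlugin],
  ["navigator.webdriver", WebdriverPlugin],
  ["webgl.vendor", WebglVendorPlugin],
]);

// Default: all evasions. Custom: pass the subset you want.
function buildEvasions(enabled = [...staticEvasions.keys()]) {
  return enabled.map((key) => {
    const factory = staticEvasions.get(key);
    if (!factory) throw new Error(`Unknown evasion: ${key}`);
    return factory();
  });
}

console.log(buildEvasions().length); // 3
console.log(buildEvasions(["navigator.webdriver"]).map((p) => p.name)); // [ 'navigator.webdriver' ]
```

The trade-off is that every evasion ends up in the bundle even when it isn't enabled.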
I got this working locally, but when I used webpack to bundle it and deploy it to AWS Lambda, the line `StealthPlugin();` results in the error below. Adding "kind-of": "^6.0.2" to the project's package.json does not resolve the problem.
```
Error: Cannot find module 'kind-of'
    at a (/var/task/index.js:145:1835)
    at Function.o [as typeOf] (/var/task/index.js:145:1440)
    at i (/var/task/index.js:145:854)
    at e.exports (/var/task/index.js:145:371)
    at new n (/var/task/index.js:139:191)
    at new i (/var/task/index.js:133:12241)
    at e.exports (/var/task/index.js:133:12910)
    at Object.startBrowser (/var/task/index.js:127:99655)
    at Runtime.t.handler (/var/task/index.js:127:86330)
    at processTicksAndRejections (internal/process/task_queues.js:93:5) {
  code: 'MODULE_NOT_FOUND'
}
```
I had the same issue. unlazy-loader solved this for me...
@levz0r Interesting. Would you mind providing a full example using `unlazy-loader` for others? :)
Hey, sorry for the long delay...
Nothing special actually... Just add this to webpack.config.js (under `module`):
```javascript
rules: [
  {
    test: /\.js$/,
    use: "unlazy-loader"
  }
]
```
That's it.
Hope it helps.
I'm getting a similar problem with `serverless-bundle`. Any ideas how to work around this with `serverless-bundle`?
Hello,
Any update on this issue? I tried all the workaround combinations here (painfully). One did work, but it seems very hacky:
Import the files as described here: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-563159177
But I had to comment out `AcceptLanguagePlugin` and `UserAgentPlugin` (maybe they don't exist anymore?):
```javascript
// import AcceptLanguagePlugin from 'puppeteer-extra-plugin-stealth/evasions/accept-language';
// import UserAgentPlugin from 'puppeteer-extra-plugin-stealth/evasions/user-agent';
```
I also had to use the solution described here: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-575600329
When building, I'm getting these warnings:
```
warning in ./node_modules/puppeteer-extra/dist/index.esm.js
Critical dependency: the request of a dependency is an expression

warning in ./node_modules/puppeteer-extra/dist/index.esm.js
Module not found: Error: Can't resolve 'puppeteer-core' in '...\node_modules\puppeteer-extra\dist'
```
But puppeteer opens and seems to work.
Unfortunately, this more elegant solution didn't work: https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-564185702
It would be nice to clarify the best way to make the stealth plugin work with webpack, and maybe document it.
Thanks
Same issue over here.
@celicoo did you try this? https://github.com/berstend/puppeteer-extra/issues/93#issuecomment-564185702
@celicoo for me, I had to add both solutions mentioned here. In the second rule I had to update the regex in `test`:
```javascript
{
  test: /\.js$/,
  use: "unlazy-loader"
},
{
  // regex for the files that are problematic
  test: /node_modules\/puppeteer-extra\/dist\/index\.esm\.js/,
  loader: 'string-replace-loader',
  options: {
    // match a require function call where the argument isn't a string
    // also capture the first character of the args so we can ignore it later
    search: 'require[(]([^\'"])',
    // replace the 'require(' with a '__non_webpack_require__(', meaning it will require the files at runtime
    // $1 grabs the first capture group from the regex, the one character we matched and don't want to lose
    replace: '__non_webpack_require__($1',
    flags: 'g'
  }
},
```
> I had the same issue. unlazy-loader solved this for me...
Doesn't work for me.
Any fix for this?
On a more general note:
- please post your configs/what you already tried to do
- please describe what didn't work (with exact error messages)
- please ask nicely for help/input from others in case you experience issues