serverless-chrome

Example of how to use Puppeteer

Open adieuadieu opened this issue 7 years ago • 11 comments

An example service demonstrating how to use Puppeteer

adieuadieu avatar Nov 13 '17 10:11 adieuadieu

try this one https://codeburst.io/a-guide-to-automating-scraping-the-web-with-javascript-chrome-puppeteer-node-js-b18efb9e9921

jonbonraki avatar Nov 13 '17 12:11 jonbonraki

https://github.com/balmbees/suspicious-serverless/blob/master/src/services/content_dispatcher.ts

Here is our service code which uses serverless-chrome with puppeteer :)

mooyoul avatar Dec 10 '17 15:12 mooyoul

Thank you very much @mooyoul, your code works great! I managed to copy it into my Puppeteer-powered project, and now I can deploy it on Lambda!

The only thing is that I have to remember every time to set the env variable to keep Puppeteer from downloading Chrome during the Serverless packaging process. I don't know why, but while "packing external modules", Puppeteer retries the download every time, making the final package too big to be deployed on Lambda. Maybe it's because serverless-webpack re-runs npm install for everything? So, I'm using:

PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 serverless deploy

Do you have any insight into this, or did you fix the issue differently?

lorenzos avatar Jan 19 '18 11:01 lorenzos

@lorenzos

I'm not sure about your specific case, but creating an .npmrc file in the project root directory with the line

puppeteer_skip_chromium_download=1

works for me.

jakub300 avatar Jan 19 '18 11:01 jakub300

@jakub300 I tried both puppeteer_skip_chromium_download=1 and PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 in both .npmrc and .serverlessrc, but slss commands still trigger Puppeteer downloading Chrome :man_shrugging:

lorenzos avatar Jan 19 '18 14:01 lorenzos

Hi @lorenzos, glad to hear that my code was helpful :)

As you know, the serverless-webpack plugin installs external dependencies.

In my case, I specified @serverless-chrome/lambda as an external module (and here), because the @serverless-chrome/lambda package ships an additional Chrome binary. With that external definition, the built-in Chrome binary is packaged alongside the script bundled by webpack, and the other dependencies are packaged into a single bundle file as well, without any npm install attempts.

Anyway, I haven't experienced your issue (serverless-webpack trying to install the puppeteer package during bundling) with my configuration.

Could you refer to my setup and try again?

mooyoul avatar Jan 19 '18 14:01 mooyoul

@mooyoul My serverless.yml is identical, but my webpack.config.js is quite different: I have externals: [require("webpack-node-externals")()] instead, as shown here. I'm not familiar with Webpack, so I'm not sure what it does or whether it's equivalent.

Anyway, during Serverless packaging, built-in Chrome is downloaded and bundled by serverless-chrome. The problem is that the same is done by Puppeteer too...

$ slss package ### More than 100MB, will fail to be uploaded
$ PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 slss package ### About 40 MB, Puppeteer will work thanks to your code

So, I don't know where I should change my setup.

lorenzos avatar Jan 19 '18 15:01 lorenzos

@lorenzos One more thing: for npm/yarn to load settings from .npmrc, you need to run the command through them. Could you try defining a script in package.json similar to the example from the repository?

https://github.com/adieuadieu/serverless-chrome/blob/daa22a3978cb2be79a73dd283d278ef56c3053c0/examples/serverless-framework/aws/package.json#L21

jakub300 avatar Jan 19 '18 15:01 jakub300

@jakub300 Uh, that's right. But now that I think about it, I can just set the env directly in that deploy script. Thank you!

lorenzos avatar Jan 19 '18 15:01 lorenzos
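
Setting the variable directly in the deploy script, as described in the comment above, could also be done from a small Node wrapper. This is only a minimal sketch: the file name deploy.js and the helpers buildDeploy and runDeploy are hypothetical, and only the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD variable and the serverless deploy command come from this thread.

```javascript
// deploy.js (hypothetical file name, not part of serverless-chrome)
const { spawnSync } = require('child_process');

// Build the command, arguments, and environment for the deploy step.
// PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 is added on top of the caller's
// environment so Puppeteer skips its Chromium download if it gets
// reinstalled while the service is being packaged.
function buildDeploy(baseEnv) {
  return {
    command: 'serverless',
    args: ['deploy'],
    env: { ...baseEnv, PUPPETEER_SKIP_CHROMIUM_DOWNLOAD: '1' },
  };
}

// Run the deploy synchronously, streaming output to the current terminal.
function runDeploy() {
  const { command, args, env } = buildDeploy(process.env);
  return spawnSync(command, args, { env, stdio: 'inherit' });
}

module.exports = { buildDeploy, runDeploy };
```

Calling runDeploy() from a script of your own should then behave like running PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 serverless deploy by hand, without having to remember the variable each time.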

In most cases, I highly recommend not using webpack-node-externals with webpack. By default, webpack-node-externals marks all of your project's dependencies as external (see description). That's why, every time you build your project via webpack, serverless-webpack tries to install all of the dependencies.

mooyoul avatar Jan 19 '18 16:01 mooyoul
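
The difference described above can be illustrated with a toy model. This is not serverless-webpack's real implementation; it only assumes that every imported module marked as external ends up being installed with npm at packaging time. The function name and the list of imported modules are hypothetical.

```javascript
// Toy model (an assumption for illustration only): an imported module
// that is marked external is installed with npm when packaging.
function modulesInstalledAtPackageTime(importedModules, isExternal) {
  return importedModules.filter(isExternal);
}

// Hypothetical list of modules imported by the handler.
const imported = ['puppeteer', '@serverless-chrome/lambda'];

// webpack-node-externals style: every dependency counts as external.
const allExternal = () => true;

// Explicit-list style: only the named modules count as external.
const explicitList = ['@serverless-chrome/lambda'];
const onlyListed = (name) => explicitList.includes(name);

const withNodeExternals = modulesInstalledAtPackageTime(imported, allExternal);
const withExplicitList = modulesInstalledAtPackageTime(imported, onlyListed);

console.log(withNodeExternals);
console.log(withExplicitList);
```

In this model, puppeteer is only reinstalled (and so only has a chance to download Chromium again) in the first case; with the explicit list it stays inside the webpack bundle.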

@mooyoul Thanks! Puppeteer doesn't download Chrome when packaging if, in the Webpack config, I use only:

externals: [
    'aws-sdk', 
    'es6-promise',
    '@serverless-chrome/lambda',
    'ws'
]

The two additional modules there (es6-promise and ws) are listed to avoid some warnings during building (i.e. when invoking locally, packaging or deploying). The deployed function worked correctly anyway.

WARNING in ./node_modules/ws/lib/BufferUtil.js
Module not found: Error: Can't resolve 'bufferutil' in '/home/lorenzo/Dev/yamm-importer/node_modules/ws/lib'
 @ ./node_modules/ws/lib/BufferUtil.js 35:21-42
 @ ./node_modules/ws/lib/Receiver.js
 @ ./node_modules/ws/index.js
 @ ./node_modules/puppeteer/node6/Connection.js
 @ ./node_modules/puppeteer/node6/Launcher.js
 @ ./node_modules/puppeteer/node6/Puppeteer.js
 @ ./lib/utils/puppeteer-launcher.js
 @ ./lib/test.js
 @ ./handler.js

WARNING in ./node_modules/ws/lib/Validation.js
Module not found: Error: Can't resolve 'utf-8-validate' in '/home/lorenzo/Dev/yamm-importer/node_modules/ws/lib'
 @ ./node_modules/ws/lib/Validation.js 10:22-47
 @ ./node_modules/ws/lib/Receiver.js
 @ ./node_modules/ws/index.js
 @ ./node_modules/puppeteer/node6/Connection.js
 @ ./node_modules/puppeteer/node6/Launcher.js
 @ ./node_modules/puppeteer/node6/Puppeteer.js
 @ ./lib/utils/puppeteer-launcher.js
 @ ./lib/test.js
 @ ./handler.js

WARNING in ./node_modules/es6-promise/dist/es6-promise.js
Module not found: Error: Can't resolve 'vertx' in '/home/lorenzo/Dev/yamm-importer/node_modules/es6-promise/dist'
 @ ./node_modules/es6-promise/dist/es6-promise.js 140:16-26
 @ ./node_modules/es6-promisify/dist/promise.js
 @ ./node_modules/es6-promisify/dist/promisify.js
 @ ./node_modules/agent-base/index.js
 @ ./node_modules/https-proxy-agent/index.js
 @ ./node_modules/puppeteer/utils/ChromiumDownloader.js
 @ ./node_modules/puppeteer/node6/Launcher.js
 @ ./node_modules/puppeteer/node6/Puppeteer.js
 @ ./lib/utils/puppeteer-launcher.js
 @ ./lib/test.js
 @ ./handler.js

lorenzos avatar Jan 21 '18 11:01 lorenzos