browserless
The headless Chrome/Chromium driver on top of Puppeteer.
browserless is an efficient driver for controlling headless browsers. It is built on top of puppeteer and designed for scenarios where performance matters.
Highlights
- Puppeteer-like API for common tasks (text, screenshot, html, pdf).
- Built-in evasion techniques to prevent being blocked.
- Built-in adblocker for canceling unnecessary requests.
- Shell interaction via Browserless CLI.
- Easy Google Lighthouse integration.
- Automatic retry & error handling.
- Sensible defaults.
Installation
You can install it via npm:
$ npm install browserless puppeteer --save
browserless is backed by puppeteer, so you need to install it as well.
You can use it next to puppeteer, puppeteer-core or puppeteer-firefox, interchangeably.
Usage
This is a full example showcasing the browserless capabilities:
const createBrowserless = require('browserless')
const termImg = require('term-img')
// First, create a browserless factory
// that will keep a singleton process running
const browserlessFactory = createBrowserless()
// After that, you can create as many browser contexts
// as you need. The browser contexts won't share cookies/cache
// with other browser contexts.
const browserless = await browserlessFactory.createContext()
// Perform the action you want, e.g., taking a screenshot
const buffer = await browserless.screenshot('http://example.com', {
device: 'iPhone 6'
})
console.log(termImg(buffer))
// After your task is done, destroy your browser context
await browserless.destroyContext()
// At the end, gracefully shutdown the browser process
await browserlessFactory.close()
As you can see, browserless is implemented using a single browser process and creating/destroying specific browser contexts.
You can read more about that in the technical details section.
If you're already using puppeteer, you can upgrade to browserless with almost no effort.
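The create/use/destroy lifecycle shown in the example can be wrapped in a small helper so the context is released even when the task throws. This is a minimal sketch: `withContext` is a hypothetical name, not part of browserless, and it relies only on the `createContext` and `destroyContext` methods shown in the example.

```js
// Hypothetical helper (not part of the browserless API).
// Runs `task` with a fresh browser context and always destroys it afterwards.
async function withContext (browserlessFactory, task, contextOptions = {}) {
  const browserless = await browserlessFactory.createContext(contextOptions)
  try {
    return await task(browserless)
  } finally {
    // Runs on success and on failure, so contexts do not leak.
    await browserless.destroyContext()
  }
}
```

It could be used as `await withContext(browserlessFactory, b => b.html('https://example.com'))`.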
Additionally, you can use some specific packages in your codebase, interacting with them from puppeteer.
Initialization
All methods follow the same interface:
- <url>: The target URL. It's required.
- [options]: Specific settings for the method. It's optional.
All methods are asynchronous and return a Promise.
.constructor(options)
It initializes a singleton browserless process, returning a factory that will be used for creating browser contexts:
const browserlessFactory = require('browserless')
const { createContext } = browserlessFactory({
timeout: 25000,
lossyDeviceName: true,
ignoreHTTPSErrors: true
})
// Now every time you call `createContext`
// it will create a browser context.
const browserless = await createContext({ retry: 2 })
These are some proprietary browserless options; the rest of the options will be passed to puppeteer.launch.
options
Additionally, you can set up:
defaultDevice
type: string
default: 'Macbook Pro 13'
Sets a consistent device viewport for each page.
lossyDeviceName
type: boolean
default: false
It enables lossy detection over the device descriptor input.
const browserless = require('browserless')({ lossyDeviceName: true })
browserless.getDevice({ device: 'macbook pro 13' })
browserless.getDevice({ device: 'MACBOOK PRO 13' })
browserless.getDevice({ device: 'macbook pro' })
browserless.getDevice({ device: 'macboo pro' })
This setting is intended to find the device even if the device descriptor name is not exactly the same.
mode
type: string
default: 'launch'
values: 'launch' | 'connect'
It defines whether the browser should be spawned using puppeteer.launch or puppeteer.connect.
timeout
type: number
default: 30000
This setting will change the default maximum navigation time.
puppeteer
type: Puppeteer
default: puppeteer | puppeteer-core | puppeteer-firefox
It's automatically detected based on your dependencies. The supported packages are puppeteer, puppeteer-core and puppeteer-firefox.
.createContext(options)
Now that you have your browserless factory instantiated, you can create browser contexts on demand:
const browserless = await browserlessFactory.createContext({
retry: 2
})
Every browser context is isolated. They won't share cookies/cache with other browser contexts. They also can contain specific options.
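Because contexts are isolated, several of them can be used side by side on the same browser process. The following sketch gives each URL its own context and runs them concurrently; `textForAll` is a hypothetical helper name, and it uses only the `createContext`, `text` and `destroyContext` methods described in this document.

```js
// Hypothetical helper: fetches the text of several URLs concurrently,
// giving each URL its own isolated browser context.
async function textForAll (browserlessFactory, urls) {
  return Promise.all(
    urls.map(async url => {
      const browserless = await browserlessFactory.createContext()
      try {
        return await browserless.text(url)
      } finally {
        // Release the context whether the request succeeded or not.
        await browserless.destroyContext()
      }
    })
  )
}
```

Results come back in the same order as `urls`, since `Promise.all` preserves input order.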
options
Any browser.createIncognitoBrowserContext#options can be passed.
Additionally, you can set up:
retry
type: number
default: 2
The number of retries that can be performed before considering a navigation as failed.
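To make the meaning of `retry` concrete, here is a generic sketch of the pattern. It is not the internal browserless implementation: `withRetry` is a hypothetical helper, and it assumes that `retry` counts the extra attempts made after the first one fails.

```js
// Hypothetical helper illustrating retry semantics (an assumption, not
// browserless internals): one initial attempt plus `retry` extra attempts.
async function withRetry (fn, retry = 2) {
  let lastError
  for (let attempt = 0; attempt <= retry; attempt++) {
    try {
      return await fn(attempt)
    } catch (error) {
      // Remember the failure and try again if attempts remain.
      lastError = error
    }
  }
  throw lastError
}
```

Under that assumption, `retry: 2` allows up to three attempts in total, and `retry: 0` fails on the first error.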
.browser
It returns the Browser instance associated with your browserless factory.
const browser = await browserlessFactory.browser()
console.log('My browser PID is', browser.process().pid)
.respawn
It will respawn the singleton browser associated with your browserless factory.
const getPID = async promise => (await promise).process().pid
console.log('Process PID:', await getPID(browserlessFactory.browser()))
await browserlessFactory.respawn()
console.log('Process PID:', await getPID(browserlessFactory.browser()))
This method is an implementation detail; normally you don't need to call it.
.close
It will close the singleton browser associated with your browserless factory.
const exitHook = require('exit-hook')
// Pass a function so the browser is closed on exit, not immediately
exitHook(() => browserlessFactory.close())
It should be used to gracefully shutdown your resources.
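If you prefer not to add a dependency, the same idea can be sketched with plain Node.js signal handlers. `closeOnSignal` is a hypothetical helper, and the `proc` parameter exists only so the function can be exercised without a real process; it assumes `close()` returns a Promise, as the usage example suggests.

```js
// Hypothetical helper: closes the browserless factory when the process
// receives a termination signal, then exits.
function closeOnSignal (browserlessFactory, signals = ['SIGINT', 'SIGTERM'], proc = process) {
  for (const signal of signals) {
    proc.once(signal, async () => {
      let exitCode = 0
      try {
        await browserlessFactory.close()
      } catch (error) {
        // Report a failing shutdown through the exit code.
        exitCode = 1
      }
      proc.exit(exitCode)
    })
  }
}
```

Calling `closeOnSignal(browserlessFactory)` once at startup is enough.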
Methods
.html(url, options)
It serializes the content from the target url into HTML.
const html = await browserless.html('https://example.com')
console.log(html)
options
See browserless.goto to know all the options and values supported.
.text(url, options)
It serializes the content from the target url into plain text.
const text = await browserless.text('https://example.com')
console.log(text)
options
See browserless.goto to know all the options and values supported.
.pdf(url, options)
It generates the PDF version of a website behind a url.
const buffer = await browserless.pdf('https://example.com')
console.log(`PDF generated in ${buffer.byteLength} bytes`)
options
This method uses the following options by default:
{
margin: '0.35cm',
printBackground: true,
scale: 0.65
}
See browserless.goto to know all the options and values supported.
Also, any page.pdf option is supported.
Additionally, you can set up:
margin
type: string | string[]
default: '0.35cm'
It sets paper margins. All possible units are:
- px for pixels.
- in for inches.
- cm for centimeters.
- mm for millimeters.
You can pass an object specifying each side of the paper:
const buffer = await browserless.pdf(url.toString(), {
margin: {
top: '0.35cm',
bottom: '0.35cm',
left: '0.35cm',
right: '0.35cm'
}
})
Or, in case you pass a string, it will be used for all the sides:
const buffer = await browserless.pdf(url.toString(), {
margin: '0.35cm'
})
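Since `.pdf` resolves with a buffer, persisting the result is a matter of writing it to disk. A minimal sketch follows; `savePdf` is a hypothetical helper, the margin values are illustrative, and `browserless` is assumed to be a browser context created as described earlier.

```js
const { writeFile } = require('fs/promises')

// Hypothetical helper: renders `url` as a PDF with per-side margins,
// writes it to `filepath`, and resolves with the size in bytes.
async function savePdf (browserless, url, filepath) {
  const buffer = await browserless.pdf(url, {
    margin: { top: '1cm', bottom: '1cm', left: '0.5cm', right: '0.5cm' }
  })
  await writeFile(filepath, buffer)
  return buffer.byteLength
}
```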
.screenshot(url, options)
It takes a screenshot from the target url.
const buffer = await browserless.screenshot('https://example.com')
console.log(`Screenshot taken in ${buffer.byteLength} bytes`)
options
This method uses the following options by default:
{
device: 'macbook pro 13'
}
See browserless.goto to know all the options and values supported.
Also, any page.screenshot option is supported.
Additionally, you can set up:
codeScheme
type: string
default: 'atom-dark'
When this value is present and the response 'Content-Type' header is 'json', it beautifies the markup using Prism.
The syntax highlighting theme can be customized. You can set:
- A prism-themes identifier (e.g., 'dracula').
- A remote URL (e.g., 'https://unpkg.com/prism-theme-night-owl').
element
type: string
Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible.
overlay
type: object
After the screenshot has been taken, this option allows you to place the screenshot into a fancy overlay.
You can configure the overlay specifying:
- browser: It sets the browser image overlay to use. The supported values are light and dark.
- background: It sets the background to use. You can pass:
  - A hexadecimal/rgb/rgba color code, e.g., #c1c1c1.
  - A CSS gradient, e.g., linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
  - An image url, e.g., https://source.unsplash.com/random/1920x1080.
const buffer = await browserless.screenshot(url.toString(), {
styles: [
'.crisp-client, #cookies-policy { display: none; }'
],
overlay: {
browser: 'dark',
background:
'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
}
})
.destroyContext
It will destroy the current browser context.
const browserless = await browserlessFactory.createContext({ retry: 0 })
const content = await browserless.html('https://example.com')
await browserless.destroyContext()
.getDevice(options)
Given a specific device descriptor, this method will return the device settings for it.
browserless.getDevice({ device: 'Macbook Pro 15' })
// {
// userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
// viewport: {
// width: 1440,
// height: 900,
// deviceScaleFactor: 2,
// isMobile: false,
// hasTouch: false,
// isLandscape: false
// }
// }
It extends from puppeteer.devices, adding some missing devices there.
options
device
type: string
The device descriptor name. It's used to find the rest of the presets associated with it.
When lossyDeviceName is enabled, a fuzzy search rather than a strict search will be performed in order to maximize getting a result back.
viewport
type: object
Extra viewport settings that will be merged with the device presets.
browserless.getDevice({
device: 'iPad',
viewport: {
isLandscape: true
}
})
headers
type: object
Extra headers that will be merged with the device presets.
browserless.getDevice({
device: 'iPad',
headers: {
'user-agent': 'googlebot'
}
})
.evaluate(fn, gotoOpts)
It exposes an interface for creating your own evaluate function, passing you the page and response.
The fn will receive page and response as arguments:
const ping = browserless.evaluate((page, response) => ({
statusCode: response.status(),
url: response.url(),
redirectUrls: response.request().redirectChain()
}))
await ping('https://example.com')
// {
// "statusCode": 200,
// "url": "https://example.com/",
// "redirectUrls": []
// }
You don't need to close the page; it will be closed automatically.
Internally, the method performs a browserless.goto, so you can pass extra goto options as the second parameter:
const serialize = browserless.evaluate(page => page.evaluate(() => document.body.innerText), {
waitUntil: 'domcontentloaded'
})
await serialize('https://example.com')
// => a string with the visible text of the page body
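As a further illustration, both arguments can be combined in one evaluate function. In this sketch `createTitleGetter` is a hypothetical helper name; `page.title()` is a standard Puppeteer Page method, and `browserless` is assumed to be a browser context.

```js
// Hypothetical helper: builds a function that resolves with the document
// title and the HTTP status code for a given URL.
function createTitleGetter (browserless) {
  return browserless.evaluate(
    async (page, response) => ({
      title: await page.title(),
      statusCode: response.status()
    }),
    // Second argument: options forwarded to the internal goto call.
    { waitUntil: 'domcontentloaded' }
  )
}
```

The returned function is then called with a URL, for example `await createTitleGetter(browserless)('https://example.com')`.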
.goto(page, options)
It performs a page.goto with a lot of extra capabilities:
const createBrowserless = require('browserless')
const browserless = await createBrowserless().createContext()
const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })
options
Any option passed here will be passed through to page.goto.
Additionally, you can set up:
abortTypes
type: array
default: []
It sets the ability to abort requests based on the resource type.
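As a sketch of how this might be used, the helper below skips heavy resources when only the text of a page is needed. `lightweightText` and `HEAVY_RESOURCE_TYPES` are hypothetical names; the type strings are the ones Puppeteer reports through `request.resourceType()`, and `browserless` is assumed to be a browser context.

```js
// Resource types as reported by Puppeteer's request.resourceType().
const HEAVY_RESOURCE_TYPES = ['image', 'media', 'font', 'stylesheet']

// Hypothetical helper: extracts page text while aborting heavy requests.
function lightweightText (browserless, url) {
  return browserless.text(url, { abortTypes: HEAVY_RESOURCE_TYPES })
}
```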
adblock
type: boolean
default: true
It enables the built-in adblocker by Cliqz, which aborts unnecessary third-party requests associated with ad services.
animations
type: boolean
default: false
Disables CSS animations and transitions; it also sets prefers-reduced-motion accordingly.
click
type: string | string[]
Click the DOM element matching the given CSS selector.
device
type: string
default: 'macbook pro 13'
It specifies the device descriptor to use in order to retrieve userAgent and viewport.
evasions
type: string[]
default: require('@browserless/goto').evasions
It makes your headless browser harder to detect, preventing it from being blocked.
Antibot systems use these kinds of checks to determine whether you are a real browser and to block any kind of automated access. All the evasion techniques implemented are:
Evasion | Description |
---|---|
chromeRuntime | Ensure window.chrome is defined. |
stackTraces | Prevent detecting Puppeteer via variable name. |
mediaCodecs | Ensure media codecs are defined. |
navigatorPermissions | Mock over Notification.permissions. |
navigatorPlugins | Ensure your browser has NavigatorPlugins defined. |
navigatorWebdriver | Ensure Navigator.webdriver exists. |
randomizeUserAgent | Use a different User-Agent every time. |
webglVendor | Ensure WebGLRenderingContext & WebGL2RenderingContext are defined. |
The evasion techniques are enabled by default. You can omit techniques by filtering them out:
const createBrowserless = require('browserless')
const evasions = require('@browserless/goto').evasions.filter(
evasion => evasion !== 'randomizeUserAgent'
)
const browserlessFactory = createBrowserless({ evasions })
headers
type: object
An object containing additional HTTP headers to be sent with every request.
const createBrowserless = require('browserless')
const browserless = await createBrowserless().createContext()
const page = await browserless.page()
await browserless.goto(page, {
url: 'http://example.com',
headers: {
'user-agent': 'googlebot',
cookie: 'foo=bar; hello=world'
}
})
This sets visibility: hidden on the matched elements.
html
type: string
When you provide HTML markup, page.setContent is used to load it, avoiding fetching the content from the target URL.
javascript
type: boolean
default: true
When it's false, it disables JavaScript on the current page.
mediaType
type: string
default: 'screen'
Changes the CSS media type of the page using page.emulateMediaType.
modules
type: string | string[]
Injects <script type="module"> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
- Local file (e.g., 'local-file.js').
- Inline code (e.g., "document.body.style.backgroundColor = 'red'").
const buffer = await browserless.screenshot(url.toString(), {
modules: [
'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js',
'local-file.js',
"document.body.style.backgroundColor = 'red'"
]
})
colorScheme
type: string
default: 'no-preference'
Sets the prefers-color-scheme CSS media feature, used to detect if the user has requested the system use a 'light' or 'dark' color theme.
scripts
type: string | string[]
Injects <script> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
- Local file (e.g., 'local-file.js').
- Inline code (e.g., "document.body.style.backgroundColor = 'red'").
const buffer = await browserless.screenshot(url.toString(), {
scripts: [
'https://cdn.jsdelivr.net/npm/[email protected]/dist/jquery.min.js',
'local-file.js',
"document.body.style.backgroundColor = 'red'"
]
})
Prefer to use modules whenever possible.
scroll
type: string
Scroll to the DOM element matching the given CSS selector.
styles
type: string | string[]
Injects <style> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css').
- Local file (e.g., 'local-file.css').
- Inline code (e.g., "body { background: red; }").
const buffer = await browserless.screenshot(url.toString(), {
styles: [
'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css',
'local-file.css',
'body { background: red; }'
]
})
timezone
type: string
It changes the timezone of the page.
url
type: string
The target URL.
viewport
It sets up a custom viewport using the page.setViewport method.
waitForSelector
type: string
Waits for the given selector to appear using page.waitForSelector.
waitForTimeout
type: number
Waits for the given amount of time, in milliseconds, using page.waitForTimeout.
waitUntil
type: string | string[]
default: 'auto'
values: 'auto' | 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'
When to consider navigation succeeded.
If you provide an array of event strings, navigation is considered to be successful after all events have been fired.
Events can be either:
- 'auto': A combination of 'load' and 'networkidle2' in a smart way to wait the minimum time necessary.
- 'load': Consider navigation to be finished when the load event is fired.
- 'domcontentloaded': Consider navigation to be finished when the DOMContentLoaded event is fired.
- 'networkidle0': Consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
- 'networkidle2': Consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
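The difference between the two network idle events can be made concrete with a small simulation. This is a conceptual sketch only, not Puppeteer's implementation: `findIdleTime` and the sample timeline are hypothetical, and the model simply looks for the first 500 ms window in which the number of open connections stays at or below the threshold.

```js
// Conceptual sketch only; this is not Puppeteer's implementation.
// `events` is a time-ordered list of { time, delta } entries, where delta is
// +1 when a request starts and -1 when it finishes (times in milliseconds).
// Returns the time at which the page counts as idle, or null if it never does.
function findIdleTime (events, maxInflight, idleMs = 500) {
  let inflight = 0
  let idleSince = 0 // no connections are open at time 0
  for (const { time, delta } of events) {
    // The quiet window elapsed before this event happened.
    if (idleSince !== null && time - idleSince >= idleMs) {
      return idleSince + idleMs
    }
    inflight += delta
    if (inflight > maxInflight) {
      idleSince = null
    } else if (idleSince === null) {
      idleSince = time
    }
  }
  return idleSince === null ? null : idleSince + idleMs
}

const events = [
  { time: 0, delta: +1 },
  { time: 100, delta: +1 },
  { time: 150, delta: +1 },
  { time: 300, delta: -1 },
  { time: 900, delta: -1 },
  { time: 950, delta: -1 }
]

console.log(findIdleTime(events, 2)) // 'networkidle2' threshold: 800
console.log(findIdleTime(events, 0)) // 'networkidle0' threshold: 1450
```

In this timeline the 'networkidle2' threshold is met well before the last request finishes, which is why it usually resolves earlier than 'networkidle0'.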
.context
It returns the BrowserContext associated with your instance.
const browserContext = await browserless.context()
console.log({ isIncognito: browserContext.isIncognito() })
// => { isIncognito: true }
.page
It returns a standalone Page associated with the current browser context.
const page = await browserless.page()
await page.content()
// => '<html><head></head><body></body></html>'
Command Line Interface
You can perform any browserless action from your terminal.
You just need to install @browserless/cli globally:
npm install @browserless/cli --global
Alternatively, you can run it on demand using npx:
npx @browserless/cli --help
That's the preferred way to interact with the CLI in CI/CD scenarios.
Lighthouse
browserless has a Lighthouse integration that connects to a Puppeteer instance in a simple way.
const lighthouse = require('@browserless/lighthouse')
const { writeFile } = require('fs/promises')
const report = await lighthouse('https://example.com')
await writeFile('report.json', JSON.stringify(report, null, 2))
The report will be generated for the given url, extending from the lighthouse:default settings. These are the same settings used by the Google Chrome Audits reports in Developer Tools.
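Once a report is available, the category scores are usually the first thing to inspect. The sketch below assumes that `report` follows the Lighthouse result format, where each entry of `report.categories` carries a `score` between 0 and 1 (or null when it could not be computed); `summarizeScores` is a hypothetical helper name.

```js
// Hypothetical helper. It assumes `report` follows the Lighthouse result
// format: report.categories[id].score is a number between 0 and 1, or null.
function summarizeScores (report) {
  const summary = {}
  for (const [id, category] of Object.entries(report.categories || {})) {
    // Convert the 0..1 score into the familiar 0..100 scale.
    summary[id] = category.score === null ? null : Math.round(category.score * 100)
  }
  return summary
}
```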
options
The second argument can contain Lighthouse-specific settings. The following options are used by default:
{
logLevel: 'error',
output: 'json',
device: 'desktop',
onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo']
}
See Lighthouse configuration to know all the options and values supported.
Additionally, you can set up:
getBrowserless
type: function
default: require('browserless')
The browserless instance to use for getting the browser.
logLevel
type: string
default: 'error'
values: 'silent' | 'error' | 'info' | 'verbose'
The level of logging to enable.
output
type: string | string[]
default: 'json'
values: 'json' | 'csv' | 'html'
The type(s) of report output to be produced.
device
type: string
default: 'desktop'
values: 'desktop' | 'mobile' | 'none'
How emulation (useragent, device screen metrics, touch) should be applied. 'none' indicates Lighthouse should leave the host browser as-is.
onlyCategories
type: string[] | null
default: ['performance', 'best-practices', 'accessibility', 'seo']
values: 'performance' | 'best-practices' | 'accessibility' | 'pwa' | 'seo'
Includes only the specified categories in the final report.
Packages
browserless is internally divided into multiple packages, so you can use just the minimum amount of code necessary for your use case.
FAQ
Q: Why use browserless over puppeteer?
browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome, oriented for production scenarios.
Q: Why do you block ads scripts by default?
Headless navigation is expensive compared with just fetching the content from a website.
In order to speed up the process, we block ad scripts by default because they are bloated.
Q: My output is different from the expected
Probably browserless was too smart and it blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable in order to see what is happening behind the scenes.
Consider opening an issue with the debug trace.
Q: I want to use browserless with my AWS Lambda like project
Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.
License
browserless © Microlink, released under the MIT License.
Authored and maintained by Microlink with help from contributors.
The logo has been designed by xinh studio.
microlink.io · GitHub microlinkhq · Twitter @microlinkhq