crawlee
Expose playwright `BrowserContext` options
Which package is the feature request for? If unsure which one to select, leave blank
None
Feature
Expose the Playwright browser.newContext options to Crawlee users so they can use more advanced Playwright features.
Motivation
Playwright has many features, such as recording and replaying network requests, using Chrome extensions, and emulating different devices, that are only exposed through the options passed to browser.newContext, which creates a BrowserContext instance.
A classic Playwright library example is this:
import { chromium, devices } from 'playwright';
const browser = await chromium.launch(browserOpts);
const context = await browser.newContext(devices['iPhone 11']);
const page = await context.newPage();
Currently, AFAICT, Crawlee allows users to modify the browser.launch options through the Crawlee PlaywrightLaunchContext API, but does not allow users to modify the Playwright BrowserContext options.
Ideal solution or implementation, and any additional constraints
One implementation could be to extend the Crawlee LaunchContext options for Playwright with an additional browserContext field that has the Playwright browser.newContext options type. Then @crawlee/browser-pool would use that to launch the browser context.
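A rough sketch of what that extra field might look like. Everything here is an assumption for illustration: the `*Sketch` types are small local stand-ins for Playwright's launch and context option types (so the snippet compiles without Playwright installed), and `splitOptions` is a hypothetical helper, not anything that exists in @crawlee/browser-pool.

```typescript
// Stand-in for Playwright's browser launch options (trimmed to two real option names).
interface LaunchOptionsSketch {
    headless?: boolean;
    args?: string[];
}

// Stand-in for Playwright's browser.newContext options (trimmed to three real option names).
interface BrowserContextOptionsSketch {
    recordHar?: { path: string };
    viewport?: { width: number; height: number } | null;
    userAgent?: string;
}

// Hypothetical launch context: `browserContext` is the proposed new field.
interface LaunchContextSketch {
    launchOptions?: LaunchOptionsSketch;
    browserContext?: BrowserContextOptionsSketch;
}

// Hypothetical helper: keeps the two option sets apart so each one can be
// handed to the call it belongs to (launch vs. newContext).
function splitOptions(ctx: LaunchContextSketch) {
    return {
        forLaunch: { ...ctx.launchOptions },
        forNewContext: { ...ctx.browserContext },
    };
}

const launchContext: LaunchContextSketch = {
    launchOptions: { headless: true },
    browserContext: { recordHar: { path: './data.har' } },
};

const { forLaunch, forNewContext } = splitOptions(launchContext);
```

With this shape, context-only options such as recordHar never end up in the object meant for the browser launch call, and no type cast is needed on the user's side.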
It appears that the current code actually uses the same LaunchContext.launchOptions type when launching the browser context, even though that type describes the options for Playwright's launch function, which creates a browser rather than a context.
https://github.com/apify/crawlee/blob/2f9aa4e22017d08a396c1bca948b0c5c1e3ab84c/packages/browser-pool/src/playwright/playwright-plugin.ts#L109
That line calls launchPersistentContext, which takes options similar (maybe identical) to those of browser.newContext.
However, I tested whether these options get passed through by adding them to my PlaywrightCrawler:
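import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';
import type { BrowserContextOptions } from 'playwright';

// DATA_DIR: output directory path, assumed to be defined elsewhere.
const crawler = new PlaywrightCrawler({
    launchContext: {
        launcher: firefox,
        launchOptions: {
            recordHar: {
                path: DATA_DIR + "/data.har",
            },
        } as BrowserContextOptions,
    },
});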
but it didn't work.
So you could move these options to a new field, or update the types so it is a union of the browser and context launch options.
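For the second option, a minimal sketch of the combined type. Note that an object carrying options from both sets is expressed in TypeScript as an intersection (`A & B`). The two interfaces below are local stand-ins with a single real option name each, not Playwright's actual definitions, and `CombinedLaunchOptions` is a hypothetical name.

```typescript
// Stand-in for Playwright's browser launch options.
interface BrowserLaunchOpts {
    headless?: boolean;
}

// Stand-in for Playwright's browser context options.
interface ContextOpts {
    recordHar?: { path: string };
}

// Hypothetical combined type: accepts options from both sets in one object.
type CombinedLaunchOptions = BrowserLaunchOpts & ContextOpts;

// No `as` cast is needed once the type covers both option sets.
const combined: CombinedLaunchOptions = {
    headless: true,
    recordHar: { path: './data.har' },
};
```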
Thanks for this great library!
Alternative solutions or implementations
No response
Other context
No response
Actually, it did write a 240MB HAR file to data.har! I still think it would be best to separate the browserContext options from the general browser options, or alternatively to make the type a union of the two.
I tried to introduce a new option for this, but it's far from trivial, so in the end I gave up and only improved the type of launchOptions.