puppeteer-extra
[Info] Beta versions available for the new `puppeteer-extra` & `playwright-extra`
~~The rewrite of puppeteer-extra is available for beta testing, to gather some final feedback before we make the switch. This issue is meant as a canonical reference on how to install those packages (also please report bugs/feedback here). 😄~~
edit: playwright-extra has landed: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra
:point_right: We will follow a different approach than a full rewrite with a shared code base between puppeteer-extra and playwright-extra; more info can be found in this comment.
:x::x::x: The information below is outdated and does not apply anymore
Context
- A major new version (rewrite) of puppeteer-extra is close to public release 🎉
- The new plugin framework will support both Puppeteer and Playwright (adding playwright-extra)
- Every existing puppeteer-extra-plugin-* should continue working with the new puppeteer-extra
- In addition, new plugins (@extra/*) are being released that support both Puppeteer and Playwright
More info can be found in the PR: #303
How to install (must read ⚡)
Important:
- ⚡ The temporary tagged beta packages have issues with npm, please use yarn to install those. ⚡
- The beta versions are published under the @next tag; you must add this tag when installing them.
Available packages
Important:
- ⚡ The documentation links below point to the unreleased automation-extra branch. The installation instructions for the new packages there are written from the perspective of being released and don't mention the @next tag. Please install the packages as instructed in this issue. ⚡
puppeteer-extra
yarn add puppeteer@5 puppeteer-extra@next
- Supports existing puppeteer-extra-plugin-* as well as the new @extra/* plugins
playwright-extra
yarn add [email protected] playwright-extra@next
- Supports Chrome, Firefox and Webkit and the new @extra/* plugins
New plugins
- These plugins use the new base plugin and are compatible with both Playwright & Puppeteer.
@extra/recaptcha
yarn add @extra/recaptcha@next
- A plugin for playwright-extra & puppeteer-extra to solve reCAPTCHAs and hCaptchas automatically.
- Supports Playwright & Puppeteer, Chrome, Firefox and Webkit.
@extra/humanize
yarn add @extra/humanize@next
- A plugin for playwright-extra & puppeteer-extra to humanize input (mouse movements, etc)
- Supports Playwright & Puppeteer, Chrome, Firefox and Webkit.
Existing plugins
- All existing puppeteer-extra plugins are meant to stay compatible with the new puppeteer-extra. Please report any issues you might experience.
Notes
- Existing puppeteer-extra-plugin-* will work with puppeteer-extra, not playwright-extra.
- An updated version of the popular stealth plugin with playwright support is not yet available.
- The target audience of those beta packages are developers interested in testing them and providing feedback before the public release. I don't advise using them in production unless you really know what you're doing :-)
- Puppeteer broke typings support in their latest releases, use puppeteer@5 when using TypeScript
Hey @berstend, I'm having an issue with using versions of Playwright greater than 1.8.0. I ran into this when attempting to use Playwright 1.10.0 with playwright-extra inside a docker container. The browser launch fails because the library tries to use the 1.8 browser binary (chromium-844399) which is missing from a clean Playwright 1.10 install. When I swap out playwright-extra for the vanilla library, the browsers launch fine. I was not running into this issue locally because the 1.8 browser binaries are left over from a previous Playwright 1.8 install. I suspect this might have something to do with the version being locked here https://github.com/berstend/puppeteer-extra/blob/cb586077a848241f119b3a9c051e93babc2ce7a8/packages/playwright-extra/package.json#L64 For reference, I am using the official Playwright docker image here https://github.com/microsoft/playwright/blob/master/utils/docker/Dockerfile.bionic. Thoughts?
@j3lev thanks for the feedback! Are you using the regular playwright package as well? If so, that one should take precedence over the "bundled" -core one.
The reason we're currently including the -core package as a dependency is: a) typings (so non-TS VS Code users get IntelliSense automatically) b) to re-export the top-level stuff from the vanilla package (errors, selectors, devices): https://github.com/berstend/puppeteer-extra/blob/cb586077a848241f119b3a9c051e93babc2ce7a8/packages/playwright-extra/src/index.ts#L61-L76
Overall I'm not too happy to have -core as a regular (and especially version pinned) dependency and will overhaul that before we make the release. A few days ago I realized I should be able to export getters here and lazy load any installed -core or non-core playwright lib. Will give this a go soon. :-)
Thanks for reporting this issue (I suspected pinning the version would cause issues down the line) 👍
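A minimal sketch of the lookup order described above (prefer the user's regular `playwright` package, fall back to `playwright-core`), assuming plain CommonJS `require`. `requireFirstInstalled` is a hypothetical helper for illustration, not the actual playwright-extra code; the `requireFn` parameter only exists so the function can be exercised without either package installed.

```javascript
// Hypothetical helper: return the first module from `candidates` that can be required.
// Order matters: the user's own `playwright` install should win over `playwright-core`.
function requireFirstInstalled(candidates = ['playwright', 'playwright-core'], requireFn = require) {
  for (const name of candidates) {
    try {
      return { name, mod: requireFn(name) };
    } catch (err) {
      // Only skip packages that are missing; surface any other error (e.g. a broken install)
      if (!err || err.code !== 'MODULE_NOT_FOUND') throw err;
    }
  }
  throw new Error(`None of these packages is installed: ${candidates.join(', ')}`);
}
```

With both packages installed, `requireFirstInstalled()` would return `playwright`, which is the precedence the reply above expects.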
I am using playwright 1.10.0 alongside and it does not work. I also tried in the past with 1.9 and was having the same issue but didn't have time to look into it.
@j3lev oh you're correct - I was mistaken as we're currently trying to require -core prior to the regular one: https://github.com/berstend/puppeteer-extra/blob/cb586077a848241f119b3a9c051e93babc2ce7a8/packages/automation-extra/src/base.ts#L303-L304
I will make sure to change that behavior when I overhaul that aspect.
The automation-extra stuff is currently a beta version, if it's mission-critical for you to get this resolved asap let me know. ;-)
(Using [email protected] for the time being would be a workaround of sorts)
I updated the installation instructions in this issue to install [email protected] and save the next beta tester from the experience you had. :-) (This is of course just a temporary fix until I have time to resolve it properly)
Yeah for sure, only reason I bring it up is to be able to take advantage of new features that are coming out such as channels https://playwright.dev/docs/browsers#google-chrome--microsoft-edge, also some new selector syntax was introduced in 1.9.0 which is nice as well. Keep up the good work and I cannot wait to see this get released!
@berstend, could you tell me whether using playwright-extra with the stealth plugin solves this issue, or does the stealth plugin still not work with playwright because playwright uses its own intermediate wire protocol instead of CDP?
@WindBridges there's currently no stealth plugin for playwright (and the existing one is not compatible). The main reason is time constraints on my end and playwright making it more difficult to hook into the CDP flow so porting the stuff over from the existing plugin isn't just copy paste but more involved. :-)
@WindBridges you can use the minified version of the stealth plugin from extract-stealth-evasions, works perfectly fine for me with playwright.
Unfortunately that will only result in cursory fixes, quite a few things rely on CDP and are not part of the js evasions scripts.
hey @berstend! hope all is well, i was just wondering when we can expect to use newer versions of playwright with this, the only reason i ask is that 1.8 appears to be no longer listed in the official Playwright docs, so I'm guessing they may drop support for it quite soon
Existing puppeteer-extra-plugin-* will work with puppeteer-extra, not playwright-extra.
BTW, I've been using puppeteer-extra-plugin-stealth with playwright for a long time with this hack:
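const enabledEvasions = [/* list of my required evasions */];
// Each evasion module exports a factory that returns a puppeteer-extra plugin instance
const evasions = enabledEvasions.map(e => require(`puppeteer-extra-plugin-stealth/evasions/${e}`));
// Minimal fake puppeteer page: it only records the scripts the evasions want to inject
const stealth = {
  callbacks: [],
  async evaluateOnNewDocument(...args) {
    this.callbacks.push({ cb: args[0], a: args[1] });
  },
};
// onPageCreated() is async, so wait for every evasion to register its script
await Promise.all(evasions.map(e => e().onPageCreated(stealth)));
// Replay the recorded scripts on a Playwright BrowserContext (run inside an async function)
for (const evasion of stealth.callbacks) {
  await browserContext.addInitScript(evasion.cb, evasion.a);
}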
@berstend I don't know if it's dirty or not, but thanks to @terion-name I actually got it working with [email protected]. This is the code I used and the results via screenshots:
(async () => {
  const { chromium } = require("playwright");
  const browser = await chromium.launch({
    channel: "chrome",
    headless: true,
  });
  // Read the browser's default user agent from a throwaway context, then close it
  const tmpContext = await browser.newContext();
  const originalUserAgent = await (await tmpContext.newPage()).evaluate(() => navigator.userAgent);
  await tmpContext.close();
  // Drop the "Headless" marker (e.g. "HeadlessChrome" -> "Chrome") from the user agent
  const browserContext = await browser.newContext({
    userAgent: originalUserAgent.replace("Headless", ""),
  });
  const enabledEvasions = [
    'chrome.app',
    'chrome.csi',
    'chrome.loadTimes',
    'chrome.runtime',
    'iframe.contentWindow',
    'media.codecs',
    'navigator.hardwareConcurrency',
    'navigator.languages',
    'navigator.permissions',
    'navigator.plugins',
    'navigator.webdriver',
    'sourceurl',
    // 'user-agent-override', // doesn't work since playwright has no page.browser()
    'webgl.vendor',
    'window.outerdimensions'
  ];
  // Each evasion module exports a factory that returns a puppeteer-extra plugin instance
  const evasions = enabledEvasions.map(e => require(`puppeteer-extra-plugin-stealth/evasions/${e}`));
  // Minimal fake puppeteer page that only records the scripts the evasions want to inject
  const stealth = {
    callbacks: [],
    async evaluateOnNewDocument(...args) {
      this.callbacks.push({ cb: args[0], a: args[1] });
    },
  };
  await Promise.all(evasions.map(e => e().onPageCreated(stealth)));
  // Register the recorded scripts before opening the page so they run on every navigation
  for (const evasion of stealth.callbacks) {
    await browserContext.addInitScript(evasion.cb, evasion.a);
  }
  const page = await browserContext.newPage();
  await page.goto("https://bot.sannysoft.com");
  await page.waitForTimeout(1000);
  await page.screenshot({ path: "screenshot-sannysoft.jpg", fullPage: true });
  await page.goto("https://abrahamjuliot.github.io/creepjs/");
  await page.waitForTimeout(1000);
  await page.screenshot({ path: "screenshot-creepjs.jpg", fullPage: true });
  await page.goto("http://f.vision/");
  await page.waitForTimeout(1000);
  await page.screenshot({ path: "screenshot-fvision.jpg", fullPage: true });
  await page.goto("https://pixelscan.net/");
  await page.waitForTimeout(1000);
  await page.screenshot({ path: "screenshot-pixelscan.jpg", fullPage: true });
  // await browserContext.waitForEvent("close");
  await browser.close();
})();
@maiux I've also been using this hack for my program since berstend doesn't seem to have time/interest in updating it.
hey @berstend! hope all is well, i was just wondering when we can expect to use newer versions of playwright with this
📙 TL;DR: Progress on the switch to the new codebase had stalled but we're back at it now.
A little more context:
Apologies for the delay on this - puppeteer unfortunately breaking TypeScript typings a while back took the wind out of the sails of the planned release of the new branch and I've been waiting a bit for the dust to settle. 😅
Given the project's popularity I'm a bit cautious about replacing the old versions until I'm satisfied it'll be a smooth and backwards-compatible transition for everyone, hence we haven't made the switch yet :)
I haven't updated the @next packages in the meantime, as packaging/deploying them is a bit brittle and cumbersome (our monorepo tool lerna unfortunately fails to resolve their dependencies automatically, which means I need to bump all internal dependencies manually).
I'm not a huge fan of the current limbo situation though and want us to switch to the new codebase as soon as possible.
Things I have on my shortlist in this regard:
- Figure out the definitive best way to deal with typings in our packages (peerDependencies are a mess; if we don't ship with them as a dependency, regular pptr < v5 JS users don't get to enjoy IntelliSense hints; if we do ship with a specific version, TS users with a different pptr/pw version might run into conflicts; puppeteer switched to TS/built-in types themselves a while ago, etc.)
- Backport some recent changes made in the old recaptcha plugin to the new @extra/recaptcha
- Optimize the plugin API to allow for easy script injection in workers as well
- See if I can find usage numbers on older puppeteer versions; dropping support for some older versions would make the migration a lot easier
Regarding playwright + stealth: The "hacks" discussed here are fine 😄 Unfortunately they only cover JS-based evasions and don't handle launch args or, more importantly, CDP commands, which is the main issue I ran into when working on the playwright stealth port. Playwright only allows creating a new CDP session, whereas we need to hook into the existing one. I did however find a promising workaround I'm currently fleshing out, so a stealth plugin with full playwright support is on the horizon again. :)
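To illustrate the limitation: Playwright exposes `browserContext.newCDPSession(page)` (Chromium only), which opens an additional CDP session for a page rather than handing out the one Playwright itself drives. The sketch below sends a raw CDP command over such a session. `overrideUserAgentViaCDP` is a hypothetical helper name, and how overrides set this way interact with Playwright's own session is an assumption we have not verified.

```javascript
// Hypothetical helper: send a raw CDP command to a Chromium page driven by Playwright.
// `page` is a Playwright Page; newCDPSession() creates a *new* session instead of
// exposing the one Playwright already uses internally.
async function overrideUserAgentViaCDP(page, userAgent) {
  const session = await page.context().newCDPSession(page);
  // Network.setUserAgentOverride is a standard Chrome DevTools Protocol method
  await session.send('Network.setUserAgentOverride', { userAgent });
  return session; // keep a reference; detaching the session may drop the override
}
```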
@berstend have you tried filing a feature request with playwright? They're very responsive and open about their development and what could or couldn't be done. Access to CDP sessions or whatever else you're missing.
@berstend That's great news! Just wanted to say thank you on behalf of all the people using this software! So yeah, thanks for the great open source work, we all appreciate it very much!
Hey there, is there any chance the playwright dependency can be moved up to the latest? The playwright-core dependency is 9 minor versions behind.
Would be great to bump playwright-core dependency to 1.18.0
@berstend Just judging by the npm downloads of puppeteer, there seem to be a lot of people staying on puppeteer@5 (and puppeteer@1 for some reason). I'm one of them, but for me this is only due to puppeteer-extra not being compatible with puppeteer versions >=6.
I can't speak for anyone else, but I do think the majority of users would be fine with dropping support for puppeteer < 6, or with using an older version of puppeteer-extra if they really need it (I've been using the current version of puppeteer-extra just fine, but I would love to update).
I realize that puppeteer breaking their typings must be really frustrating. And their issue mess is probably not helping. If we can help you with any specific tasks that need doing, let us know. I'm sure a few people would love to help (including me), but don't want to interfere with the upgrade process.
@maiux thank you for sharing your code, it was quite helpful! That being said, the browser seems to get a Trust Score of 0% when visiting https://abrahamjuliot.github.io/creepjs/. Do you know any ways to circumvent that?
What's the current status of stealth in playwright? Have the CSP issues been resolved? I've been digging to find the answer to no avail.
Playwright only allows to create a new CDP session whereas we need to hook into the existing one.
@berstend FWIW, their documentation includes a connectOverCDP method that seems to do what you describe.
@berstend you can patch the Playwright source, or fork it. It's quite easy to expose the CDP session for Chromium browsers. Are you really just stuck on this? Shall we help? It would be magical to have your extension for Playwright, which has a much friendlier API than Puppeteer.
Wow, seems like we have @berstend back! Can't wait to find out what "unpinned this issue" means 😄
LETS GOOOOOOOOO
Quick update regarding playwright support 😄
I reflected on why I never finished the automation-extra branch and came to the following realizations:
- A massive rewrite like this is a nightmare to merge in, especially with a project that's used in production by many
- While the new code was in beta mode the regular plugin development did not stop and I had essentially doubled my workload by having to keep the old and the new plugins (supporting both playwright & puppeteer) in sync
- Bad timing: Typings are already tricky for a version-agnostic plugin framework, it didn't help that puppeteer switched from @types/puppeteer to their built-in (and initially broken) types
- Playwright's APIs kept diverging from puppeteer as time went on; in addition they made things less "hacker friendly" (client/server split, custom wire protocol, overzealous input validation, using exports in their package.json, which prevents monkey patching, etc.)
Instead I decided to follow a more iterative approach:
- No complete rewrite of the whole project or sharing code with puppeteer-extra (for the moment); playwright-extra is its own thing, which makes rolling it out much easier
- No new shared plugin base class for now
- Looking at download numbers, the main plugins of interest are stealth & recaptcha
- I've worked out a "compatibility shim" that allows loading these major puppeteer-extra plugins into playwright-extra without changes
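The general idea of such a compatibility shim can be sketched with a `Proxy` that answers puppeteer-style calls on a Playwright page. This is a hypothetical illustration, not the actual playwright-extra shim: it only maps puppeteer's `page.evaluateOnNewDocument(fn, ...args)` to Playwright's `page.addInitScript(fn, arg)`, and because `addInitScript` accepts a single argument, it rejects calls with more than one.

```javascript
// Hypothetical sketch of a puppeteer -> Playwright page shim (not the real implementation).
function shimPuppeteerPage(playwrightPage) {
  return new Proxy(playwrightPage, {
    get(target, prop, receiver) {
      if (prop === 'evaluateOnNewDocument') {
        // puppeteer: evaluateOnNewDocument(fn, ...args); Playwright: addInitScript(fn, arg)
        return (fn, ...args) => {
          if (args.length > 1) {
            throw new Error('This sketch only supports a single argument');
          }
          return target.addInitScript(fn, args[0]);
        };
      }
      // Everything else falls through to the Playwright page unchanged
      const value = Reflect.get(target, prop, receiver);
      return typeof value === 'function' ? value.bind(target) : value;
    },
  });
}
```

A real shim would need to cover many more puppeteer methods (and events); the Proxy approach just keeps unmapped calls working on the underlying Playwright page.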
While working on this I've also found solutions to quite a few long-standing issues around types ("how can we use playwright types internally without imposing a specific version on the user", "how to re-export top-level module exports like playwright.devices without shipping with a specific version of it") and other things.
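One way to re-export top-level exports such as `devices`, `errors` or `selectors` without pinning a Playwright version is to define getters that only load the user's installed package on first access. The sketch below assumes a caller-supplied `load` function (for example one that requires `playwright` or `playwright-core`); `createLazyExports` is a hypothetical name, not playwright-extra's actual API.

```javascript
// Hypothetical sketch: re-export selected keys from a module that is loaded lazily.
// `load` is only called on first property access, and its result is cached afterwards.
function createLazyExports(keys, load) {
  let mod; // cached module once loaded
  const getModule = () => {
    if (mod === undefined) mod = load();
    return mod;
  };
  const exportsObj = {};
  for (const key of keys) {
    Object.defineProperty(exportsObj, key, {
      enumerable: true,
      get: () => getModule()[key],
    });
  }
  return exportsObj;
}
```

Usage would look like `module.exports = createLazyExports(['devices', 'errors', 'selectors'], () => require('playwright'))`; nothing is required until a consumer actually reads one of those properties.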
The existing stealth and recaptcha plugins are already working well (even with Firefox & Webkit 🎉) and most of the explorative code is done. I'm now working on cleanup, tests and documentation and should be able to release this quite soon and without any potential side-effects (it's just a single new package: playwright-extra).
TL;DR: Instead of a complete rewrite with a new shared plugin framework, we start with a playwright-extra version that is compatible with the majority of puppeteer-extra plugins 😄
playwright-extra using a puppeteer compatibility layer to load in puppeteer-extra-plugin-recaptcha to solve captchas in webkit 😁
@berstend Sounds great! Stealth for Playwright would be very useful (read: 100% necessary) in one of our projects.
Do you have any kind of ETA on this release? No pressure :grin:
I'll do you one better (than an ETA) by just releasing it 😄
Successfully published:
- [email protected]
Readme: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra
Feedback welcome!
That's amazing @berstend ! Will test it out.