[Info] Beta versions available for the new `puppeteer-extra` & `playwright-extra`

Open berstend opened this issue 3 years ago • 30 comments

~~The rewrite of puppeteer-extra is available for beta testing, to gather some final feedback before we make the switch. This issue is meant as a canonical reference on how to install those packages (also please report bugs/feedback here). 😄~~

edit: playwright-extra has landed: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra


:point_right: We will follow a different approach than a full rewrite with a shared code base between puppeteer-extra and playwright-extra, more info can be found in this comment

Previous (now outdated) info:

:x::x::x: The information below is outdated and does not apply anymore

Context

  • A major new version (rewrite) of puppeteer-extra is close to public release 🎉
  • The new plugin framework will support both Puppeteer and Playwright (adding playwright-extra)
  • Every existing puppeteer-extra-plugin-* should continue working with the new puppeteer-extra
  • In addition new plugins (@extra/*) are being released that support both Puppeteer and Playwright

More info can be found in the PR: #303

How to install (must read ⚡)

Important:

  • ⚡ The temporary tagged beta packages have issues with npm, please use yarn to install those.
  • The beta versions are published under the @next tag, you must add this tag when installing them.

Available packages

Important:

  • ⚡ The documentation links below point to the unreleased automation-extra branch, the installation instructions for the new packages there are written from the perspective of being released and don't mention the @next tag. Please install the packages as instructed in this issue.

puppeteer-extra

yarn add puppeteer@5 puppeteer-extra@next
  • Supports existing puppeteer-extra-plugin-* as well as the new @extra/* plugins

playwright-extra

yarn add [email protected] playwright-extra@next
  • Supports Chrome, Firefox and Webkit and the new @extra/* plugins

New plugins

  • These plugins use the new base plugin and are compatible with both Playwright & Puppeteer.

@extra/recaptcha

yarn add @extra/recaptcha@next
  • A plugin for playwright-extra & puppeteer-extra to solve reCAPTCHAs and hCaptchas automatically.
  • Supports Playwright & Puppeteer, Chrome, Firefox and Webkit.

@extra/humanize

yarn add @extra/humanize@next
  • A plugin for playwright-extra & puppeteer-extra to humanize input (mouse movements, etc)
  • Supports Playwright & Puppeteer, Chrome, Firefox and Webkit.

Existing plugins

  • All existing puppeteer-extra plugins are meant to stay compatible with the new puppeteer-extra. Please report any issues you might experience.

Notes

  • Existing puppeteer-extra-plugin-* will work with puppeteer-extra, not playwright-extra.
  • An updated version of the popular stealth plugin with playwright support is not yet available.
  • The target audience of those beta packages are developers interested in testing them and providing feedback before the public release. I don't advise using them in production unless you really know what you're doing :-)
  • Puppeteer broke typings support in their latest releases, use puppeteer@5 when using TypeScript

berstend avatar Mar 17 '21 11:03 berstend

Hey @berstend, I'm having an issue with using versions of Playwright greater than 1.8.0. I ran into this when attempting to use Playwright 1.10.0 with playwright-extra inside a docker container. The browser launch fails because the library tries to use the 1.8 browser binary (chromium-844399) which is missing from a clean Playwright 1.10 install. When I swap out playwright-extra for the vanilla library, the browsers launch fine. I was not running into this issue locally because the 1.8 browser binaries are left over from a previous Playwright 1.8 install. I suspect this might have something to do with the version being locked here https://github.com/berstend/puppeteer-extra/blob/cb586077a848241f119b3a9c051e93babc2ce7a8/packages/playwright-extra/package.json#L64 For reference, I am using the official Playwright docker image here https://github.com/microsoft/playwright/blob/master/utils/docker/Dockerfile.bionic. Thoughts?

j3lev avatar Apr 15 '21 16:04 j3lev

@j3lev thanks for the feedback! are you using the regular playwright package as well? If so that one should take precedence over the "bundled" -core one.

The reason we're including the -core package as a dependency currently is: a) typings (so non-TS VScode users get Intellisense automatically) b) to re-export the top level stuff from the vanilla package (errors, selectors, devices): https://github.com/berstend/puppeteer-extra/blob/cb586077a848241f119b3a9c051e93babc2ce7a8/packages/playwright-extra/src/index.ts#L61-L76

Overall I'm not too happy to have -core as a regular (and especially version pinned) dependency and will overhaul that before we make the release. A few days ago I realized I should be able to export getters here and lazy load any installed -core or non-core playwright lib. Will give this a go soon. :-)
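A minimal sketch of the getter idea described above, not the actual playwright-extra code. The helper names `createLazyLoader` and `defineLazyExports` are hypothetical, and the lookup order (regular `playwright` first, then `playwright-core`) is an assumption based on the statement that the regular package should take precedence:

```javascript
// Sketch only: both helpers are hypothetical and not part of playwright-extra.

// Returns a function that loads the first available module from `candidates`
// on its first call and caches the result. `requireFn` is injectable so the
// lookup can be exercised without any Playwright package installed.
function createLazyLoader(candidates, requireFn = require) {
  let cached = null;
  return function load() {
    if (cached) return cached;
    for (const name of candidates) {
      try {
        cached = requireFn(name);
        return cached;
      } catch (err) {
        // Module missing or failed to load: fall through to the next candidate.
      }
    }
    throw new Error(`None of these modules could be loaded: ${candidates.join(", ")}`);
  };
}

// Defines getters on `target`, so no module is required until a property is read.
function defineLazyExports(target, names, load) {
  for (const name of names) {
    Object.defineProperty(target, name, {
      enumerable: true,
      get: () => load()[name],
    });
  }
  return target;
}
```

A package entry point could then call something like `defineLazyExports(module.exports, ["errors", "selectors", "devices"], createLazyLoader(["playwright", "playwright-core"]))`. Nothing is resolved at import time, so a pinned `-core` version never gets the chance to shadow the version the user installed.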

Thanks for reporting this issue (I suspected pinning the version would cause issues down the line) 👍

berstend avatar Apr 15 '21 22:04 berstend

I am using playwright 1.10.0 alongside and it does not work. I also tried in the past with 1.9 and was having the same issue but didn't have time to look into it.

j3lev avatar Apr 15 '21 22:04 j3lev

@j3lev oh you're correct - I was mistaken as we're currently trying to require -core prior to the regular one: https://github.com/berstend/puppeteer-extra/blob/cb586077a848241f119b3a9c051e93babc2ce7a8/packages/automation-extra/src/base.ts#L303-L304

I will make sure to change that behavior when I overhaul that aspect.

The automation-extra stuff is currently a beta version, if it's mission-critical for you to get this resolved asap let me know. ;-)

(Using [email protected] for the time being would be a workaround of sorts)

berstend avatar Apr 15 '21 22:04 berstend

I updated the installation instructions in this issue to install [email protected] and save the next beta tester from the experience you had. :-) (This is of course just a temporary fix until I had time to resolve it properly)

berstend avatar Apr 15 '21 22:04 berstend

Yeah for sure, only reason I bring it up is to be able to take advantage of new features that are coming out such as channels https://playwright.dev/docs/browsers#google-chrome--microsoft-edge, also some new selector syntax was introduced in 1.9.0 which is nice as well. Keep up the good work and I cannot wait to see this get released!

j3lev avatar Apr 15 '21 22:04 j3lev

@berstend, could you tell me whether using playwright-extra with the stealth plugin solves this issue, or does the stealth plugin still not work with Playwright because it uses its own intermediate wire protocol instead of CDP?

windbridges avatar Apr 22 '21 14:04 windbridges

@WindBridges there's currently no stealth plugin for playwright (and the existing one is not compatible). The main reason is time constraints on my end and playwright making it more difficult to hook into the CDP flow so porting the stuff over from the existing plugin isn't just copy paste but more involved. :-)

berstend avatar Apr 22 '21 16:04 berstend

@WindBridges you can use the minified version of the stealth plugin from the extract-stealth-evasions, works perfectly fine for me with playwright.

opahopa avatar Apr 30 '21 19:04 opahopa

@WindBridges you can use the minified version of the stealth plugin from the extract-stealth-evasions, works perfectly fine for me with playwright.

Unfortunately that will only result in cursory fixes, quite a few things rely on CDP and are not part of the js evasions scripts.

berstend avatar Apr 30 '21 21:04 berstend

hey @berstend! hope all is well, i was just wondering when we can expect to use newer versions of playwright with this, the only reason i ask is that 1.8 appears to be no longer listed in the official Playwright docs, so I'm guessing they may drop support for it quite soon

j3lev avatar Jun 03 '21 22:06 j3lev

Existing puppeteer-extra-plugin-* will work with puppeteer-extra, not playwright-extra.

BTW, I've been using puppeteer-extra-plugin-stealth with Playwright for a long time with this hack:

const enabledEvasions = [/* list of my required evasions */];
// Load each evasion module; every module exports a factory function.
const evasions = enabledEvasions.map(e => require(`puppeteer-extra-plugin-stealth/evasions/${e}`));
// Stand-in for a Puppeteer page that records evaluateOnNewDocument calls.
const stealth = {
  callbacks: [],
  async evaluateOnNewDocument(...args) {
    this.callbacks.push({ cb: args[0], a: args[1] });
  }
};
evasions.forEach(e => e().onPageCreated(stealth));
// Replay the recorded scripts as Playwright init scripts.
for (const evasion of stealth.callbacks) {
  await browserContext.addInitScript(evasion.cb, evasion.a);
}
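One possible caveat with the snippet above: `onPageCreated` is not awaited, so an evasion that does asynchronous work before calling `evaluateOnNewDocument` could register its script after the replay loop has already run. Whether any given evasion behaves that way is an assumption here. A hedged variant that awaits each hook, using the hypothetical helper names `collectInitScripts` and `applyInitScripts`:

```javascript
// Sketch only: helper names are hypothetical, not part of any published package.

// Hands each evasion a stand-in "page" that records evaluateOnNewDocument calls,
// and awaits every onPageCreated hook before returning the recorded scripts.
async function collectInitScripts(evasions) {
  const recorded = [];
  const recorder = {
    async evaluateOnNewDocument(fn, arg) {
      recorded.push({ fn, arg });
    },
  };
  for (const evasion of evasions) {
    await evasion.onPageCreated(recorder);
  }
  return recorded;
}

// Replays the recorded scripts on anything exposing addInitScript(fn, arg),
// such as a Playwright BrowserContext.
async function applyInitScripts(context, scripts) {
  for (const { fn, arg } of scripts) {
    await context.addInitScript(fn, arg);
  }
}
```

With the thread's setup this would be called as `applyInitScripts(browserContext, await collectInitScripts(evasions.map(e => e())))`. Like the original hack, it only covers the JS-based evasions.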

terion-name avatar Jun 26 '21 19:06 terion-name

@berstend don't know if it's dirty or not, thanks to @terion-name actually I got it work with [email protected]. This is the code I used and the results via screenshots:

(async () => {
    const { chromium } = require("playwright");

    const browser = await chromium.launch({
        channel: "chrome",
        headless: true,
    });

    const originalUserAgent = await (await (await browser.newContext()).newPage()).evaluate(() => { return navigator.userAgent });

    const browserContext = await browser.newContext({
        userAgent: originalUserAgent.replace("Headless", ""),
    });

    const page = await browserContext.newPage();

    const enabledEvasions = [
        'chrome.app',
        'chrome.csi',
        'chrome.loadTimes',
        'chrome.runtime',
        'iframe.contentWindow',
        'media.codecs',
        'navigator.hardwareConcurrency',
        'navigator.languages',
        'navigator.permissions',
        'navigator.plugins',
        'navigator.webdriver',
        'sourceurl',
        // 'user-agent-override', // doesn't work since playwright has no page.browser()
        'webgl.vendor',
        'window.outerdimensions'
    ];
    const evasions = enabledEvasions.map(e => require(`puppeteer-extra-plugin-stealth/evasions/${e}`));
    const stealth = {
        callbacks: [],
        async evaluateOnNewDocument(...args) {
            this.callbacks.push({ cb: args[0], a: args[1] })
        }
    }
    evasions.forEach(e => e().onPageCreated(stealth));
    for (let evasion of stealth.callbacks) {
        await browserContext.addInitScript(evasion.cb, evasion.a);
    }

    await page.goto("https://bot.sannysoft.com");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-sannysoft.jpg", fullPage: true });

    await page.goto("https://abrahamjuliot.github.io/creepjs/");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-creepjs.jpg", fullPage: true });

    await page.goto("http://f.vision/");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-fvision.jpg", fullPage: true });

    await page.goto("https://pixelscan.net/");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-pixelscan.jpg", fullPage: true });

    // await browserContext.waitForEvent("close");
    await browser.close();
})();
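The four near-identical goto/wait/screenshot steps in the script above could be folded into a loop. This is only a sketch: `screenshotAll` is a hypothetical helper, and `page` is assumed to be any object with `goto`, `waitForTimeout` and `screenshot` methods (a Playwright `Page` has all three). Unlike the original, a failing site is recorded and skipped instead of aborting the run:

```javascript
// Sketch only: `screenshotAll` is a hypothetical helper.
async function screenshotAll(page, targets, settleMs = 1000) {
  const results = [];
  for (const { name, url } of targets) {
    const path = `screenshot-${name}.jpg`;
    try {
      await page.goto(url);
      await page.waitForTimeout(settleMs); // give detection scripts time to finish
      await page.screenshot({ path, fullPage: true });
      results.push({ name, path, ok: true });
    } catch (err) {
      // Record the failure and continue with the remaining sites.
      results.push({ name, path, ok: false, error: String(err) });
    }
  }
  return results;
}

// The same four test sites used in the script above.
const targets = [
  { name: "sannysoft", url: "https://bot.sannysoft.com" },
  { name: "creepjs", url: "https://abrahamjuliot.github.io/creepjs/" },
  { name: "fvision", url: "http://f.vision/" },
  { name: "pixelscan", url: "https://pixelscan.net/" },
];
```

Inside the async function above, the four blocks would then reduce to `await screenshotAll(page, targets)`.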

[Screenshots: creepjs, fvision, pixelscan, sannysoft]

maiux avatar Sep 11 '21 16:09 maiux

@maiux I've also been using this hack for my program since berstend doesn't seem to have time/interest in updating it.

floppabro1337 avatar Sep 12 '21 08:09 floppabro1337

hey @berstend! hope all is well, i was just wondering when we can expect to use newer versions of playwright with this

📙 TL;DR: Progress on the switch to the new codebase had stalled but we're back at it now.


A little more context:

Apologies for the delay on this - puppeteer unfortunately breaking TypeScript typings a while back took the wind out of the sails of the planned release of the new branch and I've been waiting a bit for the dust to settle. 😅

Given the project's popularity I'm a bit cautious about replacing the old versions until I'm satisfied it'll be a smooth and backwards compatible transition for everyone, hence we haven't made the switch yet :)

I haven't updated the @next packages in the meantime as the packaging/deployment of those is a bit brittle and cumbersome (our monorepo tool lerna unfortunately fails to resolve their dependencies automatically, which means I need to bump all internal dependencies manually)

I'm not a huge fan of the current limbo situation though and want us to switch to the new codebase as soon as possible.

Things I have on my shortlist in this regard:

  • Figure out the best way to deal with typings in our packages. peerDependencies are a mess: if we don't ship types as a dependency, regular JS users on pptr < v5 don't get to enjoy Intellisense hints; if we do ship a specific version, TS users with a different pptr/pw version might run into conflicts; and puppeteer itself switched to TS/built-in types a while ago.
  • Backport some recent changes made in the old recaptcha plugin to the new @extra/recaptcha
  • Optimize the plugin API to allow for easy script injection in workers as well
  • See if I can find usage numbers on older puppeteer versions, dropping support for some older versions would make the migration a lot easier

Regarding playwright + stealth: The "hacks" discussed here are fine 😄 Unfortunately they only cover JS based evasions and don't handle launch args or more importantly CDP commands, which is the main issue I ran into when working on the playwright stealth port. Playwright only allows to create a new CDP session whereas we need to hook into the existing one. I did however find a promising workaround I'm currently fleshing out, so a stealth plugin with full playwright support is on the horizon again. :)

berstend avatar Sep 13 '21 05:09 berstend

@berstend have you tried to add a feature request to playwright? they're very responsive and open about their development and what could or couldn't be done. Access to CDP sessions or whatever else you miss.

andrisi avatar Sep 22 '21 10:09 andrisi

@berstend That's great news! Just wanted to say thank you in the name of all the people using this software! So yeah thanks for the great and open source work, we all appreciate it very much!

Osiris-Team avatar Oct 10 '21 18:10 Osiris-Team

Hey there, is there any chance the playwright dependency can be moved up to the latest? The playwright-core dependency is 9 minor versions behind?

j3lev avatar Dec 29 '21 15:12 j3lev

Would be great to bump playwright-core dependency to 1.18.0

ya-mouse avatar Jan 22 '22 23:01 ya-mouse

@berstend Just judging by the NPM downloads of puppeteer, there seems to be a major amount of people hanging on the puppeteer@5 version (and puppeteer@1 for some reason). I'm one of them, but for me this is only due to puppeteer-extra not being compatible with puppeteer versions >=6.

[Image: puppeteer-versions]

I can't speak for anyone else, but I do think the majority of users would be fine with dropping support for puppeteer < 6, or using an older version of puppeteer-extra if they really need it (I've been using the current version of puppeteer-extra just fine, but I would love to update).

I realize that puppeteer breaking their typings must be really frustrating. And their issue mess is probably not helping. If we can help you with any specific tasks that need doing, let us know. I'm sure a few people would love to help (including me), but don't want to interfere with the upgrade process.

1nVitr0 avatar Feb 10 '22 13:02 1nVitr0

@maiux thank you for sharing your code, it was quite helpful! That being said, the browser seems to have a Trust Score of 0% when visiting https://abrahamjuliot.github.io/creepjs/. Do you know of any way to circumvent that?

jv1968 avatar Feb 21 '22 15:02 jv1968

What's the current status of stealth in playwright? Have the CDP issues been resolved? I've been digging to find the answer to no avail.

aus10code avatar Mar 01 '22 11:03 aus10code

Playwright only allows to create a new CDP session whereas we need to hook into the existing one.

@berstend FWIW, their documentation includes a connectOverCDP method that seems to be doing what you describe.

paambaati avatar Mar 02 '22 06:03 paambaati

@berstend you can patch the Playwright source, or fork it. It's quite easy to expose the CDP session for Chromium browsers. Are you really just stuck on this? Shall we help? It would be magical to have your extension for Playwright, which has a much friendlier API than Puppeteer.

andrisi avatar Mar 03 '22 16:03 andrisi

Wow, seems like we have @berstend back! Can't wait to know what the "unpinned this issue" means 😄

dilame avatar Jun 20 '22 15:06 dilame

LETS GOOOOOOOOO

b5414 avatar Jun 20 '22 19:06 b5414

Wow, seems like we have @berstend back! Can't wait to know what the "unpinned this issue" means 😄

Quick update regarding playwright support 😄

I reflected on why I never finished the automation-extra branch and came to the following realizations:

  • A massive rewrite like this is a nightmare to merge in, especially with a project that's used in production by many
  • While the new code was in beta mode the regular plugin development did not stop and I had essentially doubled my workload by having to keep the old and the new plugins (supporting both playwright & puppeteer) in sync
  • Bad timing: Typings are already tricky for a version-agnostic plugin framework, it didn't help that puppeteer switched from @types/puppeteer to their built-in (and initially broken) types
  • Playwright's APIs kept diverging from puppeteer as time went on, in addition they made things less "hacker friendly" (client/server split, custom wire protocol, overzealous input validation, using exports in their package.json which prevents monkey patching, etc)

Instead I decided to follow a more iterative approach:

  • No complete rewrite of the whole project or sharing code with puppeteer-extra (for the moment), playwright-extra is its own thing which makes rolling it out much easier
  • No new shared plugin base class for now
    • Looking at download numbers the main plugins of interest are stealth & recaptcha
    • I've worked out a "compatibility shim" that allows loading in these major puppeteer-extra plugins without changes into playwright-extra

While working on this I've also found solutions to quite a few long standing issues around types ("how can we use playwright types internally without imposing a specific version on the user", "how to re-export top-level module exports like playwright.devices without shipping with a specific version of it") and other things

The existing stealth and recaptcha plugins are already working well (even with Firefox & Webkit 🎉) and most of the explorative code is done. I'm now working on cleanup, tests and documentation and should be able to release this quite soon and without any potential side-effects (it's just a single new package: playwright-extra)

TL;DR: Instead of a complete rewrite with a new shared plugin framework we start with a playwright-extra version that is compatible with the majority of puppeteer-extra plugins 😄

[Image: playwright-extra using a puppeteer compatibility layer to load in puppeteer-extra-plugin-recaptcha to solve captchas in webkit 😁]

berstend avatar Jun 29 '22 08:06 berstend

@berstend Sounds great! Stealth for Playwright would be very useful (read: 100% necessary) in one of our projects.

Do you have any kind of ETA on this release? No pressure :grin:

michelgammelgaard avatar Jun 30 '22 08:06 michelgammelgaard

Do you have any kind of ETA on this release? No pressure 😁

I'll do you one better (than an ETA) by just releasing it 😄

Successfully published:
 - [email protected]

Readme: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra
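For orientation, a minimal usage sketch of the released package with the existing stealth plugin. It assumes `playwright`, `playwright-extra` and `puppeteer-extra-plugin-stealth` are installed; the test URL and output path are illustrative choices, and the readme linked above remains the reference for the supported API:

```javascript
// Sketch only; requires playwright, playwright-extra and
// puppeteer-extra-plugin-stealth to be installed before main() is called.
async function main() {
  // playwright-extra wraps the browser launchers from the installed playwright.
  const { chromium } = require("playwright-extra");
  const StealthPlugin = require("puppeteer-extra-plugin-stealth");

  // Register the puppeteer-extra plugin on the Playwright launcher.
  chromium.use(StealthPlugin());

  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto("https://bot.sannysoft.com", { waitUntil: "networkidle" });
    await page.screenshot({ path: "stealth.png", fullPage: true });
  } finally {
    // Always close the browser, even if navigation or the screenshot fails.
    await browser.close();
  }
}
```

Run it with `main().catch(console.error)`. The modules are required inside `main` so that loading the file does not fail when the packages are missing.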

Feedback welcome!

berstend avatar Jul 03 '22 15:07 berstend

That's amazing @berstend ! Will test it out.

eliassorensen avatar Jul 05 '22 06:07 eliassorensen