Add login context helper to playwright and puppeteer crawlers

Open B4nan opened this issue 5 months ago • 3 comments

Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

Feature

Add a new context helper for playwright and puppeteer crawlers for simple login flows.

Motivation

Most login flows are rather simple: they either contain both a username/email field and a password field, or use a two-step form (first provide the username/email, then provide the password). We want a context helper for those simple cases to make logging into protected sites easier.

Ideal solution or implementation, and any additional constraints

It should be a CrawlingContext helper added specifically for PlaywrightCrawler and PuppeteerCrawler.

async requestHandler({ login }) {
    await login({ username: '...', password: '...' });
},
  • the implementation should detect whether there is a login form on the page; the heuristic should also be configurable as a callback
  • if no login form is detected, the function should resolve, since there is nothing to do
  • if a login form is detected, it should detect what kind of form it is (one or two step, feel free to consider other login form types), fill it in and submit it
  • it should detect whether the login succeeded or failed and resolve/reject based on that
  • the detection of a successful/failed login should be configurable as a callback; try to come up with a good default heuristic
  • username and password will be the only required options; the page object will already be bound to the function, like in the other context helpers
  • (optional) there should also be an option with a callback for dealing with captchas
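
The requirements above could be sketched roughly as follows. This is only an illustration under assumptions, not the intended Crawlee implementation: the option names (`detectLoginForm`, `isLoggedIn`, `solveCaptcha`), the selectors, and the default heuristics are all made up for this example. `PageLike` is a hypothetical structural subset of Playwright's `Page` (`isVisible`, `fill`, `press`, `waitForLoadState`), so the sketch has no dependencies.

```typescript
// Hypothetical subset of Playwright's Page; a real Page should satisfy it structurally.
interface PageLike {
    isVisible(selector: string): Promise<boolean>;
    fill(selector: string, value: string): Promise<void>;
    press(selector: string, key: string): Promise<void>;
    waitForLoadState(): Promise<void>;
}

// Hypothetical option names; only username and password are required.
interface LoginOptions {
    username: string;
    password: string;
    detectLoginForm?: (page: PageLike) => Promise<boolean>;
    isLoggedIn?: (page: PageLike) => Promise<boolean>;
    solveCaptcha?: (page: PageLike) => Promise<void>;
}

// Assumed selectors; real pages will need a more robust heuristic.
const USERNAME_SELECTOR = 'input[type="email"], input[name*="user"], input[name*="email"]';
const PASSWORD_SELECTOR = 'input[type="password"]';

// Default detection: any username-like or password field is visible.
const defaultDetectLoginForm = async (page: PageLike): Promise<boolean> =>
    (await page.isVisible(USERNAME_SELECTOR)) || (await page.isVisible(PASSWORD_SELECTOR));

// Default success check (an assumption): the password field is gone after submitting.
const defaultIsLoggedIn = async (page: PageLike): Promise<boolean> =>
    !(await page.isVisible(PASSWORD_SELECTOR));

async function login(page: PageLike, options: LoginOptions): Promise<void> {
    const detect = options.detectLoginForm ?? defaultDetectLoginForm;
    if (!(await detect(page))) return; // no login form, nothing to do

    if (await page.isVisible(USERNAME_SELECTOR)) {
        await page.fill(USERNAME_SELECTOR, options.username);
    }
    if (!(await page.isVisible(PASSWORD_SELECTOR))) {
        // Two-step form: submit the username first, then continue with the password.
        await page.press(USERNAME_SELECTOR, 'Enter');
        await page.waitForLoadState();
    }
    await page.fill(PASSWORD_SELECTOR, options.password);
    await page.press(PASSWORD_SELECTOR, 'Enter');
    await page.waitForLoadState();

    await options.solveCaptcha?.(page); // optional captcha callback

    const succeeded = await (options.isLoggedIn ?? defaultIsLoggedIn)(page);
    if (!succeeded) throw new Error('Login failed');
}
```

Inside the crawler, the helper exposed on the context would presumably be a closure over the current page, e.g. `(options) => login(page, options)`, so that the handler only passes the credentials.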

Alternative solutions or implementations

No response

Other context

For inspiration, see how other context helpers are implemented, e.g. parseWithCheerio. This helper should be available only on the two browser crawlers. You can start with playwright only; porting the code to puppeteer is optional for the initial PR. You can use these sites to test this:

  • https://www.saucedemo.com/
  • http://zero.webappsecurity.com/
  • https://automationexercise.com/login

Examples of sites with the two-step login form:

  • https://claude.ai/
  • https://accounts.evernote.com/login
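
One possible way to tell the one-step and two-step forms apart is to classify the inputs that are currently visible. The sketch below is an assumption about how such a heuristic might look: `classifyLoginForm`, `VisibleInput`, and the field-name pattern are hypothetical, and the sample inputs in the usage note are invented rather than taken from the sites listed above.

```typescript
type LoginFormKind = 'none' | 'single-step' | 'two-step';

// Hypothetical description of an input that is currently visible on the page.
interface VisibleInput {
    type: string; // the input's type attribute, e.g. "text", "email", "password"
    name: string; // the input's name attribute
}

function classifyLoginForm(inputs: VisibleInput[]): LoginFormKind {
    const hasPassword = inputs.some((input) => input.type === 'password');
    const hasUsername = inputs.some(
        (input) => input.type === 'email' || /user|email|login/i.test(input.name),
    );
    // A visible password field means everything can be filled in one go.
    if (hasPassword) return 'single-step';
    // Only a username-like field: the password is expected on the next step.
    if (hasUsername) return 'two-step';
    return 'none';
}
```

For instance, `classifyLoginForm([{ type: 'email', name: 'email' }])` is classified as a two-step form, while adding `{ type: 'password', name: 'password' }` to the list makes it single-step.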

B4nan, Jul 02 '25 12:07

Hi @B4nan

We are students from CodeDay. We were working on issue #2261. Thanks to your feedback, we were able to close it quickly. Now we would like to take on this issue. Since this will be our second issue, are there any concerns or guidelines we should follow? Thank you.

mikeng07, Jul 08 '25 21:07

I am a part of this CodeDay team. I am looking forward to working on this issue!

ShuWald, Jul 08 '25 22:07

I’m super excited to work on this and would be truly grateful if you, @B4nan, could share any external resources that might help us better understand and implement all the required features, for example any architecture or design that Apify wants us to follow.

Looking forward to bringing this to life! :)

NikkiAung, Jul 08 '25 22:07