Bot issue

Open rashidwiizb opened this issue 7 months ago • 3 comments

crawl4ai version

Crawl4AI 0.5.0.post8

Expected Behavior

Hi, I'm new to Crawl4AI and I'm facing some issues that need clarification.

I'm trying to scrape data from sites like PitchBook and CrunchBase, but I'm encountering human verification screens. As a result, I'm getting the verification page content instead of the actual page content unless I manually verify on an open browser tab.

My questions are:

  1. How can I bypass human verification or scrape website content without opening a browser tab?
  2. How can I scrape inner pages or multiple pages of a website?
  3. How can I deploy this with API calls or a similar approach?

Current Behavior

(crawl4ai-env) wiizbusiness@WiiZs-Laptop web-crawler % python test.py
[INIT].... → Crawl4AI 0.5.0.post8
[FETCH]... ↓ https://pitchbook.com/profiles/company/106751-98... | Status: True | Time: 1.61s
[SCRAPE].. ◆ https://pitchbook.com/profiles/company/106751-98... | Time: 0.004s
[COMPLETE] ● https://pitchbook.com/profiles/company/106751-98... | Status: True | Total: 1.61s
Was success: True

Icon for pitchbook.com
pitchbook.com

Verifying you are human. This may take a few seconds.
pitchbook.com needs to review the security of your connection before proceeding.
Verification successful
Waiting for pitchbook.com to respond...
Ray ID: 932b2267be43179a
Performance & security by Cloudflare

(crawl4ai-env) wiizbusiness@WiiZs-Laptop web-crawler %
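
Note that the log reports `Status: True` and `Was success: True` even though the returned content is the Cloudflare interstitial, so the success flag alone does not tell you that the real page was fetched. A minimal sketch of a content check follows, using only the Python standard library. The function name `looks_like_challenge` and the marker list are assumptions made for illustration (the phrases are copied from the output above, and Cloudflare may change them); this is not part of Crawl4AI.

```python
import re

# Phrases copied from the challenge page shown in this issue. Treat the list
# as an assumption: the wording is controlled by Cloudflare and may change.
CHALLENGE_MARKERS = (
    "verifying you are human",
    "needs to review the security of your connection",
    "performance & security by",
)

# Matches lines such as "Ray ID: 932b2267be43179a", with optional backticks
# around the ID as they appear in markdown output.
RAY_ID = re.compile(r"ray id:\s*`?[0-9a-f]{8,}`?", re.IGNORECASE)


def looks_like_challenge(text: str, min_hits: int = 2) -> bool:
    """Return True when the text resembles a Cloudflare verification page.

    Each matching marker phrase counts as one hit, and a Ray ID line counts
    as one more. Requiring at least two hits reduces false positives on
    ordinary pages that merely mention Cloudflare.
    """
    lowered = text.lower()
    hits = sum(marker in lowered for marker in CHALLENGE_MARKERS)
    if RAY_ID.search(lowered):
        hits += 1
    return hits >= min_hits
```

Running this check on the crawled text before processing it lets a script retry or switch strategy instead of silently storing the verification page.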

Is this reproducible?

Yes

Inputs Causing the Bug

url= 'https://pitchbook.com/profiles/company/106751-98',

Steps to Reproduce


Code snippets


OS

macOS

Python version

Python 3.13.2

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

rashidwiizb avatar Apr 19 '25 08:04 rashidwiizb

@unclecode I am facing the same issue too.

Harinib-Kore avatar May 26 '25 10:05 Harinib-Kore

@rashidwiizb There are multiple approaches you can try here. One is "identity-based" crawling: you log in to your account to build up your human identity, then attach your new browser session to that "user profile data directory" and start crawling. I demonstrated this in one of my videos. We will also discuss it at tomorrow's meetup and release the video later, so please check it out.
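
A minimal sketch of what attaching a crawl to an existing profile directory might look like. It assumes the `BrowserConfig` parameters `use_managed_browser` and `user_data_dir` are available in your installed Crawl4AI version (check the documentation for your release); the profile path and the helper names `browser_kwargs` and `crawl_with_profile` are hypothetical, and the profile is assumed to have been created beforehand by logging in manually.

```python
from pathlib import Path

# Hypothetical location. Point this at the browser profile you logged in with.
PROFILE_DIR = Path.home() / ".crawl4ai" / "profiles" / "my-login-profile"


def browser_kwargs(profile_dir: Path, headless: bool = True) -> dict:
    """Collect the BrowserConfig arguments for reusing a persistent profile."""
    return {
        "headless": headless,
        "use_managed_browser": True,          # keep cookies and storage between runs
        "user_data_dir": str(profile_dir),    # the "user profile data directory"
    }


async def crawl_with_profile(url: str, profile_dir: Path = PROFILE_DIR) -> str:
    """Crawl one URL with the saved profile; return markdown or an empty string."""
    # Imported lazily so browser_kwargs() can be used without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

    browser_config = BrowserConfig(**browser_kwargs(profile_dir))
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=CrawlerRunConfig())
        return str(result.markdown) if result.success else ""
```

To run it, call `asyncio.run(crawl_with_profile("https://pitchbook.com/profiles/company/106751-98"))` after `import asyncio`. Whether the saved session gets past the verification depends on the site, so the returned text still needs to be checked.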

unclecode avatar May 28 '25 16:05 unclecode

Hi @unclecode, I tried PitchBook with the approach above and it looks fine. But the same bot verification happens on Crunchbase too, and there I can't bypass it that way. The page keeps refreshing and still shows the bot verification page.

I also need to handle this for all websites, whatever kind of bot verification each site uses.

rashidwiizb avatar May 28 '25 17:05 rashidwiizb

Reproducible URL:

$ crwl https://japanworld.it/en/preordini/25559-furyu-tenitol-spriggan-yu-ominae-4580736406933.html -o markdown
# japanworld.it
Verifying you are human. This may take a few seconds.
japanworld.it needs to review the security of your connection before proceeding.
Verification successful
Waiting for japanworld.it to respond...
Ray ID: `95c0626e9b9d0cea`
Performance & security by [Cloudflare](https://www.cloudflare.com?utm_source=challenge&utm_campaign=m)

Any suggestions about this issue?

ghost avatar Jul 08 '25 14:07 ghost

Hey! If you’re trying to get past bot protection, you’ve got a couple of solid options with Crawl4AI:

  1. Use Stealth Mode: We ship a built-in stealth mode that randomizes fingerprints and tightens up automation signals. You can enable it in your config or from the CLI. Full docs with examples: https://docs.crawl4ai.com/advanced/undetected-browser/
  2. Add CAPTCHA Solving (e.g. CapSolver): For sites that still challenge you with a CAPTCHA, you can plug in CapSolver or another provider to auto-solve those. We have a ready-made example and walkthrough here: https://github.com/unclecode/crawl4ai/tree/main/docs/examples/capsolver_captcha_solver

Try stealth first, and layer in CapSolver if the site still blocks you. Thanks!
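
The "stealth first, then CAPTCHA solving" order can be sketched as a small escalation loop. This is an illustration under stated assumptions, not Crawl4AI's API: `fetch_with_escalation` is a hypothetical helper, and each strategy is any callable you supply that takes a URL and returns the page text (for example one wrapping a stealth crawl, another wrapping a crawl with a solver).

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a URL and returns the page text. How it does so (plain
# crawl, stealth mode, CAPTCHA solver) is up to the caller.
Fetcher = Callable[[str], str]


def fetch_with_escalation(
    url: str,
    strategies: Iterable[Tuple[str, Fetcher]],
    is_blocked: Callable[[str], bool],
) -> Tuple[Optional[str], str]:
    """Try each (name, fetcher) in order and stop at the first unblocked page.

    Returns (strategy_name, content) on success, or (None, last_content)
    when every strategy still returned a blocked page.
    """
    content = ""
    for name, fetch in strategies:
        content = fetch(url)
        if not is_blocked(content):
            return name, content
    return None, content


# Usage with stub fetchers standing in for real crawls (hypothetical names).
def plain_fetch(url: str) -> str:
    return "Verifying you are human. This may take a few seconds."


def stealth_fetch(url: str) -> str:
    return "Company profile page"


name, content = fetch_with_escalation(
    "https://example.com",
    [("plain", plain_fetch), ("stealth", stealth_fetch)],
    is_blocked=lambda text: "verifying you are human" in text.lower(),
)
print(name)
```

Ordering the strategies from cheapest to most expensive means the paid solver is only invoked for sites that still block the stealth attempt.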

SohamKukreti avatar Nov 20 '25 15:11 SohamKukreti