
[Bug]: Unable to execute arun_many with managed browsers and cdp

Open · medmahmoudi26 opened this issue 1 month ago · 1 comment

crawl4ai version

v0.7.6

Expected Behavior

Hello! I'm trying to use crawl4ai for concurrent authenticated crawling, but I'm running into errors when combining CDP with arun_many.

In a terminal, I launched a browser using the CLI:

  • I created a profile with the CLI and turned headless off.
  • I launched a browser using CDP and my profile directory:

crwl cdp -d /home/user/.crawl4ai/profiles/

A new instance was launched, and I confirmed I could interact with it using native Playwright.

Crawl4ai script:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

# Define URLs to crawl
URLS = [
    "https://example.com",
    "https://httpbin.org/html",
    "https://www.python.org",
]

async def main():
    # Configure CDP browser connection
    browser_cfg = BrowserConfig(
        browser_type="cdp",
        cdp_url="http://localhost:9222",
        verbose=True,
    )
    
    # Configure crawler settings
    crawler_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=60000,
        wait_until="domcontentloaded",
    )
    
    # Crawl all URLs using arun_many
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        results = await crawler.arun_many(urls=URLS, config=crawler_cfg)
        
        for result in results:
            print(f"\nURL: {result.url}")
            if result.success:
                print(f"✓ Success | Content length: {len(result.markdown)}")
            else:
                print(f"✗ Failed: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())

Expected behaviour: multiple tabs should open in Chromium and the crawls should execute in parallel.

Current Behavior

Only one tab is opened, only one crawl is finished, the rest runs into errors.

I think the bug may stem from a race condition in the browser manager: all sessions are competing for the same page. https://github.com/unclecode/crawl4ai/blob/40173eeb7374dd5d3ab84b355b28e88d43703ee0/crawl4ai/browser_manager.py#L1055
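For what it's worth, the failure mode looks consistent with several concurrent tasks sharing a single page handle. Here is a minimal asyncio sketch (plain Python, no crawl4ai involved; the `FakePage` class and all names are hypothetical, purely to illustrate the suspected contention): when every task navigates the same shared page, only the first navigation succeeds and the rest error out, whereas a page per task completes cleanly:

```python
import asyncio

class FakePage:
    """Hypothetical stand-in for a browser page; one navigation at a time."""
    def __init__(self):
        self.current_url = None
        self.busy = False

    async def goto(self, url):
        if self.busy:
            # Another task is already using this page -> the crawl fails,
            # mirroring "only one crawl is finished, the rest errors".
            raise RuntimeError(f"page already navigating to {self.current_url}")
        self.busy = True
        self.current_url = url
        await asyncio.sleep(0.01)  # simulate network time
        self.busy = False
        return url

async def crawl_shared(urls):
    # All tasks compete for the same page -> only the first one wins.
    page = FakePage()
    return await asyncio.gather(
        *(page.goto(u) for u in urls), return_exceptions=True
    )

async def crawl_isolated(urls):
    # One page per task -> every crawl succeeds.
    return await asyncio.gather(*(FakePage().goto(u) for u in urls))

urls = ["https://example.com", "https://httpbin.org/html", "https://www.python.org"]
shared = asyncio.run(crawl_shared(urls))
isolated = asyncio.run(crawl_isolated(urls))
print("shared:", shared)      # first URL succeeds, the other two raise
print("isolated:", isolated)  # all three URLs succeed
```

This is only a model of the symptom, not of crawl4ai's internals, but it matches the behaviour above: one tab, one finished crawl, errors for the rest.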


Is this reproducible?

Yes

Inputs Causing the Bug

No response

Steps to Reproduce

No response

Code snippets

No response
OS

Linux

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

medmahmoudi26 · Oct 26 '25 00:10