
[Bug]: Unable to execute arun_many with managed browsers and cdp

Open · medmahmoudi26 opened this issue 1 month ago · 1 comment

crawl4ai version

v0.7.6

Expected Behavior

Hello! I'm trying to use crawl4ai for concurrent authenticated crawling, but I'm running into errors when combining CDP with arun_many.

In a terminal, I launched a browser using the CLI:

  • I created a profile with the CLI and turned headless off.
  • I launched a browser using CDP and my profile directory:

crwl cdp -d /home/user/.crawl4ai/profiles/

A new instance was launched, and I confirmed I could interact with it using native Playwright.

Crawl4ai script:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

# Define URLs to crawl
URLS = [
    "https://example.com",
    "https://httpbin.org/html",
    "https://www.python.org",
]

async def main():
    # Configure CDP browser connection
    browser_cfg = BrowserConfig(
        browser_type="cdp",
        cdp_url="http://localhost:9222",
        verbose=True,
    )
    
    # Configure crawler settings
    crawler_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=60000,
        wait_until="domcontentloaded",
    )
    
    # Crawl all URLs using arun_many
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        results = await crawler.arun_many(urls=URLS, config=crawler_cfg)
        
        for result in results:
            print(f"\nURL: {result.url}")
            if result.success:
                print(f"✓ Success | Content length: {len(result.markdown)}")
            else:
                print(f"✗ Failed: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())

Expected behaviour: multiple tabs should open in Chromium and the crawls should execute in parallel.

Current Behavior

Only one tab is opened, only one crawl is finished, the rest runs into errors.

I think the bug may stem from a race condition in the browser manager: all sessions are competing for the same page. https://github.com/unclecode/crawl4ai/blob/40173eeb7374dd5d3ab84b355b28e88d43703ee0/crawl4ai/browser_manager.py#L1055
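For what it's worth, the failure mode looks consistent with several concurrent tasks sharing a single page handle. Here is a minimal asyncio sketch (plain Python, no crawl4ai involved; the `FakePage` class and all names are hypothetical, purely to illustrate the suspected contention): when every task navigates the same shared page, only the first navigation succeeds and the rest error out, whereas a page per task completes cleanly:

```python
import asyncio

class FakePage:
    """Hypothetical stand-in for a browser page; one navigation at a time."""
    def __init__(self):
        self.current_url = None
        self.busy = False

    async def goto(self, url):
        if self.busy:
            # Another task is already using this page -> the crawl fails,
            # mirroring "only one crawl is finished, the rest errors".
            raise RuntimeError(f"page already navigating to {self.current_url}")
        self.busy = True
        self.current_url = url
        await asyncio.sleep(0.01)  # simulate network time
        self.busy = False
        return url

async def crawl_shared(urls):
    # All tasks compete for the same page -> only the first one wins.
    page = FakePage()
    return await asyncio.gather(
        *(page.goto(u) for u in urls), return_exceptions=True
    )

async def crawl_isolated(urls):
    # One page per task -> every crawl succeeds.
    return await asyncio.gather(*(FakePage().goto(u) for u in urls))

urls = ["https://example.com", "https://httpbin.org/html", "https://www.python.org"]
shared = asyncio.run(crawl_shared(urls))
isolated = asyncio.run(crawl_isolated(urls))
print("shared:", shared)      # first URL succeeds, the other two raise
print("isolated:", isolated)  # all three URLs succeed
```

This is only a model of the symptom, not of crawl4ai's internals, but it matches the behaviour above: one tab, one finished crawl, errors for the rest.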


Is this reproducible?

Yes

Inputs Causing the Bug

No response

Steps to Reproduce

No response

Code snippets

No response
OS

Linux

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

medmahmoudi26 · Oct 26 '25 00:10