crawl4ai
[Bug]: Unable to execute arun_many with managed browsers and cdp
crawl4ai version
v0.7.6
Expected Behavior
Hello! I am trying to use crawl4ai for concurrent authenticated crawling, but I run into errors when combining CDP with arun_many.
In a terminal, I launched a browser using the CLI:
- I created a profile using the CLI and turned headless mode off.
- I started a browser over CDP with my profile directory:
crwl cdp -d /home/user/.crawl4ai/profiles/
A new browser instance was launched, and I confirmed that I could interact with it using native Playwright.
Crawl4ai script:
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

# Define URLs to crawl
URLS = [
    "https://example.com",
    "https://httpbin.org/html",
    "https://www.python.org",
]


async def main():
    # Configure CDP browser connection
    browser_cfg = BrowserConfig(
        browser_type="cdp",
        cdp_url="http://localhost:9222",
        verbose=True,
    )

    # Configure crawler settings
    crawler_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=60000,
        wait_until="domcontentloaded",
    )

    # Crawl all URLs using arun_many
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        results = await crawler.arun_many(urls=URLS, config=crawler_cfg)

        for result in results:
            print(f"\nURL: {result.url}")
            if result.success:
                print(f"✓ Success | Content length: {len(result.markdown)}")
            else:
                print(f"✗ Failed: {result.error_message}")


if __name__ == "__main__":
    asyncio.run(main())
Expected behaviour: multiple tabs open in Chromium and the crawls run in parallel.
Current Behavior
Only one tab is opened, only one crawl is finished, the rest runs into errors.
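One way to check the tab count independently of crawl4ai is to ask the DevTools HTTP endpoint that Chromium serves on the remote debugging port. The following is a minimal sketch, assuming the browser started by `crwl cdp` listens on `http://localhost:9222` as in the script above. `fetch_targets` and `count_page_targets` are hypothetical helper names, not part of crawl4ai.

```python
import json
from urllib.request import urlopen

CDP_HTTP = "http://localhost:9222"  # assumption: same endpoint as cdp_url in the script


def fetch_targets(base_url: str = CDP_HTTP) -> list[dict]:
    """Return the list of targets Chromium reports at /json/list."""
    with urlopen(f"{base_url}/json/list", timeout=5) as resp:
        return json.load(resp)


def count_page_targets(targets: list[dict]) -> int:
    """Count targets whose type is "page", i.e. open tabs."""
    return sum(1 for t in targets if t.get("type") == "page")
```

Calling `count_page_targets(fetch_targets())` from a second terminal while `arun_many` is running should show whether more than one tab ever exists during the crawl.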
I think the bug might come from a race condition in the browser manager: all sessions appear to compete for the same page. https://github.com/unclecode/crawl4ai/blob/40173eeb7374dd5d3ab84b355b28e88d43703ee0/crawl4ai/browser_manager.py#L1055
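The suspected failure mode can be illustrated without crawl4ai or a real browser. The sketch below is a toy model only, not the library's code: `FakePage`, `crawl`, `crawl_shared`, and `crawl_isolated` are hypothetical names. It shows how concurrent tasks that navigate one shared page overwrite each other's navigation, while one page per task does not.

```python
import asyncio


class FakePage:
    """Stand-in for a browser tab: remembers the last URL it navigated to."""

    def __init__(self) -> None:
        self.url = "about:blank"

    async def goto(self, url: str) -> None:
        self.url = url
        await asyncio.sleep(0.01)  # yield control, as a real navigation would

    async def content(self) -> str:
        return f"content of {self.url}"


async def crawl(page: FakePage, url: str) -> bool:
    """Navigate, then report whether the page still shows the requested URL."""
    await page.goto(url)
    return await page.content() == f"content of {url}"


async def crawl_shared(urls: list[str]) -> list[bool]:
    page = FakePage()  # every task reuses the same tab
    return await asyncio.gather(*(crawl(page, u) for u in urls))


async def crawl_isolated(urls: list[str]) -> list[bool]:
    # One tab per task, so navigations cannot overwrite each other
    return await asyncio.gather(*(crawl(FakePage(), u) for u in urls))
```

In this toy model, only the task that navigated last reads back its own content when the page is shared, which resembles the "one crawl finishes, the rest fail" symptom. Whether the browser manager actually behaves this way in CDP mode is the open question of this report.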
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Linux
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response