crawl4ai
Fix cdp setting with managed browser
Summary
This PR fixes a concurrency bug in AsyncWebCrawler.arun_many() when using managed browsers. The issue was that all concurrent crawl tasks were fighting over one shared tab, causing failures. The fix modifies the get_page() method in browser_manager.py to always create new pages instead of reusing context.pages[0] for managed browsers.
Fixes #1563
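The contention described in the summary can be illustrated with a small simulation. This is only a sketch and not the code from browser_manager.py: FakePage, FakeContext, get_page_shared and get_page_fresh are hypothetical stand-ins invented for illustration, and no real browser is involved. It assumes that a task handed context.pages[0] can have its tab navigated elsewhere by another task before it reads the result.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a browser tab; remembers the last URL loaded."""

    def __init__(self):
        self.url = None

    async def goto(self, url):
        self.url = url
        await asyncio.sleep(0.01)  # yield so other tasks can navigate this tab
        return self.url  # whatever the tab shows once this task resumes


class FakeContext:
    """Hypothetical stand-in for a browser context that starts with one tab."""

    def __init__(self):
        self.pages = [FakePage()]

    async def new_page(self):
        page = FakePage()
        self.pages.append(page)
        return page


async def get_page_shared(context):
    # Behaviour before the fix, as described in the summary: reuse the first tab.
    return context.pages[0]


async def get_page_fresh(context):
    # Behaviour after the fix, as described in the summary: a new tab per task.
    return await context.new_page()


async def crawl(context, url, get_page):
    page = await get_page(context)
    seen = await page.goto(url)
    return url, seen


async def run(get_page, urls):
    context = FakeContext()
    results = await asyncio.gather(*(crawl(context, u, get_page) for u in urls))
    # Count tasks whose tab ended up showing a different task's URL.
    return sum(1 for wanted, seen in results if wanted != seen)


URLS = [f"https://example.com/{i}" for i in range(5)]
shared_mismatches = asyncio.run(run(get_page_shared, URLS))
fresh_mismatches = asyncio.run(run(get_page_fresh, URLS))
print("shared tab mismatches:", shared_mismatches)
print("fresh tab mismatches:", fresh_mismatches)
```

With the shared tab, most tasks read back a URL that another task loaded; with one page per task, every task sees its own URL. The real fix applies the same idea inside get_page() for managed browsers.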
List of files changed and why
- crawl4ai/browser_manager.py - Modified the get_page() method to create new pages for managed browsers instead of reusing the first page, which was causing tab contention in concurrent scenarios.
- tests/test_cdp_concurrency_compact.py - Created a comprehensive test suite that verifies the concurrency fix works correctly across multiple scenarios including basic arun_many functionality, managed CDP browsers, and various concurrency patterns.
How Has This Been Tested?
- Created and ran a comprehensive test suite with 6 different test scenarios:
  - Basic arun_many functionality test
  - Managed CDP browser test
  - Concurrency verification test
  - Concurrency fix demonstration
  - Before/after behavior comparison
  - Reference pattern test
- All tests pass successfully, demonstrating that:
  - Multiple concurrent crawl tasks no longer fight over shared tabs
  - The fix works with both basic and managed browser configurations
  - Backward compatibility is maintained
  - Performance is not negatively impacted
- Manual verification of the fix with both basic and managed browser configurations.
Checklist:
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] I have added/updated unit tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
I was eagerly anticipating this pull request to land. Great job, guys!
Hi team, thank you for all the hard work! Is there any way we can help move this PR forward? It solves a bug that we have been encountering when using CDP.
Hi there! We will include this in our next release in two weeks. Stay tuned! 💜
Hi @ntohidi, thank you for all the hard work! I was wondering if a release is planned soon for this PR.