Extensions feature broken and Parallel mode fails with Chrome profiles: "Failed to connect to Chrome URL"
Issue Description
When using Botasaurus in parallel=True mode with a Chrome profile, the driver systematically fails to connect to Chrome's debugging port. Multiple empty browser windows are spawned before a connection eventually (and only sometimes) succeeds.
Context: Why Chrome Profiles Are Needed
Chrome 136+ Issue: Google is removing the --load-extension flag from official Chrome builds, so extensions can no longer be loaded programmatically at launch.
Chrome for Testing limitation: Chrome for Testing still supports --load-extension, but anti-bot systems detect it more easily. Google Search quickly blocks the IPs it runs from, making it non-viable for production scraping.
Our workaround: We use pre-configured Chrome profiles where extensions (notably CapSolver for automatic CAPTCHA solving) are already manually installed. This approach is crucial for reliable CAPTCHA automation with official Chrome builds.
The Bug
Using parallel=True with a Chrome profile causes systematic connection failures:
Exception: Failed to connect to Chrome URL: http://127.0.0.1:{port}/json/version.
Traceback: "../core/browser.py", line 59, in ensure_chrome_is_alive
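To help tell "Chrome never opened the port" apart from "Chrome opened it later than the driver was willing to wait", here is a minimal standalone probe of the same /json/version endpoint. It is only a diagnostic sketch, not Botasaurus's internal logic: the function name wait_for_devtools and its timeout and interval defaults are my own assumptions.

```python
import http.client
import json
import time
import urllib.request


def wait_for_devtools(port, timeout=30.0, interval=0.25, host="127.0.0.1"):
    """Poll Chrome's /json/version endpoint until it answers or `timeout` elapses.

    Returns the parsed JSON payload on success, or None if the deadline passes.
    Hypothetical helper for diagnosis; not part of Botasaurus.
    """
    url = f"http://{host}:{port}/json/version"
    # Bypass any system proxy settings: the target is always the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=2) as response:
                return json.loads(response.read().decode("utf-8"))
        except (OSError, http.client.HTTPException, ValueError):
            # Connection refused, timeout, bad HTTP, or non-JSON body: keep polling.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Calling wait_for_devtools(port, timeout=60) right after a worker launches Chrome and logging how long it takes to return would show whether a profile-backed launch is simply slower than a profile-less one, or whether the port is never opened at all.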
Observed Behavior
When launching a task with multiple URLs in parallel:
- For each worker, Botasaurus spawns multiple empty Chrome windows successively
- Only one window eventually loads the target URL
- This suggests Botasaurus fails to detect that the browser launched successfully (possibly a connection or port issue)
- A retry mechanism appears to kick in, spawning new browsers in a loop
Important note: Without a profile, parallelism works perfectly. The issue is specifically tied to using Chrome profiles.
Reproduction
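# Assumes Botasaurus 4.x; 3.x exposed AntiDetectDriver via `from botasaurus import *`
from botasaurus.browser import browser, Driver

@browser(
    parallel=True,
    profile="my-profile",  # Profile with CapSolver extension pre-installed
)
def scrape_task(driver: Driver, data):
    # `data` is one URL from the list passed to scrape_task below
    driver.get(data)
    # ... scraping logic

scrape_task([
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
    "https://example4.com",
    "https://example5.com",
    "https://example6.com",
])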
Investigation Paths
- Profile locking: The Chrome profile might be locked by Chrome itself or a previous process, creating conflicts in parallel mode
- Port management: Debugging ports may not be released quickly enough when reusing workers
- Connection timeout: The connection timeout might be too short when Chrome is loading with a profile
- Profile uniqueness: Perhaps each worker needs its own unique profile copy?
- Race condition with port allocation: Workers might attempt to bind to debugging ports still in the TCP TIME_WAIT state (30-120s cleanup). Such a port appears free but is not actually available yet.
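The profile-locking, profile-uniqueness, and port-allocation hypotheses above can each be checked outside Botasaurus with a few standard-library helpers. This is a rough sketch under stated assumptions: the function names are hypothetical, the lock file names are the ones Chrome normally leaves in a user data directory (Singleton* on macOS/Linux, lockfile on Windows), and how the resulting directory and port would be passed into Botasaurus is left open.

```python
import os
import shutil
import socket
import tempfile
from pathlib import Path

# Files Chrome keeps in a user data directory while a process owns it.
LOCK_NAMES = ("SingletonLock", "SingletonCookie", "SingletonSocket", "lockfile")


def profile_lock_files(user_data_dir):
    """Return the lock file names currently present in a user data directory."""
    root = Path(user_data_dir)
    # lexists() also reports dangling symlinks, which is how SingletonLock
    # is usually stored on macOS/Linux.
    return [name for name in LOCK_NAMES if os.path.lexists(root / name)]


def clone_profile(source_dir, worker_id, dest_root=None):
    """Copy a profile into a per-worker directory, leaving lock files behind."""
    dest_root = Path(dest_root or tempfile.mkdtemp(prefix="chrome-profiles-"))
    target = dest_root / f"worker-{worker_id}"
    shutil.copytree(
        source_dir,
        target,
        symlinks=True,  # do not follow (possibly dangling) symlinks
        ignore=shutil.ignore_patterns("Singleton*", "lockfile"),
    )
    return target


def reserve_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port.

    The socket is closed before returning, so another process could still
    grab the port before Chrome binds it; this narrows the race, it does
    not remove it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

If profile_lock_files() returns a non-empty list for the shared profile while workers are starting, that would point toward the locking hypothesis. If giving each worker the output of clone_profile() makes the failures disappear, that would support the idea that every worker needs its own copy.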
Why This Is Critical
This blocks the ability to use essential extensions like CapSolver for CAPTCHA automation in parallel scraping tasks with official Chrome builds. Without a fix, users must choose between:
- Single-threaded scraping (slow)
- Chrome for Testing (easily detected and blocked)
- No extension support (can't solve CAPTCHAs)
Environment
- OS: macOS (but likely affects all platforms)
- Botasaurus version: Latest
- Chrome version: 141 (official builds)
Question for maintainers: Why would Chrome profile usage specifically affect the debugging port connection? Is there a profile initialization delay we're not accounting for?