Extensions feature broken and Parallel mode fails with Chrome profiles: "Failed to connect to Chrome URL"

Open rubenamran-arch opened this issue 2 months ago • 0 comments

Issue Description

When Botasaurus runs with parallel=True and a Chrome profile, the driver consistently fails to connect to Chrome's remote debugging port. Each worker spawns several empty browser windows before one eventually (and only sometimes) connects.

Context: Why Chrome Profiles Are Needed

Chrome 136+ Issue: Google is removing the --load-extension flag from official Chrome builds, making it impossible to programmatically load extensions.

Chrome for Testing limitation: Chrome for Testing still supports --load-extension, but anti-bot systems detect it more easily. Google Search quickly blocks the IPs it runs from, so it is not viable for production scraping.

Our workaround: We use pre-configured Chrome profiles where extensions (notably CapSolver for automatic CAPTCHA solving) are already manually installed. This approach is crucial for reliable CAPTCHA automation with official Chrome builds.

The Bug

Using parallel=True with a Chrome profile causes systematic connection failures:

Exception: Failed to connect to Chrome URL: http://127.0.0.1:{port}/json/version.
Traceback: ../core/browser.py", line 59, in ensure_chrome_is_alive
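To help tell a slow profile startup apart from a port that never opens, here is a minimal standalone sketch that polls the same /json/version endpoint with a longer, configurable deadline. It does not use Botasaurus at all; wait_for_devtools and its parameters are hypothetical names for this illustration, and the port is assumed to be whatever value was passed to Chrome via --remote-debugging-port.

```python
import http.client
import json
import time
import urllib.request


def wait_for_devtools(port, timeout=30.0, interval=0.25, host="127.0.0.1"):
    """Poll Chrome's /json/version endpoint until it answers or `timeout` elapses.

    Returns the parsed JSON payload on success, or None if the deadline passes.
    (Hypothetical helper for diagnosis; not part of Botasaurus.)
    """
    url = f"http://{host}:{port}/json/version"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=max(interval * 4, 1.0)) as resp:
                return json.load(resp)
        except (OSError, http.client.HTTPException, ValueError):
            # Nothing listening yet, connection dropped, or the reply was not JSON:
            # wait briefly and try again until the deadline.
            time.sleep(interval)
    return None
```

If this returns a payload after, say, 10 seconds with a profile but almost immediately without one, that would point at the connection timeout rather than at port allocation.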

Observed Behavior

When launching a task with multiple URLs in parallel:

For each worker, Botasaurus spawns multiple empty Chrome windows successively
Only one window eventually loads the target URL
This suggests Botasaurus fails to detect that the browser launched successfully (likely a connection or port issue)
A retry mechanism appears to kick in, spawning new browsers in a loop

Important note: Without a profile, parallelism works perfectly. The issue is specifically tied to using Chrome profiles.

Reproduction

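from botasaurus.browser import browser, Driver  # Botasaurus 4.x import path

@browser(
    parallel=True,
    profile="my-profile",  # Profile with CapSolver extension pre-installed
)
def scrape_task(driver: Driver, data):  # Driver in 4.x (AntiDetectDriver in 3.x)
    driver.get(data)
    # ... scraping logic
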
scrape_task([
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
    "https://example4.com",
    "https://example5.com",
    "https://example6.com",
])

Investigation Paths

Profile locking: The Chrome profile might be locked by Chrome itself or a previous process, creating conflicts in parallel mode
Port management: Debugging ports may not be released quickly enough when reusing workers
Connection timeout: The connection timeout might be too short when Chrome is loading with a profile
Profile uniqueness: Perhaps each worker needs its own unique profile copy?
Race condition with port allocation: Workers might attempt to bind to debugging ports still in the TIME_WAIT state (30-120s TCP cleanup); such ports appear free but aren't actually available yet
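Two of these paths (profile uniqueness and port allocation) can be tested outside Botasaurus with a short sketch like the one below. It is only an illustration under assumptions: clone_profile and find_free_port are hypothetical helpers, the caller is assumed to know the on-disk path of the profile directory, and the lock-file names are the ones Chrome is known to create on Linux/macOS (Singleton*), with "lockfile" assumed for Windows.

```python
import shutil
import socket
import tempfile
from pathlib import Path

# Lock files Chrome leaves in a profile directory. "Singleton*" covers Linux/macOS;
# "lockfile" is assumed to be the Windows equivalent.
LOCK_PATTERNS = ("Singleton*", "lockfile")


def clone_profile(source_dir, worker_id):
    """Copy a profile directory into a fresh temp dir for one worker, skipping lock files."""
    target = Path(tempfile.mkdtemp(prefix=f"profile-worker{worker_id}-")) / "profile"
    shutil.copytree(
        source_dir,
        target,
        symlinks=True,  # copy symlinks as symlinks instead of following them
        ignore=shutil.ignore_patterns(*LOCK_PATTERNS),
    )
    return target


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

If launching each worker against its own clone makes the failures disappear, profile locking is the likely cause. Note that find_free_port is still racy: another process can take the port between the socket closing and Chrome binding to it, so it narrows the window rather than eliminating it.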

Why This Is Critical

This blocks the ability to use essential extensions like CapSolver for CAPTCHA automation in parallel scraping tasks with official Chrome builds. Without a fix, users must choose between:

Single-threaded scraping (slow)
Chrome for Testing (easily detected and blocked)
No extension support (can't solve CAPTCHAs)

Environment

OS: macOS (but likely affects all platforms)
Botasaurus version: Latest
Chrome version: 141 (official builds)

Question for maintainers: Why would Chrome profile usage specifically affect the debugging port connection? Is there a profile initialization delay we're not accounting for?

Oct 23 '25 12:10 rubenamran-arch