
[Bug]: Proxy not working

Open abhineettandon opened this issue 8 months ago • 4 comments

crawl4ai version

0.5.0.post8

Expected Behavior

Should work normally and return the output.

Current Behavior

I tried different proxy credentials, but it does not work with any of them. It works fine when the proxy configuration is removed.

× Unexpected error in _crawl_web at line 582 in _crawl_web (lib/python3.13/site-
│ packages/crawl4ai/async_crawler_strategy.py):
│ Error: Failed on navigating ACS-GOTO:
│ Page.goto: net::ERR_INVALID_AUTH_CREDENTIALS at https://example.com/
│ Call log:
│ - navigating to "https://example.com/", waiting until "domcontentloaded"

Is this reproducible?

Yes

Inputs Causing the Bug

set proxy string ("http://username:password@server:port") in BrowserConfig

Steps to Reproduce


Code snippets

import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

async def main():
    # Proxy credentials embedded in the URL, passed via the proxy parameter
    browser_config = BrowserConfig(
        proxy="http://USERNAME:PASSWORD@SERVER:PORT",
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com")
        return result

print(asyncio.run(main()))

OS

macOS 15.4

Python version

3.13

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)


abhineettandon avatar Apr 16 '25 15:04 abhineettandon

I've faced this issue after trying to use SOCKS5-based proxies, and I feel it's an issue with Playwright.

Aayushhumai avatar Apr 24 '25 09:04 Aayushhumai

facing the same issue!

moarshy avatar Apr 30 '25 02:04 moarshy

Same issue here! And when I wrote a custom implementation with Playwright, the proxy works.
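
For reference, a minimal raw-Playwright check of the same proxy can be sketched like this (the proxy values are placeholders; note that Playwright takes the credentials as separate fields rather than embedded in the server URL):

    import asyncio
    from playwright.async_api import async_playwright

    async def check_proxy():
        async with async_playwright() as p:
            # Credentials go in separate fields, not in the server URL
            browser = await p.chromium.launch(
                proxy={
                    "server": "http://SERVER:PORT",
                    "username": "USERNAME",
                    "password": "PASSWORD",
                }
            )
            page = await browser.new_page()
            await page.goto("https://example.com")
            print(await page.title())
            await browser.close()

    asyncio.run(check_proxy())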

rubinsh avatar May 15 '25 09:05 rubinsh

After reading the source code and some fiddling, I was able to work around it as follows:

    import os

    from crawl4ai import ProxyConfig
    from crawl4ai.async_configs import BrowserConfig

    username = os.getenv("PROXY_USERNAME")
    password = os.getenv("PROXY_PASSWORD")
    proxy_address = os.getenv("PROXY_PROXY_ADDRESS")
    proxy_port = os.getenv("PROXY_PROXY_PORT")
    # Credentials go in separate ProxyConfig fields, not in the server URL
    proxy_config = ProxyConfig(
        server=f"https://{proxy_address}:{proxy_port}",
        username=username,
        password=password,
    )
    browser_config = BrowserConfig(
        headless=True,
        proxy_config=proxy_config,
    )

If I try to pass the username/password in the URL itself, it still fails, but at least the above method works for me in version 0.6.3.

rubinsh avatar May 15 '25 11:05 rubinsh

Any update on this issue? I'm having similar issues with 0.6.3. I'm using an Oxylabs proxy but haven't managed to get it working with the above workaround.

When using the residential proxy settings, the configuration below works as expected when sending a normal HTTP request in Python or in curl:

    proxy_config = ProxyConfig(
        server=f"https://{host}:{sticky_port}",
        username=f"customer-{proxy_username}",
        password=proxy_password
    )
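
For comparison, the plain-HTTP check that works can be sketched with the requests library (httpbin.org/ip is just an example echo endpoint):

    import requests

    # Same Oxylabs-style credentials, embedded in the proxy URL for requests
    proxy_url = f"http://customer-{proxy_username}:{proxy_password}@{host}:{sticky_port}"
    resp = requests.get(
        "https://httpbin.org/ip",
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=30,
    )
    print(resp.status_code, resp.text)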


But when using it in Crawl4AI 0.6.3, I get the following exception when just trying to reach google.com, for example: "net::ERR_NO_SUPPORTED_PROXIES"

[ERROR]... × https://www.google.com | Error: Unexpected error in _crawl_web at line 744 in _crawl_web (.venv\lib\site-packages\crawl4ai\async_crawler_strategy.py): Error: Failed on navigating ACS-GOTO: Page.goto: net::ERR_NO_SUPPORTED_PROXIES at https://www.google.com/ Call log:

  • navigating to "https://www.google.com/", waiting until "domcontentloaded"

Code context:

    739         response = await page.goto(
    740             url, wait_until=config.wait_until, timeout=config.page_timeout
    741         )
    742         redirected_url = page.url
    743     except Error as e:
    744 →       raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
    745
    746     await self.execute_hook(
    747         "after_goto", page, context=context, url=url, response=response, config=config
    748     )
    749

Has anyone managed to get the proxy working with Oxylabs, or any other proxy provider?

avrum avatar Jun 01 '25 23:06 avrum

I've been trying to configure the proxy for days, unfortunately without success. I've tried countless services and none of them worked. What I'm trying now is a rotating proxy, which seems to partially work.

ThyagoSCoelho avatar Jul 07 '25 14:07 ThyagoSCoelho

I got it working by upgrading crawl4ai to the latest version (0.7.0) with pip install --upgrade crawl4ai.

Then, instead of hardcoding the credentials, I passed them in as environment variables:

    username = os.getenv("PROXY_USERNAME")
    password = os.getenv("PROXY_PASSWORD")
    proxy_address = os.getenv("PROXY_PROXY_ADDRESS")
    proxy_port = os.getenv("PROXY_PROXY_PORT")
    
    print(f" Loaded proxy config:")
    print(f"  Username: {username}")
    print(f"  Address: {proxy_address}")
    print(f"  Port: {proxy_port}")

I then passed them as a ProxyConfig instance:

  proxy_config = ProxyConfig(
        server=f"http://{proxy_address}:{proxy_port}",
        username=username,
        password=password
    )

My full code is as follows:

########################################################################

  import asyncio
  from crawl4ai import AsyncWebCrawler
  from crawl4ai import ProxyConfig
  from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
  from dotenv import load_dotenv
  import os
  
  load_dotenv('proxy_config.env')
  
  async def main():
      username = os.getenv("PROXY_USERNAME")
      password = os.getenv("PROXY_PASSWORD")
      proxy_address = os.getenv("PROXY_PROXY_ADDRESS")
      proxy_port = os.getenv("PROXY_PROXY_PORT")
      
      print(f" Loaded proxy config:")
      print(f"  Username: {username}")
      print(f"  Address: {proxy_address}")
      print(f"  Port: {proxy_port}")
      
  
      proxy_config = ProxyConfig(
          server=f"http://{proxy_address}:{proxy_port}",
          username=username,
          password=password
      )
      
      browser_config = BrowserConfig(
          headless=True,
          proxy_config=proxy_config,
          verbose=True
      )
      
      run_config = CrawlerRunConfig(
          page_timeout=60000
      )
  
      target_url = "https://webscraper.io/test-sites/e-commerce/allinone/product/1301"
      print(f"\n Testing target URL: {target_url}")
  
      async with AsyncWebCrawler(config=browser_config) as crawler:
          result = await crawler.arun(
              url=target_url,
              config=run_config
          )
          
          if result.success:
              print(f" Content length: {len(result.markdown)} characters")
              
              with open('thermofisher_result.txt', 'w', encoding='utf-8') as f:
                  f.write(result.markdown)
              print("Content saved to: thermofisher_result.txt")
              
              print(f"\n Content preview:")
              print("-" * 50)
              print(result.markdown[:500] + "..." if len(result.markdown) > 500 else result.markdown)
              
          else:
              print(f" FAILED: {result.error_message}")
  
  if __name__ == "__main__":
      asyncio.run(main())
  

########################################################################

Designed-by-Akshay avatar Jul 14 '25 07:07 Designed-by-Akshay

Hi! Thanks for raising this.

Just a heads-up: the proxy parameter has been deprecated in the latest release. The recommended approach now is to use the ProxyConfig object instead. Here’s an example of how you can set it up:

proxy_config = ProxyConfig(
    server=f"http://{proxy_address}:{proxy_port}",
    username=username,
    password=password,
)

and then pass it to BrowserConfig like this:

browser_config = BrowserConfig(
    headless=True,
    proxy_config=proxy_config,
    verbose=True,
)
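
Putting the pieces together, a complete minimal script would look something like this (the proxy server address and target URL are placeholders):

import asyncio
import os

from crawl4ai import AsyncWebCrawler, ProxyConfig
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

async def main():
    # Placeholder proxy details; substitute your provider's values
    proxy_config = ProxyConfig(
        server="http://proxy.example.com:8080",
        username=os.getenv("PROXY_USERNAME"),
        password=os.getenv("PROXY_PASSWORD"),
    )
    browser_config = BrowserConfig(headless=True, proxy_config=proxy_config)
    run_config = CrawlerRunConfig(page_timeout=60000)

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com", config=run_config)
        print(result.markdown[:500] if result.success else result.error_message)

if __name__ == "__main__":
    asyncio.run(main())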

I’ll go ahead and close this issue for now since this should resolve it, but please feel free to reopen or open a new issue if you run into any problems with the new configuration.

SohamKukreti avatar Sep 05 '25 15:09 SohamKukreti