
[Bug]: maximum recursion depth exceeded while calling a Python object

yumingmin88 opened this issue 9 months ago

crawl4ai version

0.5.0.post4

Expected Behavior

I want to keep the browser instance in a global variable so that each call to crawl_url_content only opens a new tab and fetches the page text, instead of launching a new browser every time. Could you give me some guidance on how to modify the code?
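
A minimal sketch of that pattern, reusing only calls that already appear in the code below (AsyncWebCrawler.start, arun, close); the module-level _crawler variable and the get_crawler/shutdown helpers are illustrative names, not crawl4ai API:

# Hedged sketch: one shared AsyncWebCrawler, started lazily and reused by every call.
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

_crawler = None  # module-level ("global") browser instance

async def get_crawler():
    global _crawler
    if _crawler is None:
        _crawler = AsyncWebCrawler(config=BrowserConfig(headless=True, verbose=False))
        await _crawler.start()  # launch the browser once for the whole process
    return _crawler

async def fetch_page_text(url):
    crawler = await get_crawler()
    run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, page_timeout=30000)
    # Each call reuses the already running browser instead of launching a new one.
    return await crawler.arun(url=url, config=run_cfg)

async def shutdown():
    global _crawler
    if _crawler is not None:
        await _crawler.close()  # close the shared browser when all URLs are done
        _crawler = None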

Current Behavior

Hello, the following error occurred when crawling 3,000 URLs with the code below: maximum recursion depth exceeded while calling a Python object. The code is adapted from issue #399: https://github.com/unclecode/crawl4ai/issues/399

# Imports assumed by this snippet (not shown in the original report); EXCLUDE_TAGS
# is defined elsewhere in the reporter's code.
import os
import logging

import psutil
from fake_useragent import UserAgent
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

logger = logging.getLogger(__name__)
EXCLUDE_TAGS = []  # placeholder; the reporter's actual excluded-tag list is not shown


async def crawl_url_content(url):
    res = ''

    logger.info("\n=== Single URL Crawling with Browser Reuse + Memory Check ===")

    # We'll keep track of peak memory usage
    peak_memory = 0
    process = psutil.Process(os.getpid())

    def log_memory(prefix: str = ""):
        nonlocal peak_memory
        current_mem = process.memory_info().rss  # in bytes
        if current_mem > peak_memory:
            peak_memory = current_mem
        logger.info(f"{prefix} Current Memory: {current_mem // (1024 * 1024)} MB, Peak: {peak_memory // (1024 * 1024)} MB")

    ua = UserAgent()
    # Minimal browser config
    browser_config = BrowserConfig(
        headless=True,
        verbose=False,   
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
        user_agent_generator_config={"mode": "random"},
        java_script_enabled=True,
        user_agent=ua.random
    )
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS, 
        page_timeout=30000,
        excluded_tags=EXCLUDE_TAGS,
        check_robots_txt=True,
    )

    # Create the crawler instance
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()

    try:
        success_count = 0
        fail_count = 0
        
        # Check memory usage prior to processing
        log_memory(prefix="Before processing: ")

        # Process single URL
        session_id = "single_url_session"
        result = await crawler.arun(url=url, config=crawl_config, session_id=session_id)

        # Check memory usage after processing
        log_memory(prefix="After processing: ")

        # Evaluate result
        if isinstance(result, Exception):
            logger.info(f"Error crawling {url}: {result}")
            fail_count += 1
        elif result.success:
            success_count += 1
        else:
            fail_count += 1
        res = result

        logger.info(f"\nSummary:")
        logger.info(f"  - Successfully crawled: {success_count}")
        logger.info(f"  - Failed: {fail_count}")

    finally:
        logger.info("\nClosing crawler...")
        await crawler.close()
        # Final memory log
        log_memory(prefix="Final: ")
        logger.info(f"\nPeak memory usage (MB): {peak_memory // (1024 * 1024)}")
    return res

I created a list of 3,000 URLs and looped over this function. After 783 iterations, every subsequent URL failed with maximum recursion depth exceeded while calling a Python object. Could you help me figure out what the problem is, please?
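
For reference, a sketch of the kind of driver loop described above (the urls list, logging, and error handling here are illustrative, not taken from the report):

import asyncio

async def main(urls):
    for i, url in enumerate(urls, start=1):
        try:
            result = await crawl_url_content(url)
        except RecursionError as exc:
            # In the report, every URL after roughly iteration 783 failed this way.
            print(f"[{i}] {url}: {exc}")
        else:
            print(f"[{i}] {url}: success={getattr(result, 'success', None)}")

# asyncio.run(main(urls))  # urls: the list of ~3,000 URLs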

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets


OS

Linux (Ubuntu)

Python version

3.11.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

yumingmin88 · Mar 25 '25

Same here, on Google Colab

seifeur · Mar 25 '25

My environment's maximum recursion depth is set to 3,000. After adjusting it to 10,000, the problem reappeared after about 2,100 runs. It seems to be caused by the colorama library: RecursionError: maximum recursion depth exceeded.
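
For context, a minimal sketch of the stopgap described above: raising Python's recursion limit with the standard library only delays the error if something keeps deepening the call chain on every run.

import sys

print(sys.getrecursionlimit())  # inspect the current limit (1000 by default in CPython)
sys.setrecursionlimit(10_000)   # the adjustment tried above; the error still came back later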

yumingmin88 · Mar 26 '25

@yumingmin88 Thanks for reporting this. I'll check this out.

aravindkarnam · Mar 26 '25

@aravindkarnam Just to add on to the issue, I have rolled back to v0.4.248 and I am facing the same error. (I am triggering it from an AI agent.)

kivous · Mar 27 '25

@aravindkarnam which version should I downgrade to in order to make it work?

Priyadutt178 · Mar 28 '25

My issue got closed. Copy-pasting its content over here:

Expected Behavior

Colorama init should only be called once, and ideally it should be something I can opt out of when using this library. If I am already a colorama user, I wouldn't want a double init. And if I don't want a library to touch my standard streams (I don't want mine touched), I should be able to opt out.

Current Behavior

https://github.com/tartley/colorama/blob/136808718af8b9583cb2eed1756ed6972eda4975/colorama/initialise.py#L37

Colorama's init hijacks and wraps your entire Python process's standard streams. Doing this multiple times adds multiple layers to all log processing, and things get slow.

Every AsyncLogger currently calls this init method, as do a few other spots: https://github.com/search?q=repo%3Aunclecode%2Fcrawl4ai+init%28%29&type=code

If you spin up enough async crawlers over the lifetime of your project, you'll see it slow down significantly.
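
A toy sketch (not colorama's actual code) of the failure mode described above: each init wraps whatever sys.stdout currently is, so repeated calls stack proxies, every write then traverses all of the accumulated layers, and the stack eventually overflows with the same RecursionError.

import sys

class StreamProxy:
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def write(self, text):
        # One extra stack frame per accumulated wrapper on every write.
        return self.wrapped.write(text)

    def flush(self):
        return self.wrapped.flush()

def fake_init():
    # Wraps the *current* sys.stdout, which may already be a proxy.
    sys.stdout = StreamProxy(sys.stdout)

original_stdout = sys.stdout
try:
    for _ in range(5000):  # e.g. one "init" per crawler created over the process lifetime
        fake_init()
    print("hello")         # recurses through every accumulated proxy layer
except RecursionError:
    print("maximum recursion depth exceeded after stacking wrappers", file=sys.stderr)
finally:
    sys.stdout = original_stdout  # restore the real stream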

fmmoret · Mar 28 '25

We have now moved to Rich (from colorama) in 0.6.3 and verified that this issue no longer occurs. Therefore, closing the issue.

aravindkarnam · May 13 '25

Thank you for your contributions.

yumingmin88 · May 13 '25

Thank you for the update.

zhoufei0622 · May 27 '25

Why is it that when I use the provided RESTful API's /pdf endpoint, once the number of crawled web pages reaches a certain point, it always shows: maximum recursion depth exceeded?

zhoufei0622 · May 28 '25

The crawl4ai version I am using is 0.6.3.

zhoufei0622 · May 28 '25