[Bug]: maximum recursion depth exceeded while calling a Python object
crawl4ai version
0.5.0.post4
Expected Behavior
I want to create a global variable for the browser instance, so that each call to crawl_url_content only opens a new tab in that browser and returns the text of the web page. Could you give me some guidance on how to modify the code to do this?
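For reference, the pattern I am asking about might look roughly like the sketch below, assuming a single long-lived AsyncWebCrawler shared across calls; the helper names get_crawler and shutdown_crawler are illustrative and not part of the crawl4ai API:

from typing import Optional

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

_crawler: Optional[AsyncWebCrawler] = None

async def get_crawler() -> AsyncWebCrawler:
    # Lazily start one shared browser instance and reuse it for every call.
    global _crawler
    if _crawler is None:
        _crawler = AsyncWebCrawler(config=BrowserConfig(headless=True, verbose=False))
        await _crawler.start()
    return _crawler

async def crawl_url_content(url: str) -> str:
    # Each arun() call handles one URL inside the already running browser.
    crawler = await get_crawler()
    result = await crawler.arun(
        url=url,
        config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS, page_timeout=30000),
    )
    return str(result.markdown) if result.success else ""

async def shutdown_crawler() -> None:
    # Close the shared browser once, after all URLs have been processed.
    global _crawler
    if _crawler is not None:
        await _crawler.close()
        _crawler = None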
Current Behavior
Hello, the following error occurred when crawling 3,000 URLs with the code below: maximum recursion depth exceeded while calling a Python object. The code is adapted from issue #399: https://github.com/unclecode/crawl4ai/issues/399
import logging
import os

import psutil
from fake_useragent import UserAgent

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

logger = logging.getLogger(__name__)
# EXCLUDE_TAGS is defined elsewhere in my project

async def crawl_url_content(url):
    res = ''
    logger.info("\n=== Single URL Crawling with Browser Reuse + Memory Check ===")

    # We'll keep track of peak memory usage
    peak_memory = 0
    process = psutil.Process(os.getpid())

    def log_memory(prefix: str = ""):
        nonlocal peak_memory
        current_mem = process.memory_info().rss  # in bytes
        if current_mem > peak_memory:
            peak_memory = current_mem
        logger.info(f"{prefix} Current Memory: {current_mem // (1024 * 1024)} MB, Peak: {peak_memory // (1024 * 1024)} MB")

    ua = UserAgent()

    # Minimal browser config
    browser_config = BrowserConfig(
        headless=True,
        verbose=False,
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
        user_agent_generator_config={"mode": "random"},
        java_script_enabled=True,
        user_agent=ua.random
    )
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=30000,
        excluded_tags=EXCLUDE_TAGS,
        check_robots_txt=True,
    )

    # Create the crawler instance
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()

    try:
        success_count = 0
        fail_count = 0

        # Check memory usage prior to processing
        log_memory(prefix="Before processing: ")

        # Process single URL
        session_id = "single_url_session"
        result = await crawler.arun(url=url, config=crawl_config, session_id=session_id)

        # Check memory usage after processing
        log_memory(prefix="After processing: ")

        # Evaluate result
        if isinstance(result, Exception):
            logger.info(f"Error crawling {url}: {result}")
            fail_count += 1
        elif result.success:
            success_count += 1
        else:
            fail_count += 1
        res = result

        logger.info(f"\nSummary:")
        logger.info(f" - Successfully crawled: {success_count}")
        logger.info(f" - Failed: {fail_count}")
    finally:
        logger.info("\nClosing crawler...")
        await crawler.close()

        # Final memory log
        log_memory(prefix="Final: ")
        logger.info(f"\nPeak memory usage (MB): {peak_memory // (1024 * 1024)}")

    return res
I created a list of 3,000 URLs and looped over this function. After 783 iterations, every subsequent URL failed with maximum recursion depth exceeded while calling a Python object. Could you help me figure out what the problem is?
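Simplified, the loop is equivalent to something like the sketch below (the RecursionError handling is only for illustration, not my exact code):

import asyncio

async def main(urls: list[str]) -> None:
    for i, url in enumerate(urls, start=1):
        try:
            await crawl_url_content(url)   # creates and closes a new crawler each time
        except RecursionError as exc:
            # After about 780 iterations, every remaining URL fails like this.
            print(f"RecursionError at URL #{i}: {exc}")

# asyncio.run(main(urls))  # urls is my list of ~3,000 URLs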
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
linux ubuntu
Python version
3.11.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response
Same here, on Google Colab
In my environment the recursion limit was set to 3,000. After raising it to 10,000, the problem reappeared after about 2,100 runs. It seems to be caused by the colorama library: RecursionError: maximum recursion depth exceeded
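For reference, the limit I adjusted is CPython's interpreter recursion limit; raising it only postpones the failure rather than fixing it:

import sys

print(sys.getrecursionlimit())   # CPython's default limit is 1000
sys.setrecursionlimit(10_000)    # delays the RecursionError but does not fix it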
@yumingmin88 Thanks for reporting this. I'll check this out.
@aravindkarnam Just to add to the issue: I rolled back to v0.4.248 and I am facing the same error. (I am triggering it from an AI agent.)
@aravindkarnam which version should I downgrade to in order to make it work?
Mine got closed. Copy pasting issue content over here:
Expected Behavior
Colorama init should only be called once, and ideally it should be something I can opt out of when using this library. If I am already a colorama user, I don't want a double init; and if I don't want a library touching my standard streams (I don't want mine touched), I should be able to opt out.
Current Behavior
https://github.com/tartley/colorama/blob/136808718af8b9583cb2eed1756ed6972eda4975/colorama/initialise.py#L37
Colorama's init hijacks and wraps your entire Python process's standard streams. Doing this multiple times adds multiple wrapper layers to all log processing, and things get slow.
Every AsyncLogger currently calls this init method, as do a few other spots: https://github.com/search?q=repo%3Aunclecode%2Fcrawl4ai+init%28%29&type=code
If you spin up enough async crawlers over the lifetime of your project, you'll see it slow down significantly.
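Here is a small standalone sketch of that behaviour (assumptions on my part: colorama is installed and stdout is not a TTY, e.g. redirected output, so colorama chooses to wrap it). Each init() call wraps whatever sys.stdout currently is, so the wrappers nest; once enough layers accumulate, every write has to recurse through all of them and eventually trips the interpreter's recursion limit:

import io
import sys

import colorama

# Simulate a non-TTY stdout (e.g. logs redirected to a file), which colorama wraps.
real_stdout = sys.stdout
sys.stdout = io.StringIO()

for _ in range(5):
    colorama.init()
    # Each init() wraps the *current* sys.stdout again: the type stays the same,
    # but the object keeps changing, i.e. one more nested layer per call.
    real_stdout.write(f"{type(sys.stdout).__name__} at {id(sys.stdout):#x}\n")

# With thousands of layers (one or more per crawler started), every write walks
# the whole chain and can exceed the interpreter's recursion limit.
# A possible stopgap (not an official crawl4ai fix): restore the original streams.
sys.stdout = real_stdout
sys.stderr = sys.__stderr__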
We have now moved to Rich (from colorama) in 0.6.3 and verified that this issue no longer occurs. Therefore closing the issue.
Thank you for your contributions.
Thank you for the update.
Why is it that when I use the provided RESTful API /pdf endpoint, it always raises maximum recursion depth exceeded once the number of crawled pages reaches a certain count?
The crawl4ai version I am using is 0.6.3.