crawl4ai
[Bug]: 'async for' requires an object with __aiter__ method, got CrawlResultContainer
crawl4ai version
0.6.3
Expected Behavior
With stream=True, crawler.arun() consistently returns an asynchronous generator, so iterating it with async for works on every request.
Current Behavior
Hello, I started a web service that runs the deep crawl function repeatedly. After the service had been running for a while, a request for a URL that had previously crawled its subpages without problems failed with this error: 'async for' requires an object with __aiter__ method, got CrawlResultContainer. I checked: CrawlResultContainer wraps a regular list, not an asynchronous generator. Why does arun() return an asynchronous generator (and crawl subpages correctly) for the first several executions, and only later start returning the container? After restarting the service the problem goes away until the service has again been running for some time. Could the browser be failing to close properly and causing a resource leak? This still happens in 0.7.4.
from fastapi import FastAPI
from pydantic import BaseModel

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode
from crawl4ai import CrawlerRunConfig
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

app = FastAPI()


async def get_crawler():
    # A fresh crawler (and browser) is started for every request.
    browser_conf = BrowserConfig(
        browser_type='chromium',
        headless=True,
        verbose=True,
        user_agent_generator_config={"mode": "random"},
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
    )
    crawler = AsyncWebCrawler(config=browser_conf)
    await crawler.start()
    return crawler


async def close_crawler(crawler):
    await crawler.close()


async def bfs_crawl(url):
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=5,
            include_external=False,
            max_pages=3000,
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        stream=True,
        verbose=True,
        cache_mode=CacheMode.BYPASS,
        page_timeout=50000,
        # page_timeout=5000,
        # excluded_tags=EXCLUDE_TAGS,
        check_robots_txt=True,
    )
    crawler = await get_crawler()
    results = []
    try:
        # With stream=True this is expected to be an async generator,
        # but after the service has run for a while it is a CrawlResultContainer.
        async for result in await crawler.arun(url, config=config):
            results.append(result)
        print("bfs crawl finished")
    except Exception as e:
        print(f"bfs crawl error: {e}")
    finally:
        await close_crawler(crawler)
    return results


class CrawlRequest(BaseModel):
    url: str


@app.post("/bfs_crawl")
async def crawl(req: CrawlRequest):
    await bfs_crawl(req.url)


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app="tt:app", host='0.0.0.0', port=8813, workers=3)
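
For now the service keeps working with a defensive wrapper like the sketch below. This is only a workaround sketch, not the library's documented API: it assumes (based on the observation above) that CrawlResultContainer can be iterated like a plain list, and it does not address the underlying cause.

async def collect_results(maybe_stream):
    # Normalize whatever crawler.arun() returns into a list of results.
    results = []
    if hasattr(maybe_stream, "__aiter__"):
        # Expected streaming case: iterate the async generator.
        async for result in maybe_stream:
            results.append(result)
    else:
        # Observed buggy case: treat CrawlResultContainer as a regular iterable
        # of already-collected results.
        for result in maybe_stream:
            results.append(result)
    return results

# usage inside bfs_crawl:
# results = await collect_results(await crawler.arun(url, config=config))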
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Linux
Python version
3.11.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
'async for' requires an object with __aiter__ method, got CrawlResultContainer