[Bug]: Not able to use deep_crawl_strategies while using arun_many
crawl4ai version
0.7.4
Expected Behavior
arun_many was expected to deep-crawl every URL in the list, applying the deep_crawl_strategy from the shared CrawlerRunConfig to each seed, just as arun does for a single URL.
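For comparison, the documented streaming deep-crawl pattern works through arun for a single URL; a minimal sketch (depth and page limits scaled down here for a quick check, URL taken from the repro below):

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

async def single_url_check():
    # Scaled-down version of the strategy used in the repro script.
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            max_pages=10,
            include_external=False,
        ),
        stream=True,
    )
    async with AsyncWebCrawler() as crawler:
        # With stream=True and a deep_crawl_strategy, arun returns an
        # async generator that yields each crawled page as it finishes.
        async for result in await crawler.arun(url="https://jottful.com", config=config):
            print(result.url, result.success)

asyncio.run(single_url_check())

The expectation is that arun_many extends this same behavior to a list of seed URLs.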
Current Behavior
Instead of deep crawling, every URL in the list immediately comes back as a failed result.
Is this reproducible?
Yes
Inputs Causing the Bug
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

async def main():
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=20,
            max_pages=5000,
            include_external=False,  # stay inside site
        ),
        cache_mode=CacheMode.ENABLED,
        exclude_external_links=True,
        exclude_social_media_links=True,
        stream=True,  # important for streaming results
        page_timeout=240000,
        scraping_strategy=LXMLWebScrapingStrategy(),
        wait_until="domcontentloaded",
        semaphore_count=3,
        markdown_generator=DefaultMarkdownGenerator(content_source="raw_html"),
    )
    async with AsyncWebCrawler() as crawler:
        # Step 1: await arun_many to get an async iterator (stream=True)
        result_iterator = await crawler.arun_many(
            urls=["https://leclairfoundation.org", "https://jottful.com"],
            config=config,
        )
        # Step 2: iterate over results as they complete
        async for result in result_iterator:
            if result.success:
                print(f"Just completed: {result.url}")
                # process_result(result)
            else:
                print(f"Failed: {result.url} – {result.error_message}")

asyncio.run(main())
Steps to Reproduce
Run the script above from "Inputs Causing the Bug" against crawl4ai 0.7.4; both URLs are reported as failed immediately instead of being deep-crawled.
Code snippets
See the script under "Inputs Causing the Bug".
OS
macOS
Python version
3.12
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
python3 scrapping_crawl4ai_current_solution.py
[INIT].... → Crawl4AI 0.7.4
Failed: https://jottful.com
Failed: https://leclairfoundation.org
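Possible workaround
Until this is fixed, a sketch that assumes the regression is limited to arun_many (arun with a deep_crawl_strategy appears unaffected, per the title): fan out one arun deep crawl per seed URL with asyncio.gather.

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

async def crawl_site(crawler, url):
    # One streaming deep crawl per seed URL, via arun instead of arun_many.
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=20,
            max_pages=5000,
            include_external=False,
        ),
        stream=True,
    )
    async for result in await crawler.arun(url=url, config=config):
        if result.success:
            print(f"Just completed: {result.url}")
        else:
            print(f"Failed: {result.url} – {result.error_message}")

async def workaround():
    async with AsyncWebCrawler() as crawler:
        # Each site gets its own BFS frontier; gather runs the crawls concurrently.
        await asyncio.gather(
            crawl_site(crawler, "https://leclairfoundation.org"),
            crawl_site(crawler, "https://jottful.com"),
        )

asyncio.run(workaround())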