crawl4ai
[Bug]: DFS Deep crawling only crawling the 1st link
crawl4ai version
0.6.2
Expected Behavior
Crawl internal links up to the specified depth, limited by the specified maximum number of pages, in DFS mode.
Current Behavior
Only one page is crawled: the first link.
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Run any deep crawling code with the DFS strategy, expecting more than one page to be scraped (max_pages > 1). The issue only occurs with DFS, not with BFS (see the comparison sketch after the code snippet). Deep crawling worked fine on v0.5.0 (no issue).
Code snippets
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import DFSDeepCrawlStrategy


async def crawler(
    url: str = "https://www.coca-colacompany.com/"
):
    # DFS deep crawl: follow internal links up to depth 2, capped at 20 pages.
    dfs_config = CrawlerRunConfig(
        deep_crawl_strategy=DFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False,
            max_pages=20
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        verbose=True,
        cache_mode=CacheMode.BYPASS,
    )

    print("\n===== CRAWLING ...=====")
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun(url=url, config=dfs_config)
        print(f" Crawled {len(results)} pages")
        for result in results:
            depth = result.metadata.get("depth", 0)
            print(f" → Depth: {depth} | {result.url}")


if __name__ == "__main__":
    asyncio.run(crawler())
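For comparison, the BFS run that still behaves as expected differs from the snippet above only in the strategy class. The sketch below is a minimal variant, assuming BFSDeepCrawlStrategy (from crawl4ai.deep_crawling) accepts the same max_depth / include_external / max_pages parameters as its DFS counterpart; the function name bfs_crawler is only illustrative.

import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy


async def bfs_crawler(url: str = "https://www.coca-colacompany.com/"):
    # Same configuration as the DFS reproduction above; only the deep-crawl
    # strategy is swapped (assumed to take the same parameters).
    bfs_config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False,
            max_pages=20
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        verbose=True,
        cache_mode=CacheMode.BYPASS,
    )

    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun(url=url, config=bfs_config)
        # With BFS, more than one page is reported as crawled.
        print(f" Crawled {len(results)} pages")
        for result in results:
            print(f" → Depth: {result.metadata.get('depth', 0)} | {result.url}")


if __name__ == "__main__":
    asyncio.run(bfs_crawler())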
OS
macOS
Python version
3.13.2
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
===== CRAWLING ...=====
[INIT].... → Crawl4AI 0.6.2
[FETCH]... ↓ https://www.coca-colacompany.com/ | ✓ | ⏱: 2.42s
[SCRAPE].. ◆ https://www.coca-colacompany.com/ | ✓ | ⏱: 0.02s
[COMPLETE] ● https://www.coca-colacompany.com/ | ✓ | ⏱: 2.44s
 Crawled 1 pages
 → Depth: 0 | https://www.coca-colacompany.com/