[Bug]: DFS Deep crawling only crawling the 1st link

Open margauxallee opened this issue 7 months ago • 0 comments

crawl4ai version

0.6.2

Expected Behavior

In DFS mode, the crawler should follow internal links down to the specified depth (max_depth) and stop once the specified maximum number of pages (max_pages) has been crawled.
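To make the expectation concrete, here is a minimal sketch of a depth-first crawl bounded by max_depth and max_pages. It runs over a hypothetical in-memory link graph (`LINKS`) and a hypothetical helper (`dfs_crawl`); it is not crawl4ai's implementation, only an illustration of the traversal this issue expects.

```python
from typing import Dict, List, Tuple

# Hypothetical link graph (an assumption for illustration):
# each page maps to the internal links found on it.
LINKS: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": ["/deep"],
    "/a2": [],
    "/b1": [],
    "/deep": [],
}


def dfs_crawl(start: str, max_depth: int, max_pages: int) -> List[Tuple[str, int]]:
    """Visit pages depth-first, returning (url, depth) in visit order."""
    visited = set()
    order: List[Tuple[str, int]] = []
    stack = [(start, 0)]
    while stack and len(order) < max_pages:
        url, depth = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        order.append((url, depth))
        if depth < max_depth:
            # Push in reverse so the first link on the page is explored first.
            for link in reversed(LINKS.get(url, [])):
                if link not in visited:
                    stack.append((link, depth + 1))
    return order


pages = dfs_crawl("/", max_depth=2, max_pages=20)
print(f" Crawled {len(pages)} pages")
for url, depth in pages:
    print(f"  → Depth: {depth} | {url}")
```

With max_depth=2 the sketch visits the start page and everything up to two links away, following each branch to the bottom before moving to the next sibling. A result containing only the depth-0 page, as reported below, is what this traversal would produce only if no links were followed at all.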

Current Behavior

Only one page is crawled: the first link (the start URL, at depth 0).

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce

Run any deep-crawl code that uses the DFS strategy and expects more than one page to be scraped (max_pages > 1). The problem occurs only with DFS, not with BFS. Deep crawling worked correctly on v0.5.0.

Code snippets

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import DFSDeepCrawlStrategy


async def crawler(
    url: str = "https://www.coca-colacompany.com/"
):

    dfs_config = CrawlerRunConfig(
        deep_crawl_strategy=DFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False, 
            max_pages=20
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        verbose=True,
        cache_mode=CacheMode.BYPASS,
    )

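    print("\n===== CRAWLING ...=====")

    # Use a distinct name so the context variable does not shadow the
    # enclosing crawler() function.
    async with AsyncWebCrawler() as web_crawler:
        # With a deep crawl strategy configured, arun() returns a list of results.
        results = await web_crawler.arun(url=url, config=dfs_config)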

        print(f" Crawled {len(results)} pages")
        for result in results:
            depth = result.metadata.get("depth", 0)
            print(f"  → Depth: {depth} | {result.url}")



if __name__ == "__main__":
    asyncio.run(crawler())

OS

macOS

Python version

3.13.2

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

===== CRAWLING ...=====
[INIT].... → Crawl4AI 0.6.2
[FETCH]... ↓ https://www.coca-colacompany.com/ | ✓ | ⏱: 2.42s
[SCRAPE].. ◆ https://www.coca-colacompany.com/ | ✓ | ⏱: 0.02s
[COMPLETE] ● https://www.coca-colacompany.com/ | ✓ | ⏱: 2.44s
 Crawled 1 pages
  → Depth: 0 | https://www.coca-colacompany.com/

margauxallee • May 04 '25 12:05