No way to set crawl depth for crawler

Open matijaparavac opened this issue 1 year ago • 1 comments

Currently the crawler only follows links at depth level 1. If you give it a homepage link (homepage.com), it crawls only the links found directly on that homepage. It does not follow links located on deeper pages such as homepage.com/news/sports-data, so if there is, for example, a "more info" link there, it won't be crawled.

matijaparavac · Sep 27 '24 20:09

@matijaparavac We're building our scraper engine, which will soon be available in the Crawl4ai library. We started by focusing on a robust, fast, and asynchronous approach to crawl a single page effectively. This was part of our roadmap—to ensure we could properly generate data, handle various situations, execute JavaScript, and navigate all the nuances of crawling a page. Now, we’re developing the scraper itself, which features a customizable graph search algorithm with various parameters.

Right now, you can simulate crawling by fetching one page and, from the results, get a list of all internal and external links. You can then use a task queue to crawl as many of those links as you like. That’s one approach you can take for now, but we'll be releasing the full scraper engine soon!

unclecode · Sep 28 '24 00:09
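
For anyone landing here before the scraper engine ships, the workaround described above (fetch one page, collect its links, queue them up) could be sketched roughly like this. It is only a minimal breadth-first sketch, not the library's implementation: `crawl_to_depth` and the `fetch_links` callback are hypothetical names made up for illustration. The callback is assumed to take a URL and return the internal links found on that page; how you build it on top of a single-page crawl is up to you.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical callback type (not part of any library): given a URL,
# return the internal links found on that page.
FetchLinks = Callable[[str], Awaitable[Iterable[str]]]


async def crawl_to_depth(
    start_url: str, fetch_links: FetchLinks, max_depth: int
) -> Dict[str, int]:
    """Breadth-first link discovery with a depth limit.

    Returns {url: number of link hops from start_url} for every URL
    discovered within max_depth hops. URLs found at exactly max_depth
    are recorded but not fetched themselves.
    """
    seen: Dict[str, int] = {start_url: 0}
    frontier = [start_url]

    for depth in range(max_depth):
        # Fetch every page of the current level concurrently.
        results = await asyncio.gather(*(fetch_links(url) for url in frontier))

        next_frontier = []
        for links in results:
            for link in links:
                if link not in seen:  # skip pages already discovered
                    seen[link] = depth + 1
                    next_frontier.append(link)

        frontier = next_frontier
        if not frontier:  # nothing new to visit, stop early
            break

    return seen
```

With `max_depth=1` this matches the current behaviour from the original report (only the homepage's direct links are found); `max_depth=2` would also pick up links that live on pages like homepage.com/news/sports-data. For a large site you would probably want to cap concurrency (for example with `asyncio.Semaphore`) instead of gathering a whole level at once.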

Great to hear this! Thanks!

matijaparavac · Sep 29 '24 12:09