crawl4ai
No way to set crawl depth for crawler
Currently the crawler only crawls links at depth level 1. That means if you give it a homepage link (homepage.com), it will crawl only the links found directly on that homepage. It will not follow links on deeper pages such as homepage.com/news/sports-data, so if there is, for example, a "more info" link located there, it won't be crawled.
@matijaparavac We're building our scraper engine, which will soon be available in the Crawl4ai library. We started by focusing on a robust, fast, and asynchronous approach to crawling a single page effectively. This was part of our roadmap: to ensure we could properly generate data, handle various situations, execute JavaScript, and navigate all the nuances of crawling a page. Now, we’re developing the scraper itself, which features a customizable graph search algorithm with various parameters.
Right now, you can simulate crawling by fetching one page and, from the results, get a list of all internal and external links. You can then use a task queue to crawl as many of those links as you like. That’s one approach you can take for now, but we'll be releasing the full scraper engine soon!
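For anyone who wants to try the queue approach in the meantime, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the upcoming scraper engine and not part of the Crawl4ai API: `crawl_to_depth` and `fetch_links` are hypothetical names. `fetch_links` stands in for whatever function crawls one page and returns the links found on it, and the sketch assumes you only want to follow links on the same host as the start URL.

```python
from collections import deque
from typing import Callable, Dict, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_to_depth(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: crawl one page, return its hrefs
    max_depth: int = 2,
) -> Dict[str, int]:
    """Breadth-first crawl; returns {url: depth at which the url was first found}."""
    start_host = urlparse(start_url).netloc
    seen = {start_url: 0}      # doubles as the visited set
    queue = deque([start_url])  # the task queue of pages still to crawl

    while queue:
        url = queue.popleft()
        depth = seen[url]
        links = fetch_links(url)  # crawl this single page
        if depth == max_depth:
            continue  # page is crawled, but its links are not followed further
        for href in links:
            # Resolve relative links against the current page and drop #fragments.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != start_host:
                continue  # assumption: skip external links
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

With `max_depth=1` this matches the current behaviour described in the issue (the homepage plus its direct links); raising it lets the crawl reach pages such as a "more info" link under homepage.com/news/sports-data. In practice you would plug the single-page crawl into `fetch_links` and probably add rate limiting and error handling around it.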
Great to hear this! Thanks!