
Example on whole-blog crawling?

Open BradKML opened this issue 4 months ago • 17 comments

Thanks for creating alternatives to FireCrawl for LLMs! I have a question: are there examples or shortcuts for crawling a whole blog (which may or may not be behind something like Cloudflare)?

  1. How would the speed of crawling be managed such that the crawler won't be blocked?
  2. Could the out-links to other articles also be captured (just in case, for context)?
  3. How can individual articles be separated from paginated web indexes?
  4. Are there ways to hack infinite scrolling for blogs without a proper sitemap?
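
To make points 1 to 3 concrete, here is a rough sketch of the behaviour I have in mind, written against the Python standard library only. It does not use crawl4ai's API, since I don't know the right calls. The `/page/N` and `?page=N` pagination pattern, the `blog.example.com` host, and the names `LinkCollector`, `classify_links`, and `Throttle` are all assumptions made up for illustration.

```python
import re
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


# Assumed pattern: index pages look like /page/2 or ?page=2.
PAGINATION = re.compile(r"(/page/\d+/?$)|([?&]page=\d+)")


def classify_links(html, base_url):
    """Split links into pagination, same-site articles, and out-links."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    site = urlparse(base_url).netloc
    buckets = {"pagination": [], "articles": [], "outlinks": []}
    for url in dict.fromkeys(parser.links):  # de-duplicate, keep order
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc != site:
            buckets["outlinks"].append(url)
        elif PAGINATION.search(url):
            buckets["pagination"].append(url)
        else:
            buckets["articles"].append(url)
    return buckets


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

The idea would be to call `Throttle.wait()` before each fetch, follow the `pagination` bucket to discover more index pages, crawl the `articles` bucket for content, and keep `outlinks` only as context. Whether a fixed delay is enough to avoid being blocked is exactly what I'm unsure about.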

BradKML · Sep 30 '24 03:09