crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Created comprehensive crawler scripts for batdongsan.com.vn to extract Vietnamese real estate listings using crawl4ai. Features:

- Full-featured crawler class with pagination support
- Simple script for quick usage and customization...
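A minimal sketch of what such a paginated crawl could look like with crawl4ai's `arun_many`; the listing-page URL pattern below is a hypothetical placeholder, not the script's actual path scheme:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

# Hypothetical listing-page pattern; real batdongsan.com.vn paths may differ.
PAGE_URLS = [f"https://batdongsan.com.vn/nha-dat-ban/p{n}" for n in range(1, 6)]

async def crawl_listing_pages():
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler() as crawler:
        # Fetch the paginated listing pages concurrently.
        results = await crawler.arun_many(PAGE_URLS, config=run_config)
        for result in results:
            if result.success:
                print("crawled:", result.url)

asyncio.run(crawl_listing_pages())
```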
### crawl4ai version

N/A

### Expected Behavior

The documentation should detail the ability to use `cdp_url` to connect to remote browser instances.

### Current Behavior

There is no mention of...
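A minimal sketch of what such a documentation section could show, assuming `cdp_url` is passed through `BrowserConfig` and points at an already-running browser's CDP endpoint (the endpoint URL is a placeholder):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def crawl_via_remote_browser():
    # Placeholder CDP endpoint of a browser started elsewhere,
    # e.g. launched with --remote-debugging-port=9222.
    browser_config = BrowserConfig(cdp_url="http://localhost:9222")
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com", config=CrawlerRunConfig())
        print(result.success)

asyncio.run(crawl_via_remote_browser())
```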
### crawl4ai version

0.7.4

### Expected Behavior

I noticed that the documentation for [arun](https://docs.crawl4ai.com/api/async-webcrawler/#22-manual-start-close) and [arun_many](https://docs.crawl4ai.com/api/arun_many/) suggests the return types are CrawlResult and Union[List[CrawlResult], AsyncGenerator[CrawlResult, None]] respectively, which is incorrect,...
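For context, a minimal sketch of the two `arun_many` modes behind that Union type, assuming the documented `stream` flag on `CrawlerRunConfig` (URLs are placeholders):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

URLS = ["https://example.com", "https://example.org"]

async def main():
    async with AsyncWebCrawler() as crawler:
        # Batch mode: a list of results once all URLs are done.
        batch = await crawler.arun_many(URLS, config=CrawlerRunConfig(stream=False))
        for result in batch:
            print("batch:", result.url, result.success)

        # Streaming mode: an async generator yielding results as they finish.
        stream = await crawler.arun_many(URLS, config=CrawlerRunConfig(stream=True))
        async for result in stream:
            print("stream:", result.url, result.success)

asyncio.run(main())
```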
### crawl4ai version

0.7.4

### Expected Behavior

Suppose I want to crawl a website using `AsyncHTTPCrawlerStrategy` and pass the proxy configuration. It should start crawling the website by using the...
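A sketch of what the report appears to be attempting, assuming the proxy is passed via `proxy_config` on `CrawlerRunConfig`; the proxy server and URL are placeholders, and whether the HTTP-only strategy actually applies this configuration is exactly what the issue questions:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.async_crawler_strategy import AsyncHTTPCrawlerStrategy

async def crawl_through_proxy():
    # Placeholder proxy settings.
    run_config = CrawlerRunConfig(
        proxy_config={"server": "http://proxy.example.com:8080"}
    )
    async with AsyncWebCrawler(crawler_strategy=AsyncHTTPCrawlerStrategy()) as crawler:
        result = await crawler.arun("https://example.com", config=run_config)
        print(result.success, result.status_code)

asyncio.run(crawl_through_proxy())
```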
This PR introduces Firecrawl as an optional backend for crawl4ai.

**Updates**

- Added FirecrawlBackend wrapper around Firecrawl's SDK.
- Extended CLI with --backend option (default | firecrawl).
- Enabled output in...
## Summary

There is an error in the docstring of AsyncWebCrawler.arun: the parameter is called `config`, not `crawler_config`.

## List of files changed and why

crawl4ai/async_webcrawler.py - see summary

##...
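For reference, the call the corrected docstring describes (the URL is a placeholder):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    async with AsyncWebCrawler() as crawler:
        # The keyword argument is `config`, not `crawler_config`.
        result = await crawler.arun("https://example.com", config=CrawlerRunConfig())
        print(result.success)

asyncio.run(main())
```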
### crawl4ai version

0.7.4

### Expected Behavior

I expect to be able to discover all the URLs from https://www.fastighetsvarlden.se using URL seeding.

### Current Behavior

An error occurs during...
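A minimal sketch of URL seeding, assuming the `AsyncUrlSeeder`/`SeedingConfig` API and the sitemap discovery source; the domain is the one from the report:

```python
import asyncio
from crawl4ai import AsyncUrlSeeder, SeedingConfig

async def discover_urls():
    async with AsyncUrlSeeder() as seeder:
        # "sitemap" is one documented discovery source; Common Crawl is another.
        config = SeedingConfig(source="sitemap")
        urls = await seeder.urls("fastighetsvarlden.se", config)
        print(f"discovered {len(urls)} URLs")

asyncio.run(discover_urls())
```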
## Installing Claude Code GitHub App

This PR adds a GitHub Actions workflow that enables Claude Code integration in our repository.

### What is Claude Code?

[Claude Code](https://claude.com/claude-code) is...
### crawl4ai version

0.7.4

### Expected Behavior

Hello, I am having an issue using target_elements to only save certain content to markdown while still allowing it to view all links...
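A sketch of the setup being described, assuming `target_elements` on `CrawlerRunConfig` restricts the markdown to matching elements while links are still collected from the whole page; the selector and URL are placeholders:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    # Limit markdown to the main article body (placeholder selector);
    # link collection is expected to stay page-wide.
    run_config = CrawlerRunConfig(target_elements=["article.main-content"])
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=run_config)
        print(len(result.links.get("internal", [])), "internal links found")

asyncio.run(main())
```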
### crawl4ai version

1.7.4

### Expected Behavior

If a webpage is built like http://www.example.com/whatever/you/want/9123/, crawl4ai turns it into http://www.example.com/whatever/you/want/9123, which leads to a 404. I monkey patched this as a workaround (very dirty...
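A minimal reproduction sketch of the reported behavior, using the example URL from the report; which result field reflects the rewritten URL may vary by version:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("http://www.example.com/whatever/you/want/9123/")
        # Reportedly the trailing slash is dropped before the request,
        # so the final URL and status reflect the 404 described above.
        print(result.url, result.status_code)

asyncio.run(main())
```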