crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
crawl4ai version 0.5.0. Expected Behavior: All 680 product URLs passed to `crawler.arun_many()` should produce a corresponding `result.extracted_content` if the crawling and extraction process succeeds. Each result should be...
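When a batch like this comes back short, a first diagnostic step is to list which input URLs lack usable extracted content. The sketch below is not crawl4ai code. `ResultLike` is a hypothetical stand-in that only mirrors the `url`, `success`, and `extracted_content` fields mentioned in the report, and `missing_extractions` is a hypothetical helper. It assumes each result's `url` matches the input URL exactly, so redirected URLs would need extra handling.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ResultLike:
    # Hypothetical stand-in for a crawl result; not crawl4ai's own class.
    url: str
    success: bool
    extracted_content: Optional[str]


def missing_extractions(urls: Iterable[str], results: Iterable[ResultLike]) -> List[str]:
    """Return input URLs with no successful, non-empty extracted_content.

    "Non-empty" here only means a truthy string; a string such as "[]"
    still counts as content.
    """
    ok = {r.url for r in results if r.success and r.extracted_content}
    seen = set()
    missing = []
    for url in urls:
        # Keep input order and report each URL at most once.
        if url not in ok and url not in seen:
            missing.append(url)
        seen.add(url)
    return missing


urls = ["https://example.com/p1", "https://example.com/p2", "https://example.com/p3"]
results = [
    ResultLike("https://example.com/p1", True, '[{"name": "A"}]'),
    ResultLike("https://example.com/p2", False, None),
]
print(missing_extractions(urls, results))
# ['https://example.com/p2', 'https://example.com/p3']
```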
crawl4ai version 0.7.7. Expected Behavior: The webpage should be parsed correctly. Current Behavior: When crawling this page: https://www.toshiba-lifestyle.com/th-en/blog/how-to-choose-the-right-laundry-product-for-you I get the following error: `[ERROR]... × https://www.toshiba-lif...laundry-product-for-you |...`
crawl4ai version latest. Expected Behavior: I have successfully built a crawler request using `BrowserConfig` and `CrawlerConfig` with CSSExtraction, etc. Now I want to build a webhook strategy to not...
Summary: Ensures `BrowserConfig.to_dict()` emits JSON-safe data by converting nested `ProxyConfig` objects into dictionaries. Prevents `TypeError: Object of type ProxyConfig is not JSON serializable` in environments (like Docker) that...
crawl4ai version 0.7.7. Expected Behavior: When using `proxy_config` with `BrowserConfig`, the configuration should be serializable to JSON for use with the Docker API server's crawler pool. The `BrowserConfig.to_dict()`...
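The two entries above describe the same failure: a config's `to_dict()` returns a dictionary that still contains a nested proxy object, so `json.dumps` raises `TypeError`. The following is a minimal sketch of the general fix of recursively converting nested objects into plain dictionaries. `ProxyConfigLike`, `BrowserConfigLike`, and `to_json_safe` are hypothetical names invented for illustration; they are not crawl4ai's classes and are not the code from the linked PR.

```python
import json
from typing import Any, Optional


class ProxyConfigLike:
    # Hypothetical stand-in for a nested proxy config object.
    def __init__(self, server: str, username: Optional[str] = None, password: Optional[str] = None):
        self.server = server
        self.username = username
        self.password = password

    def to_dict(self) -> dict:
        return {"server": self.server, "username": self.username, "password": self.password}


def to_json_safe(value: Any) -> Any:
    """Recursively convert nested objects into JSON-serializable structures."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if hasattr(value, "to_dict"):
        # Nested config objects are flattened through their own to_dict().
        return to_json_safe(value.to_dict())
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


class BrowserConfigLike:
    # Hypothetical stand-in for a browser config holding a nested proxy config.
    def __init__(self, headless: bool = True, proxy_config: Optional[ProxyConfigLike] = None):
        self.headless = headless
        self.proxy_config = proxy_config

    def to_dict(self) -> dict:
        # Without to_json_safe, "proxy_config" would remain a ProxyConfigLike
        # instance and json.dumps would fail on it.
        return to_json_safe({"headless": self.headless, "proxy_config": self.proxy_config})


proxy = ProxyConfigLike("http://proxy.example:8080")
try:
    json.dumps({"proxy_config": proxy})
except TypeError as exc:
    print(exc)  # Object of type ProxyConfigLike is not JSON serializable

print(json.dumps(BrowserConfigLike(proxy_config=proxy).to_dict()))
# {"headless": true, "proxy_config": {"server": "http://proxy.example:8080", "username": null, "password": null}}
```

Converting inside `to_dict()` keeps every caller (for example, a pool that serializes configs as cache keys) working without a custom `JSONEncoder`.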
Summary: When scraping many URLs continuously, browser contexts accumulate in memory and are never cleaned up. The existing cleanup mechanism only runs when browsers go idle, which never happens...
crawl4ai version 0.7.6. Expected Behavior: When I set `mean_delay`, there should be a delay between requests. Current Behavior: The `mean_delay` setting is ignored. Is this reproducible? Yes...
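To make the expected behavior concrete, here is a sketch of a sequential crawl loop that honors a mean delay between consecutive requests and records the delays it applied, which makes "the setting was ignored" easy to test. This is not crawl4ai's dispatcher: `crawl_with_delay`, the `fetch` coroutine, and the `jitter` parameter are hypothetical, and the uniform spread around `mean_delay` is an assumption chosen for illustration.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Optional


async def crawl_with_delay(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    mean_delay: float,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Fetch URLs one at a time, sleeping between consecutive requests.

    Returns the delays actually applied (one per gap between requests).
    """
    rng = rng or random.Random()
    delays: List[float] = []
    for i, url in enumerate(urls):
        if i > 0:
            # Delay centered on mean_delay, spread uniformly by +/- jitter, never negative.
            delay = max(0.0, mean_delay + rng.uniform(-jitter, jitter))
            delays.append(delay)
            await asyncio.sleep(delay)
        await fetch(url)
    return delays


async def fake_fetch(url: str) -> str:
    # Hypothetical fetcher used only to exercise the loop.
    return f"<html>{url}</html>"


print(asyncio.run(crawl_with_delay(["a", "b", "c"], fake_fetch, mean_delay=0.05)))
# [0.05, 0.05]
```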
Summary: Updates the lxml dependency to 6.0. List of files changed and why: To upgrade the lxml constraint and regenerate the lock file, these files were touched: `pyproject.toml`, `uv.lock`...
Hello, I'm trying to use the link scoring feature with the following config, but I'm getting the error below. crawl4ai is running in Docker. Any idea what is wrong? `[LINK_EXTRACT] ℹ...`
crawl4ai version 0.6.3. Expected Behavior: My example crawler: `llm_strategy = LLMExtractionStrategy(llm_config=self.llm_config, schema=PdfDoc.model_json_schema(), extraction_type="schema", instruction=""" From the crawled content, extract data from html - data in html...`