crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
## Summary This small PR resolves the `datetime` library warnings: `DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in`...
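For context, this warning is usually resolved by swapping `datetime.utcnow()` for a timezone-aware call. The truncated summary does not show the actual diff, so the following is only a minimal sketch of the common replacement; the helper name `utc_now` is hypothetical and not taken from the PR.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Hypothetical helper: stands in for call sites that previously used
    the deprecated, naive datetime.utcnow().
    """
    return datetime.now(timezone.utc)


if __name__ == "__main__":
    now = utc_now()
    # The result carries tzinfo, unlike the naive value from utcnow().
    print(now.isoformat())
```

Note that aware and naive datetimes cannot be compared or subtracted directly, so any stored naive timestamps would need converting when adopting this pattern.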
### crawl4ai version 0.5.0post8 ### Expected Behavior Crawl4ai should not crash because of memory leaks, and StreamingResponse should not return an empty byte string (b'') ### Current Behavior When using Crawl4ai for...
### crawl4ai version 0.5.0.post8 ### Expected Behavior If the rate limit is hit, the user should be informed ### Current Behavior When the rate limit persists past the retries, `perform_completion_with_backoff` returns a...
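The expected behavior in this report, surfacing an error once retries run out, can be sketched generically. This is not crawl4ai's `perform_completion_with_backoff`; the names `RateLimitError`, `RetriesExhaustedError`, and `call_with_backoff` are all hypothetical, and the exponential delay schedule is an assumption.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


class RetriesExhaustedError(Exception):
    """Raised when every attempt hit the rate limit."""


def call_with_backoff(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponential backoff.

    Raises RetriesExhaustedError after the last failed attempt instead of
    returning a placeholder value, so the caller is always informed.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError as exc:
            if attempt == max_attempts - 1:
                raise RetriesExhaustedError(
                    f"rate limit still hit after {max_attempts} attempts"
                ) from exc
            # Delay doubles each retry: base_delay, 2*base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Raising a dedicated exception, rather than returning an error-shaped result, keeps a rate-limit failure from being mistaken for a valid completion further down the pipeline.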
### crawl4ai version v0.5.x ### Expected Behavior I am looping over 140 webpages that I want to crawl. That works fine for the first couple or so. ### Current Behavior...
## Summary Please include a summary of the change and/or which issues are fixed. eg: `Fixes #123` (Tag GitHub issue numbers in this format, so it automatically links the issues...
## Summary Fixes outdated links in the `README.md` file to point to the correct and current documentation pages. Additionally, the pages under `https://docs.crawl4ai.com/basic/` appear outdated and may need review. ##...
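### crawl4ai version 0.5 ### Expected Behavior Using the on_page_context_created hook to run dynamic page actions and then replacing the layout tables with table tags, the crawled page should be saved as a Markdown file ### Current Behavior Using the on_page_context_created hook to run dynamic page actions and then replacing the layout tables with table tags, the page refreshes automatically ### Is this reproducible? Yes ### Inputs Causing the Bug ```bash https://platform.worldquantbrain.com/learn/operators ``` ###...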
### crawl4ai version 6.0.0 ### Expected Behavior A crawl should successfully handle a site that actively manages client request rates. ### Current Behavior The current RateLimiter implementation uses a simple...
## Summary Replace the existing rate limiting mechanism with a token bucket algorithm to improve request handling efficiency and control. The previous implementation used a simple last request and current...
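To illustrate the token bucket idea this PR names, here is a minimal sketch. It is not the PR's implementation; the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can control time
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Consume tokens and return True if enough are available, else False."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    def wait_time(self, tokens: float = 1.0) -> float:
        """Seconds until `tokens` would be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (tokens - self.tokens) / self.rate)


if __name__ == "__main__":
    bucket = TokenBucket(rate=2.0, capacity=5.0)
    for i in range(6):
        print(i, bucket.try_acquire())
```

Unlike a check that only compares the last request time with the current time, the bucket permits short bursts up to `capacity` while still bounding the long-run average to `rate` requests per second.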
Fix elapsed and improper output format in docs scraping strategies performance [Screenshots: before fix, after fix] ## Summary by CodeRabbit - **Style** - Improved import organization and code formatting...