crawl4ai

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Results 541 crawl4ai issues

## Summary This small PR resolves the `datetime` library warnings: ```python DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in...
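The warning quoted in this summary is emitted by Python 3.12 and later, where `datetime.utcnow()` is deprecated. A minimal sketch of the usual timezone-aware replacement follows; the helper name `utc_now` is hypothetical and is not taken from the PR itself.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Hypothetical helper for illustration only; not part of crawl4ai.
    """
    # datetime.now(timezone.utc) replaces the deprecated datetime.utcnow()
    # and attaches tzinfo, so the result is no longer a naive datetime.
    return datetime.now(timezone.utc)


now = utc_now()
print(now.isoformat())
```

Unlike the naive value returned by `utcnow()`, the result carries `tzinfo`, so comparing it with naive datetimes elsewhere in a codebase raises `TypeError` and may require follow-up changes.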

### crawl4ai version 0.5.0post8 ### Expected Behavior Crawl4ai not crashing because of memory leaks and StreamingResponse not returning an empty byte string (b'') ### Current Behavior When using Crawl4ai for...

🐞 Bug
⁇ Needs Clarification

### crawl4ai version 0.5.0.post8 ### Expected Behavior If rate limit is hit the user should be informed ### Current Behavior When the rate limit exceeds the retries `perform_completion_with_backoff` returns a...

🐞 Bug
📌 Root caused
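
The expected behavior in the report above, that the caller is told when the rate limit outlasts the retries, can be illustrated with a generic retry loop. This is only a sketch under stated assumptions: `RateLimitError` and `call_with_backoff` are hypothetical names, and nothing here reflects how `perform_completion_with_backoff` is actually implemented.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by a provider call when it is rate limited."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                # Retries exhausted: surface the failure to the caller.
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Re-raising on the last attempt keeps the failure visible, so calling code can log it or report it to the user rather than continuing with an unusable result.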

### crawl4ai version v0.5.x ### Expected Behavior I am looping over 140 webpages that I want to crawl. That works fine for the first couple or so. ### Current Behavior...

🐞 Bug
🩺 Needs Triage

## Summary Please include a summary of the change and/or which issues are fixed. eg: `Fixes #123` (Tag GitHub issue numbers in this format, so it automatically links the issues...

## Summary Fixes outdated links in the `README.md` file to point to the correct and current documentation pages. Additionally, the pages under `https://docs.crawl4ai.com/basic/` appear outdated and may need review. ##...

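### crawl4ai version 0.5 ### Expected Behavior Use the on_page_context_created hook to run dynamic page actions, replace the layout tables with table tags, then crawl the page and save it as a markdown file ### Current Behavior Using the on_page_context_created hook to run dynamic page actions and then replacing the layout tables with table tags causes the page to refresh automatically ### Is this reproducible? Yes ### Inputs Causing the Bug ```bash https://platform.worldquantbrain.com/learn/operators ``` ###...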

🐞 Bug
🩺 Needs Triage

### crawl4ai version 6.0.0 ### Expected Behavior A crawl should successfully handle a site which actively manages client request rates. ### Current Behavior The current RateLimiter implementation uses a simple...

🐞 Bug
🩺 Needs Triage

## Summary Replace the existing rate limiting mechanism with a token bucket algorithm to improve request handling efficiency and control. The previous implementation used a simple last request and current...
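
For readers unfamiliar with the token bucket algorithm this summary mentions, a minimal generic sketch follows. It is not the PR's code: the class name `TokenBucket`, its parameters, and the injectable `clock` argument are all assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Consume `tokens` if available; return False without blocking otherwise."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

A bucket created with `TokenBucket(rate=2.0, capacity=5.0)` would allow a burst of up to five requests and then about two per second. The capacity bounds the burst size, while the refill rate bounds the long-run average.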

Fix elapsed and improper output format in docs scraping strategies performance Before fix ![docs_scraping_performance_issues2](https://github.com/user-attachments/assets/4571798a-af7e-4de1-b350-7b62574efcc5) After fix ![fixed_docs_scraping_performance_issues](https://github.com/user-attachments/assets/ef78a2a1-e8f2-4bd9-a151-67e6dfc32d43) ## Summary by CodeRabbit - **Style** - Improved import organization and code formatting...