crawl4ai
Basic Health Check Endpoint to verify whether a URL is accessible before running a full crawl operation
Checklist:
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] I have added/updated unit tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
Summary by CodeRabbit
New Features
- Added a URL health check to the async web crawler for quick accessibility checks, with configurable timeouts and SSL verification. It returns the HTTP status, response time, redirect info, content type, and related metadata, and reports failures with clear error messages.
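The PR description does not show the new method's name or signature, so the following is only a hypothetical, stdlib-only sketch of what such a check can look like, not the PR's implementation. `check_url_health`, `check_url_health_async`, `HealthCheckResult` and its field names are assumptions chosen to mirror the fields listed above. It sends a HEAD request, follows redirects, and maps each failure class (HTTP error, DNS/connection failure, timeout, malformed URL) to a readable error string.

```python
import asyncio
import socket
import ssl
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheckResult:
    # Hypothetical result shape modeled on the fields listed in the summary.
    url: str
    accessible: bool
    status_code: Optional[int] = None
    response_time: Optional[float] = None  # seconds
    final_url: Optional[str] = None
    redirected: bool = False
    content_type: Optional[str] = None
    error: Optional[str] = None


def check_url_health(url: str, timeout: float = 10.0, verify_ssl: bool = True) -> HealthCheckResult:
    """Send a HEAD request to `url` and summarize whether it looks crawlable."""
    context = ssl.create_default_context()
    if not verify_ssl:
        # Accept self-signed or otherwise invalid certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        # Request() raises ValueError for malformed URLs, e.g. a missing scheme.
        request = urllib.request.Request(url, method="HEAD")
        # urlopen follows redirects and only returns normally for 2xx responses.
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            final_url = resp.geturl()
            return HealthCheckResult(
                url=url,
                accessible=True,
                status_code=resp.status,
                response_time=time.monotonic() - start,
                final_url=final_url,
                redirected=final_url != url,
                content_type=resp.headers.get("Content-Type"),
            )
    except urllib.error.HTTPError as exc:
        # 4xx/5xx: the server answered, but the page is not usable.
        return HealthCheckResult(
            url=url, accessible=False, status_code=exc.code,
            response_time=time.monotonic() - start,
            error=f"HTTP {exc.code}: {exc.reason}",
        )
    except urllib.error.URLError as exc:
        # DNS failures, refused connections, TLS errors and connect timeouts.
        kind = "Timeout" if isinstance(exc.reason, socket.timeout) else "Connection error"
        return HealthCheckResult(url=url, accessible=False, error=f"{kind}: {exc.reason}")
    except (socket.timeout, TimeoutError):
        # Timed out while waiting for the response after connecting.
        return HealthCheckResult(url=url, accessible=False, error=f"Timeout after {timeout}s")
    except OSError as exc:
        # e.g. the server closed the connection without sending a response.
        return HealthCheckResult(url=url, accessible=False, error=f"Connection error: {exc}")
    except ValueError as exc:
        return HealthCheckResult(url=url, accessible=False, error=f"Invalid URL: {exc}")


async def check_url_health_async(url: str, timeout: float = 10.0, verify_ssl: bool = True) -> HealthCheckResult:
    # Run the blocking check in a worker thread so async code can await it.
    return await asyncio.to_thread(check_url_health, url, timeout, verify_ssl)
```

One caveat of this sketch: some servers reject HEAD with 405, and urllib re-issues a followed redirect as GET, so a real implementation may want a GET fallback.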
Documentation
- Added a user guide and examples demonstrating health checks, batch validation, redirect handling, and conditional crawling. Clarified SSL behavior and return fields.
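The batch-validation and conditional-crawling patterns mentioned above can be sketched as follows. This is a hypothetical illustration, not code from the PR or its guide: `batch_health_check`, `crawl_if_healthy` and `max_concurrency` are assumed names. `check` stands for any async callable that returns `True` when a URL is reachable (for instance a thin wrapper around the new health-check method), and `crawl` could wrap a call such as `AsyncWebCrawler.arun`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


async def batch_health_check(
    urls: Iterable[str],
    check: Callable[[str], Awaitable[bool]],
    max_concurrency: int = 5,
) -> Dict[str, bool]:
    """Run `check` on every URL, with at most `max_concurrency` checks in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> bool:
        async with semaphore:
            try:
                return bool(await check(url))
            except Exception:
                # One failing check should not abort the whole batch.
                return False

    unique = list(dict.fromkeys(urls))  # drop duplicates, keep input order
    results = await asyncio.gather(*(guarded(u) for u in unique))
    return dict(zip(unique, results))


async def crawl_if_healthy(
    urls: Iterable[str],
    check: Callable[[str], Awaitable[bool]],
    crawl: Callable[[str], Awaitable[T]],
) -> Dict[str, T]:
    """Conditional crawling: only URLs that pass `check` are handed to `crawl`."""
    health = await batch_health_check(urls, check)
    return {url: await crawl(url) for url, ok in health.items() if ok}
```

The semaphore keeps a long URL list from opening hundreds of connections at once, while the sequential crawl step keeps the sketch simple; a real pipeline might crawl the healthy URLs concurrently as well.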
Tests
- Introduced comprehensive async tests covering success, redirects, 4xx errors, DNS/connection failures, timeouts, and result structure validation.