crawl4ai icon indicating copy to clipboard operation
crawl4ai copied to clipboard

Basic Health Check Endpoint to verify if a URL is accessible before running a full crawl operation

Open PATAKAMURIVENKATAGANESH opened this issue 3 months ago • 1 comments

Summary

Please include a summary of the change and/or which issues are fixed.

eg: Fixes #123 (Tag GitHub issue numbers in this format, so it automatically links the issues with your PR)

List of files changed and why

eg: quickstart.py - To update the example as per new changes

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] I have added/updated unit tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • Added a URL health check to the async web crawler, enabling quick accessibility checks with configurable timeouts and SSL verification. Returns status, response time, redirect info, content type, and more with clear error reporting.
  • Documentation

    • Added a user guide and examples demonstrating health checks, batch validation, redirect handling, and conditional crawling. Clarified SSL behavior and return fields.
  • Tests

    • Introduced comprehensive async tests covering success, redirects, 4xx errors, DNS/connection failures, timeouts, and result structure validation.

PATAKAMURIVENKATAGANESH avatar Aug 17 '25 16:08 PATAKAMURIVENKATAGANESH