
Bring Docker API server to feature parity with Crawl4AI v0.7.x

Open · SohamKukreti opened this issue 3 months ago · 2 comments

Description

The Docker API server lags behind the Python library: several v0.7.x capabilities have no REST exposure. This issue tracks the endpoints and parameters needed to expose the following library features:

1. Adaptive crawling

  • AdaptiveCrawler, AdaptiveConfig, CrawlState, CrawlStrategy, StatisticalStrategy
  • Missing: endpoints to run/tune adaptive crawls
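As a starting point, a request body for the proposed /adaptive/crawl route could mirror the library's AdaptiveConfig. This is a sketch only; the field names are assumptions, not a final schema.

```python
import json

# Hypothetical payload for a proposed /adaptive/crawl endpoint.
# "adaptive_config" keys loosely mirror AdaptiveConfig in the Python
# library; exact names are assumptions until the schema is designed.
payload = {
    "start_url": "https://example.com/docs",
    "query": "authentication setup",
    "adaptive_config": {
        "strategy": "statistical",    # would map to StatisticalStrategy
        "confidence_threshold": 0.8,  # stop crawling once confident enough
        "max_pages": 30,
    },
}
print(json.dumps(payload, indent=2))
```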

2. C4A Script language

  • c4a_compile, c4a_validate, c4a_compile_file, CompilationResult, ValidationResult, ErrorDetail
  • Missing: submit/validate/execute script endpoints
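A compile endpoint could accept the raw C4A script text and return a CompilationResult-shaped body. The script commands below follow the library's C4A syntax; the JSON envelope is an assumption.

```python
import json

# Hypothetical request/response pair for a /c4a-script/compile endpoint.
request = {
    "script": "GO https://example.com\nWAIT `#content` 5\nCLICK `#load-more`",
}
# A CompilationResult-like response might carry success plus ErrorDetail items:
response = {
    "success": True,
    "js_code": ["/* generated JavaScript */"],
    "errors": [],  # ErrorDetail-shaped objects on failure
}
print(json.dumps(request))
print(json.dumps(response))
```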

3. URL seeding

  • AsyncUrlSeeder, SeedingConfig
  • Missing: sitemap/common-crawl/discovery endpoints
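A discovery endpoint could wrap AsyncUrlSeeder and accept SeedingConfig-style options. Field names here are assumptions modeled on the library config:

```python
import json

# Hypothetical payload for a URL-discovery endpoint wrapping AsyncUrlSeeder.
payload = {
    "domain": "example.com",
    "seeding_config": {
        "source": "sitemap+cc",  # sitemap and/or Common Crawl
        "pattern": "*/blog/*",
        "max_urls": 500,
        "extract_head": True,    # fetch head metadata for relevance scoring
    },
}
print(json.dumps(payload))
```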

4. Chunking

  • ChunkingStrategy, RegexChunking
  • Missing: chunking configuration
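Chunking could be exposed by extending the existing /crawl request. The type-plus-params wire format below is an assumption modeled on how other strategies are serialized:

```python
import json

# Sketch: adding a chunking section to the /crawl request body.
payload = {
    "urls": ["https://example.com/article"],
    "crawler_config": {
        "chunking_strategy": {
            "type": "RegexChunking",
            "params": {"patterns": [r"\n\n"]},  # split on blank lines
        },
    },
}
print(json.dumps(payload))
```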

5. Browser adapters

  • BrowserAdapter, PlaywrightAdapter, UndetectedAdapter
  • Missing: adapter/stealth selection
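Adapter selection could be a single field on browser_config. The "adapter" key is an assumption; values would map to the adapter classes above:

```python
import json

# Sketch: selecting a browser adapter per request.
payload = {
    "urls": ["https://example.com"],
    "browser_config": {
        "adapter": "undetected",  # PlaywrightAdapter (default) or UndetectedAdapter
        "headless": True,
    },
}
print(json.dumps(payload))
```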

6. Proxy rotation

  • ProxyRotationStrategy, RoundRobinProxyStrategy
  • Missing: rotation strategy selection (beyond raw proxy)
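Rotation could accept a strategy name plus a proxy list, rather than the single raw proxy supported today. The "proxy_rotation" shape is an assumption; "round_robin" would map to RoundRobinProxyStrategy:

```python
import json

# Sketch: proxy rotation config beyond a single raw proxy.
payload = {
    "urls": ["https://example.com"],
    "crawler_config": {
        "proxy_rotation": {
            "strategy": "round_robin",
            "proxies": [
                {"server": "http://proxy-1:8080"},
                {"server": "http://proxy-2:8080"},
            ],
        },
    },
}
print(json.dumps(payload))
```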

7. Dispatchers

  • SemaphoreDispatcher, BaseDispatcher
  • Missing: dispatcher selection (only MemoryAdaptiveDispatcher is used internally)
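A top-level dispatcher section could let clients pick SemaphoreDispatcher for batch crawls. Field names below are proposals only:

```python
import json

# Sketch: choosing a dispatcher for batch crawls instead of the
# server's internal default.
payload = {
    "urls": ["https://example.com/a", "https://example.com/b"],
    "dispatcher": {
        "type": "semaphore",     # would map to SemaphoreDispatcher
        "max_concurrency": 5,    # assumed name for the concurrency cap
    },
}
print(json.dumps(payload))
```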

8. Link preview

  • LinkPreview, LinkPreviewConfig
  • Missing: link preview/scoring endpoint
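A /links/preview-style endpoint could accept LinkPreviewConfig-style options. The scoring-related keys are assumptions modeled on the library config:

```python
import json

# Hypothetical payload for a link preview/scoring endpoint.
payload = {
    "url": "https://example.com/docs",
    "link_preview_config": {
        "include_internal": True,
        "include_external": False,
        "query": "api reference",  # relevance-scoring query
        "max_links": 20,
    },
}
print(json.dumps(payload))
```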

9. Profiling/monitoring

  • BrowserProfiler, CrawlerMonitor
  • Missing: profiling/monitoring endpoints
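For profiling/monitoring, a few read-only and management routes would likely suffice. The paths below are suggestions only:

```python
# Sketch: candidate routes wrapping BrowserProfiler / CrawlerMonitor.
routes = {
    "GET /monitor/stats": "live CrawlerMonitor snapshot (active tasks, memory)",
    "POST /profiles": "create a persistent browser profile via BrowserProfiler",
    "GET /profiles": "list saved browser profiles",
}
for route, desc in routes.items():
    print(f"{route}  - {desc}")
```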

10. HTTP-only crawling

  • HTTPCrawlerConfig
  • Missing: HTTP crawler methods/params (non-browser). API uses browser-based crawling with LXMLWebScrapingStrategy
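An explicit crawler selector could route requests to the lightweight HTTP crawler. The "crawler" field is an assumption; http_config keys mirror HTTPCrawlerConfig:

```python
import json

# Sketch: requesting the non-browser HTTP crawler.
payload = {
    "urls": ["https://example.com/feed.xml"],
    "crawler": "http",  # default would remain "browser"
    "http_config": {
        "method": "GET",
        "headers": {"User-Agent": "crawl4ai"},
        "follow_redirects": True,
    },
}
print(json.dumps(payload))
```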

11. Virtual scroll

  • VirtualScrollConfig
  • Missing: infinite-scroll capture configuration
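This maps naturally onto the acceptance criteria's suggestion of adding virtual_scroll_config to /crawl, mirroring the library's VirtualScrollConfig fields (names below are drawn from the library but should be treated as assumptions for the wire format):

```python
import json

# Sketch: virtual_scroll_config on the /crawl request.
payload = {
    "urls": ["https://example.com/feed"],
    "crawler_config": {
        "virtual_scroll_config": {
            "container_selector": "#feed",
            "scroll_count": 20,
            "scroll_by": "container_height",
            "wait_after_scroll": 0.5,
        },
    },
}
print(json.dumps(payload))
```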

12. Undetected/stealth browser

  • UndetectedAdapter; browser_config/browser_type='undetected'; stealth options
  • Missing: explicit stealth mode controls
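Stealth could be exposed as explicit flags on browser_config. The issue names browser_type='undetected' as one route; the "enable_stealth" flag below is an assumed companion field:

```python
import json

# Sketch: explicit stealth-mode controls on browser_config.
payload = {
    "urls": ["https://example.com"],
    "browser_config": {
        "browser_type": "undetected",
        "enable_stealth": True,
        "headless": False,  # some stealth setups avoid headless detection
    },
}
print(json.dumps(payload))
```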

Acceptance criteria

1. New/extended endpoints and/or request schemas added

  • New endpoints: Add missing API routes (e.g., /adaptive/crawl, /deep-crawl, /c4a-script/compile, /hub/crawlers)
  • Extended schemas: Enhance existing endpoints to accept new parameters (e.g., add virtual_scroll_config to /crawl, add table_extraction_strategy options)
  • Request schemas: Update schemas.py to include new request models for the missing features
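For schemas.py, each feature group would get a request model. The server uses Pydantic; the stdlib-dataclass version below just illustrates shape and defaults, and every field name is a proposal, not the final schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request model for the proposed /adaptive/crawl route.
@dataclass
class AdaptiveCrawlRequest:
    start_url: str
    query: str
    strategy: str = "statistical"        # "statistical" or "embedding"
    confidence_threshold: float = 0.8
    max_pages: int = 30
    output_format: Optional[str] = None  # e.g. "markdown"

req = AdaptiveCrawlRequest(start_url="https://example.com", query="auth")
print(req.strategy, req.max_pages)
```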

2. Docs and examples updated

  • API documentation: Update the docs to show new endpoints and parameters
  • Parameter documentation: Add descriptions, examples, and validation rules for new fields
  • Examples: Add working code examples showing how to use each new feature

3. Minimal e2e tests per feature group

  • Test coverage: Create integration tests that verify each new feature works end-to-end
  • Happy path: Test successful usage of each feature
  • Validation: Test error handling (invalid parameters, edge cases, etc.)
  • Feature groups: Organize tests by category (adaptive crawling, deep crawling, C4A scripts, etc.)
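The happy-path/validation split can be exercised without a live server by unit-testing the request validators. The validator below is hypothetical, and the rules it enforces (required selector, positive scroll count) are assumptions:

```python
# Sketch: happy-path and error-path checks for a hypothetical
# virtual_scroll_config validator on the server side.
def validate_virtual_scroll(cfg: dict) -> list:
    errors = []
    if not cfg.get("container_selector"):
        errors.append("container_selector is required")
    if cfg.get("scroll_count", 0) <= 0:
        errors.append("scroll_count must be > 0")
    return errors

# Happy path: a well-formed config produces no errors.
assert validate_virtual_scroll({"container_selector": "#feed", "scroll_count": 10}) == []
# Validation path: missing/invalid fields are reported.
assert len(validate_virtual_scroll({"scroll_count": 0})) == 2
```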

SohamKukreti avatar Aug 28 '25 16:08 SohamKukreti