crawl4ai
Bring Docker API server to feature parity with Crawl4AI v0.7.x
Description
The Docker API server lags behind the Python library. This issue tracks adding endpoints/parameters to expose the following library features:
1. Adaptive crawling
- AdaptiveCrawler, AdaptiveConfig, CrawlState, CrawlStrategy, StatisticalStrategy
- Missing: endpoints to run/tune adaptive crawls
2. C4A Script language
- c4a_compile, c4a_validate, c4a_compile_file, CompilationResult, ValidationResult, ErrorDetail
- Missing: submit/validate/execute script endpoints
3. URL seeding
- AsyncUrlSeeder, SeedingConfig
- Missing: sitemap/common-crawl/discovery endpoints
4. Chunking
- ChunkingStrategy, RegexChunking
- Missing: chunking configuration
5. Browser adapters
- BrowserAdapter, PlaywrightAdapter, UndetectedAdapter
- Missing: adapter/stealth selection
6. Proxy rotation
- ProxyRotationStrategy, RoundRobinProxyStrategy
- Missing: rotation strategy selection (beyond passing a single raw proxy)
7. Dispatchers
- SemaphoreDispatcher, BaseDispatcher
- Missing: dispatcher selection (only MemoryAdaptiveDispatcher is used internally)
8. Link preview
- LinkPreview, LinkPreviewConfig
- Missing: link preview/scoring endpoint
9. Profiling/monitoring
- BrowserProfiler, CrawlerMonitor
- Missing: profiling/monitoring endpoints
10. HTTP-only crawling
- HTTPCrawlerConfig
- Missing: HTTP crawler methods/params for non-browser crawling. The API currently always uses browser-based crawling with LXMLWebScrapingStrategy
11. Virtual scroll
- VirtualScrollConfig
- Missing: infinite-scroll capture configuration
12. Undetected/stealth browser
- UndetectedAdapter; browser_config/browser_type='undetected'; stealth options
- Missing: explicit stealth mode controls
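As an illustration of one item above (adaptive crawling), here is a minimal sketch of what a request model for a proposed /adaptive/crawl route could look like. Everything in it is hypothetical: the class AdaptiveCrawlRequest, the adaptive_config key, and the field names strategy, max_pages and confidence_threshold are assumptions loosely modeled on AdaptiveConfig and should be checked against the library. The {"type": ..., "params": ...} envelope follows the convention /crawl uses for BrowserConfig and CrawlerRunConfig. The real schemas.py would more likely use Pydantic; stdlib dataclasses keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Any, Dict
from urllib.parse import urlparse

# Hypothetical: strategy names assumed to mirror the library's adaptive strategies.
ALLOWED_STRATEGIES = {"statistical", "embedding"}


@dataclass
class AdaptiveCrawlRequest:
    """Hypothetical request body for a proposed POST /adaptive/crawl endpoint."""

    start_url: str
    query: str
    strategy: str = "statistical"
    max_pages: int = 20
    confidence_threshold: float = 0.7

    def __post_init__(self) -> None:
        # Validate eagerly so bad input fails before any crawl starts.
        parsed = urlparse(self.start_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"start_url must be an absolute http(s) URL: {self.start_url!r}")
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown strategy {self.strategy!r}")
        if self.max_pages < 1:
            raise ValueError("max_pages must be >= 1")
        if not 0.0 < self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in (0, 1]")

    def to_payload(self) -> Dict[str, Any]:
        # Wrap the tunables in the same {"type", "params"} envelope that
        # /crawl uses for its config objects (assumed to carry over).
        return {
            "start_url": self.start_url,
            "query": self.query,
            "adaptive_config": {
                "type": "AdaptiveConfig",
                "params": {
                    "strategy": self.strategy,
                    "max_pages": self.max_pages,
                    "confidence_threshold": self.confidence_threshold,
                },
            },
        }
```

For example, `AdaptiveCrawlRequest("https://docs.python.org/3/", "async context managers").to_payload()` yields a JSON-ready body, while an empty query or a relative URL raises ValueError before any request is sent. The same pattern (validated model plus typed envelope) would apply to the other feature groups.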
Acceptance criteria
1. New/extended endpoints and/or request schemas added
- New endpoints: Add missing API routes (e.g., /adaptive/crawl, /deep-crawl, /c4a-script/compile, /hub/crawlers)
- Extended schemas: Enhance existing endpoints to accept new parameters (e.g., add virtual_scroll_config to /crawl, add table_extraction_strategy options)
- Request schemas: Update schemas.py to include new request models for the missing features
2. Docs and examples updated
- API documentation: Update the docs to show new endpoints and parameters
- Parameter documentation: Add descriptions, examples, and validation rules for new fields
- Examples: Add working code examples showing how to use each new feature.
3. Minimal e2e tests per feature group
- Test coverage: Create integration tests that verify each new feature works end-to-end
- Happy path: Test successful usage of each feature
- Validation: Test error handling (invalid parameters, edge cases, etc.)
- Feature groups: Organize tests by category (adaptive crawling, deep crawling, C4A scripts, etc.)
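A per-group e2e test could follow a pattern like the sketch below, shown for virtual scroll: one happy-path check and one validation check. Assumptions, all to be confirmed: the container listens on localhost:11235 (assumed default port) with auth disabled; /crawl, once extended, accepts a nested virtual_scroll_config inside crawler_config; the extended schema rejects scroll_count < 1 with a 4xx; and successful responses carry a top-level success flag. The VirtualScrollConfig parameter names are intended to mirror the library class, while the helper names (virtual_scroll_payload, check_happy_path, check_rejects_invalid) are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Tuple

BASE_URL = "http://localhost:11235"  # assumption: local container, auth disabled

# A post function takes (path, payload) and returns (status, json_body).
PostFn = Callable[[str, Dict[str, Any]], Tuple[int, Dict[str, Any]]]


def http_post(path: str, payload: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """POST JSON to the running server and return (status, decoded body)."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; surface the status so tests can assert on it.
        return err.code, json.loads(err.read() or b"{}")


def virtual_scroll_payload(url: str, scroll_count: int) -> Dict[str, Any]:
    """Hypothetical /crawl body with a nested VirtualScrollConfig."""
    return {
        "urls": [url],
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {
                "virtual_scroll_config": {
                    "type": "VirtualScrollConfig",
                    "params": {
                        "container_selector": "#feed",  # hypothetical selector
                        "scroll_count": scroll_count,
                        "scroll_by": "container_height",
                        "wait_after_scroll": 0.5,
                    },
                },
            },
        },
    }


def check_happy_path(post: PostFn) -> Dict[str, Any]:
    status, body = post("/crawl", virtual_scroll_payload("https://example.com", 5))
    assert status == 200, status
    assert body.get("success") is True
    return body


def check_rejects_invalid(post: PostFn) -> None:
    # Assumes the extended schema enforces scroll_count >= 1.
    status, _ = post("/crawl", virtual_scroll_payload("https://example.com", -1))
    assert status in (400, 422), status
```

Against a live container the checks would be called as `check_happy_path(http_post)` and `check_rejects_invalid(http_post)`. Passing post in as a parameter keeps the assertions independent of the transport, so the same checks can run against a stub while the endpoint is still being built and against the real server in CI.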