
Bring Docker API server to feature parity with Crawl4AI v0.7.x

Open · SohamKukreti opened this issue 3 months ago · 2 comments

Description

The Docker API server lags behind the Python library: several v0.7.x capabilities have no REST exposure. This issue tracks the endpoints and parameters needed to expose the following library features:

1. Adaptive crawling

  • AdaptiveCrawler, AdaptiveConfig, CrawlState, CrawlStrategy, StatisticalStrategy
  • Missing: endpoints to run/tune adaptive crawls
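As a starting point, a request body for the proposed /adaptive/crawl route could mirror the library's AdaptiveConfig. This is a sketch only; the field names are assumptions, not a final schema.

```python
import json

# Hypothetical payload for a proposed /adaptive/crawl endpoint.
# "adaptive_config" keys loosely mirror AdaptiveConfig in the Python
# library; exact names are assumptions until the schema is designed.
payload = {
    "start_url": "https://example.com/docs",
    "query": "authentication setup",
    "adaptive_config": {
        "strategy": "statistical",    # would map to StatisticalStrategy
        "confidence_threshold": 0.8,  # stop crawling once confident enough
        "max_pages": 30,
    },
}
print(json.dumps(payload, indent=2))
```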

2. C4A Script language

  • c4a_compile, c4a_validate, c4a_compile_file, CompilationResult, ValidationResult, ErrorDetail
  • Missing: submit/validate/execute script endpoints
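A compile endpoint could accept the raw C4A script text and return a CompilationResult-shaped body. The script commands below follow the library's C4A syntax; the JSON envelope is an assumption.

```python
import json

# Hypothetical request/response pair for a /c4a-script/compile endpoint.
request = {
    "script": "GO https://example.com\nWAIT `#content` 5\nCLICK `#load-more`",
}
# A CompilationResult-like response might carry success plus ErrorDetail items:
response = {
    "success": True,
    "js_code": ["/* generated JavaScript */"],
    "errors": [],  # ErrorDetail-shaped objects on failure
}
print(json.dumps(request))
print(json.dumps(response))
```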

3. URL seeding

  • AsyncUrlSeeder, SeedingConfig
  • Missing: sitemap/common-crawl/discovery endpoints
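A discovery endpoint could wrap AsyncUrlSeeder and accept SeedingConfig-style options. Field names here are assumptions modeled on the library config:

```python
import json

# Hypothetical payload for a URL-discovery endpoint wrapping AsyncUrlSeeder.
payload = {
    "domain": "example.com",
    "seeding_config": {
        "source": "sitemap+cc",  # sitemap and/or Common Crawl
        "pattern": "*/blog/*",
        "max_urls": 500,
        "extract_head": True,    # fetch head metadata for relevance scoring
    },
}
print(json.dumps(payload))
```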

4. Chunking

  • ChunkingStrategy, RegexChunking
  • Missing: chunking configuration
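Chunking could be exposed by extending the existing /crawl request. The type-plus-params wire format below is an assumption modeled on how other strategies are serialized:

```python
import json

# Sketch: adding a chunking section to the /crawl request body.
payload = {
    "urls": ["https://example.com/article"],
    "crawler_config": {
        "chunking_strategy": {
            "type": "RegexChunking",
            "params": {"patterns": [r"\n\n"]},  # split on blank lines
        },
    },
}
print(json.dumps(payload))
```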

5. Browser adapters

  • BrowserAdapter, PlaywrightAdapter, UndetectedAdapter
  • Missing: adapter/stealth selection
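Adapter selection could be a single field on browser_config. The "adapter" key is an assumption; values would map to the adapter classes above:

```python
import json

# Sketch: selecting a browser adapter per request.
payload = {
    "urls": ["https://example.com"],
    "browser_config": {
        "adapter": "undetected",  # PlaywrightAdapter (default) or UndetectedAdapter
        "headless": True,
    },
}
print(json.dumps(payload))
```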

6. Proxy rotation

  • ProxyRotationStrategy, RoundRobinProxyStrategy
  • Missing: rotation strategy selection (beyond raw proxy)
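Rotation could accept a strategy name plus a proxy list, rather than the single raw proxy supported today. The "proxy_rotation" shape is an assumption; "round_robin" would map to RoundRobinProxyStrategy:

```python
import json

# Sketch: proxy rotation config beyond a single raw proxy.
payload = {
    "urls": ["https://example.com"],
    "crawler_config": {
        "proxy_rotation": {
            "strategy": "round_robin",
            "proxies": [
                {"server": "http://proxy-1:8080"},
                {"server": "http://proxy-2:8080"},
            ],
        },
    },
}
print(json.dumps(payload))
```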

7. Dispatchers

  • SemaphoreDispatcher, BaseDispatcher
  • Missing: dispatcher selection (only MemoryAdaptiveDispatcher is used internally)
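A top-level dispatcher section could let clients pick SemaphoreDispatcher for batch crawls. Field names below are proposals only:

```python
import json

# Sketch: choosing a dispatcher for batch crawls instead of the
# server's internal default.
payload = {
    "urls": ["https://example.com/a", "https://example.com/b"],
    "dispatcher": {
        "type": "semaphore",     # would map to SemaphoreDispatcher
        "max_concurrency": 5,    # assumed name for the concurrency cap
    },
}
print(json.dumps(payload))
```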

8. Link preview

  • LinkPreview, LinkPreviewConfig
  • Missing: link preview/scoring endpoint
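A /links/preview-style endpoint could accept LinkPreviewConfig-style options. The scoring-related keys are assumptions modeled on the library config:

```python
import json

# Hypothetical payload for a link preview/scoring endpoint.
payload = {
    "url": "https://example.com/docs",
    "link_preview_config": {
        "include_internal": True,
        "include_external": False,
        "query": "api reference",  # relevance-scoring query
        "max_links": 20,
    },
}
print(json.dumps(payload))
```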

9. Profiling/monitoring

  • BrowserProfiler, CrawlerMonitor
  • Missing: profiling/monitoring endpoints
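For profiling/monitoring, a few read-only and management routes would likely suffice. The paths below are suggestions only:

```python
# Sketch: candidate routes wrapping BrowserProfiler / CrawlerMonitor.
routes = {
    "GET /monitor/stats": "live CrawlerMonitor snapshot (active tasks, memory)",
    "POST /profiles": "create a persistent browser profile via BrowserProfiler",
    "GET /profiles": "list saved browser profiles",
}
for route, desc in routes.items():
    print(f"{route}  - {desc}")
```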

10. HTTP-only crawling

  • HTTPCrawlerConfig
  • Missing: HTTP crawler methods/params (non-browser). API uses browser-based crawling with LXMLWebScrapingStrategy
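An explicit crawler selector could route requests to the lightweight HTTP crawler. The "crawler" field is an assumption; http_config keys mirror HTTPCrawlerConfig:

```python
import json

# Sketch: requesting the non-browser HTTP crawler.
payload = {
    "urls": ["https://example.com/feed.xml"],
    "crawler": "http",  # default would remain "browser"
    "http_config": {
        "method": "GET",
        "headers": {"User-Agent": "crawl4ai"},
        "follow_redirects": True,
    },
}
print(json.dumps(payload))
```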

11. Virtual scroll

  • VirtualScrollConfig
  • Missing: infinite-scroll capture configuration
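This maps naturally onto the acceptance criteria's suggestion of adding virtual_scroll_config to /crawl, mirroring the library's VirtualScrollConfig fields (names below are drawn from the library but should be treated as assumptions for the wire format):

```python
import json

# Sketch: virtual_scroll_config on the /crawl request.
payload = {
    "urls": ["https://example.com/feed"],
    "crawler_config": {
        "virtual_scroll_config": {
            "container_selector": "#feed",
            "scroll_count": 20,
            "scroll_by": "container_height",
            "wait_after_scroll": 0.5,
        },
    },
}
print(json.dumps(payload))
```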

12. Undetected/stealth browser

  • UndetectedAdapter; browser_config/browser_type='undetected'; stealth options
  • Missing: explicit stealth mode controls
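Stealth could be exposed as explicit flags on browser_config. The issue names browser_type='undetected' as one route; the "enable_stealth" flag below is an assumed companion field:

```python
import json

# Sketch: explicit stealth-mode controls on browser_config.
payload = {
    "urls": ["https://example.com"],
    "browser_config": {
        "browser_type": "undetected",
        "enable_stealth": True,
        "headless": False,  # some stealth setups avoid headless detection
    },
}
print(json.dumps(payload))
```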

Acceptance criteria

1. New/extended endpoints and/or request schemas added

  • New endpoints: Add missing API routes (e.g., /adaptive/crawl, /deep-crawl, /c4a-script/compile, /hub/crawlers)
  • Extended schemas: Enhance existing endpoints to accept new parameters (e.g., add virtual_scroll_config to /crawl, add table_extraction_strategy options)
  • Request schemas: Update schemas.py to include new request models for the missing features
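For schemas.py, each feature group would get a request model. The server uses Pydantic; the stdlib-dataclass version below just illustrates shape and defaults, and every field name is a proposal, not the final schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request model for the proposed /adaptive/crawl route.
@dataclass
class AdaptiveCrawlRequest:
    start_url: str
    query: str
    strategy: str = "statistical"        # "statistical" or "embedding"
    confidence_threshold: float = 0.8
    max_pages: int = 30
    output_format: Optional[str] = None  # e.g. "markdown"

req = AdaptiveCrawlRequest(start_url="https://example.com", query="auth")
print(req.strategy, req.max_pages)
```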

2. Docs and examples updated

  • API documentation: Update the docs to show new endpoints and parameters
  • Parameter documentation: Add descriptions, examples, and validation rules for new fields
  • Examples: Add working code examples showing how to use each new feature

3. Minimal e2e tests per feature group

  • Test coverage: Create integration tests that verify each new feature works end-to-end
  • Happy path: Test successful usage of each feature
  • Validation: Test error handling (invalid parameters, edge cases, etc.)
  • Feature groups: Organize tests by category (adaptive crawling, deep crawling, C4A scripts, etc.)
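The happy-path/validation split can be exercised without a live server by unit-testing the request validators. The validator below is hypothetical, and the rules it enforces (required selector, positive scroll count) are assumptions:

```python
# Sketch: happy-path and error-path checks for a hypothetical
# virtual_scroll_config validator on the server side.
def validate_virtual_scroll(cfg: dict) -> list:
    errors = []
    if not cfg.get("container_selector"):
        errors.append("container_selector is required")
    if cfg.get("scroll_count", 0) <= 0:
        errors.append("scroll_count must be > 0")
    return errors

# Happy path: a well-formed config produces no errors.
assert validate_virtual_scroll({"container_selector": "#feed", "scroll_count": 10}) == []
# Validation path: missing/invalid fields are reported.
assert len(validate_virtual_scroll({"scroll_count": 0})) == 2
```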

SohamKukreti avatar Aug 28 '25 16:08 SohamKukreti