crawl4ai
[Bug]: mean_delay does not work with CrawlerRunConfig
crawl4ai version
0.7.6
Expected Behavior
When I set mean_delay, there should be a delay between requests.
Current Behavior
The mean_delay setting is ignored.
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
await crawler.arun_many(
    urls=[
        "https://docs.crawl4ai.com/core/examples/",
        "https://docs.crawl4ai.com/core/quickstart/",
    ],
    config=CrawlerRunConfig(
        stream=False, mean_delay=10.0, max_range=5.0, semaphore_count=1
    ),
)
OS
macOS
Python version
3.13
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
Config:
config=CrawlerRunConfig(
    stream=False, mean_delay=10.0, max_range=5.0, semaphore_count=1
)
Result:
[INIT].... → Crawl4AI 0.7.6
[FETCH]... ↓ https://docs.crawl4ai.com/core/examples/ | ✓ | ⏱: 3.63s
[SCRAPE].. ◆ https://docs.crawl4ai.com/core/examples/ | ✓ | ⏱: 0.03s
[COMPLETE] ● https://docs.crawl4ai.com/core/examples/ | ✓ | ⏱: 3.66s
[FETCH]... ↓ https://docs.crawl4ai.com/core/quickstart/ | ✓ | ⏱: 1.63s
[SCRAPE].. ◆ https://docs.crawl4ai.com/core/quickstart/ | ✓ | ⏱: 0.02s
[COMPLETE] ● https://docs.crawl4ai.com/core/quickstart/ | ✓ | ⏱: 1.65s
Elapsed time: 3.70 seconds
https://docs.crawl4ai.com/core/examples/ crawled OK!
https://docs.crawl4ai.com/core/quickstart/ crawled OK!
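A quick sanity check on the numbers in the log above. This sketch assumes that, with one crawl at a time, a delay of at least mean_delay would sit between the two requests; the variable names are only for illustration.

```python
# Timings copied from the log above, in seconds.
complete_times = [3.66, 1.65]  # [COMPLETE] time for each URL
elapsed = 3.70                 # "Elapsed time" reported for the whole run

# Setting from the CrawlerRunConfig in this report.
mean_delay = 10.0

# Assumption: sequential crawls with at least mean_delay between requests.
expected_min = sum(complete_times) + mean_delay * (len(complete_times) - 1)

print(f"expected at least {expected_min:.2f}s, observed {elapsed:.2f}s")

# If the total elapsed time is shorter than the sum of the per-URL times,
# the two crawls must have overlapped.
print("crawls overlapped:", elapsed < sum(complete_times))
```

The observed elapsed time is below even the sum of the two per-URL times, which suggests the crawls ran concurrently with no delay applied.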
The relevant code in arun_many:
config = config or CrawlerRunConfig()
# if config is None:
#     config = CrawlerRunConfig(
#         word_count_threshold=word_count_threshold,
#         extraction_strategy=extraction_strategy,
#         chunking_strategy=chunking_strategy,
#         content_filter=content_filter,
#         cache_mode=cache_mode,
#         bypass_cache=bypass_cache,
#         css_selector=css_selector,
#         screenshot=screenshot,
#         pdf=pdf,
#         verbose=verbose,
#         **kwargs,
#     )
if dispatcher is None:
    dispatcher = MemoryAdaptiveDispatcher(
        rate_limiter=RateLimiter(
            base_delay=(1.0, 3.0), max_delay=60.0, max_retries=3
        ),
    )
It looks like rate limiting has moved to the dispatcher parameter. If you're calling arun_many directly, try creating the dispatcher yourself and passing it in with the delay you want:
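from crawl4ai import CrawlerRunConfig
from crawl4ai.async_dispatcher import RateLimiter, SemaphoreDispatcher

# Allow one crawl at a time and wait a random 8-12 seconds between requests.
dispatcher = SemaphoreDispatcher(
    max_session_permit=1,
    rate_limiter=RateLimiter(
        base_delay=(8.0, 12.0), max_delay=60.0, max_retries=3
    ),
)
await crawler.arun_many(
    urls=[
        "https://docs.crawl4ai.com/core/examples/",
        "https://docs.crawl4ai.com/core/quickstart/",
    ],
    config=CrawlerRunConfig(stream=False),
    dispatcher=dispatcher,
)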
I do think it's a bit misleading that these config options are still available, though. They're probably kept for backwards compatibility, but they should either be removed in the future or at least be unpacked to set the corresponding dispatcher values, to prevent confusion.
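As a rough illustration of that "unpacking" idea, here is a minimal sketch. delay_range_from_config is a hypothetical helper, not part of crawl4ai, and it assumes the legacy settings mean "wait mean_delay plus a random extra of up to max_range seconds". If the intended meaning is different, the mapping would need to change.

```python
from typing import Tuple


def delay_range_from_config(mean_delay: float, max_range: float) -> Tuple[float, float]:
    """Hypothetical helper: map mean_delay/max_range to a (min, max) delay range.

    Assumption: the legacy settings mean a wait of mean_delay plus a random
    extra of up to max_range seconds, so the range is
    (mean_delay, mean_delay + max_range).
    """
    if mean_delay < 0 or max_range < 0:
        raise ValueError("mean_delay and max_range must be non-negative")
    return (mean_delay, mean_delay + max_range)


# Values from the CrawlerRunConfig in the original report.
base_delay = delay_range_from_config(mean_delay=10.0, max_range=5.0)
print(base_delay)  # (10.0, 15.0)
```

The resulting tuple could then be passed as base_delay to a RateLimiter, in the same way as the hand-written (8.0, 12.0) in the snippet above.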