[Bug]: `arun_many` doesn't parallelize tasks when using `raw://`
crawl4ai version
0.4.3b2 (also on 0.4.3b3)
Expected Behavior
Using arun_many with raw:// URLs should parallelize tasks when max_session_permit > 1, just as it does with regular HTTP URLs.
This is based on the features demo in docs/examples/v0_4_3b2_features_demo.py
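For context, here is a minimal sketch of the call pattern in question (the raw:// payload and the max_session_permit value are only illustrative, condensed from the full script below):

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig, MemoryAdaptiveDispatcher

async def expected_parallel_raw() -> None:
    # Many copies of the same in-memory HTML document, passed via the raw:// scheme
    urls = ["raw://<html><body><p>hello</p></body></html>"] * 20
    dispatcher = MemoryAdaptiveDispatcher(max_session_permit=5)  # expect up to 5 concurrent tasks
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(
            urls=urls,
            config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS),
            dispatcher=dispatcher,
        )
    print(f"Crawled {len(results)} raw documents")

asyncio.run(expected_parallel_raw())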
Current Behavior
arun_many does not parallelize tasks when given raw:// URLs, although it works correctly with regular HTTP URLs. With raw HTML content, only one task is active at a time and the max_session_permit setting is effectively ignored.
Using raw HTML (only one active task at a time):
Using regular HTTP URLs (tasks run in parallel as expected):
Is this reproducible?
Yes
Inputs Causing the Bug
No response
Steps to Reproduce
This is based on the demo_memory_dispatcher function in https://github.com/unclecode/crawl4ai/blob/d0586f09a946e8e70e34e7e3b670ca165c7d71ec/docs/examples/v0_4_3b2_features_demo.py
Using pixi, set up the directory as follows:
❯ tree .
.
├── main.py
└── pixi.toml
# pixi.toml
[project]
channels = ["conda-forge"]
description = "Add a short description here"
name = "issue-crawl4ai"
platforms = ["osx-arm64"]
version = "0.1.0"

[tasks]
postinstall = "pip install Crawl4AI==0.4.3b2 && crawl4ai-setup && crawl4ai-doctor"

[dependencies]
python = "3.11.0"
pip = "*"
and the script:
# main.py
import asyncio

from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CacheMode,
    CrawlerMonitor,
    CrawlerRunConfig,
    DefaultMarkdownGenerator,
    DisplayMode,
    MemoryAdaptiveDispatcher,
)


async def demo_memory_dispatcher(use_raw: bool) -> None:
    print("\n=== Memory Dispatcher Demo ===")
    try:
        # Configuration
        browser_config = BrowserConfig(headless=True, verbose=False)
        crawler_config = CrawlerRunConfig(
            cache_mode=CacheMode.BYPASS, markdown_generator=DefaultMarkdownGenerator()
        )

        # Test URLs
        if not use_raw:
            urls = [
                "http://example.com",
                "http://example.org",
                "http://example.net",
            ] * 50
        else:
            dummy_html = """
            <html>
                <body>
                    <div class='crypto-row'>
                        <h2 class='coin-name'>Bitcoin</h2>
                        <span class='coin-price'>$28,000</span>
                    </div>
                    <div class='crypto-row'>
                        <h2 class='coin-name'>Ethereum</h2>
                        <span class='coin-price'>$1,800</span>
                    </div>
                </body>
            </html>
            """
            urls = [f"raw://{dummy_html}"] * 1000

        print("\nInitializing crawler with memory monitoring...")
        async with AsyncWebCrawler(config=browser_config) as crawler:
            monitor = CrawlerMonitor(
                max_visible_rows=10, display_mode=DisplayMode.DETAILED
            )
            dispatcher = MemoryAdaptiveDispatcher(
                memory_threshold_percent=80.0,
                check_interval=0.5,
                max_session_permit=5,
                monitor=monitor,
            )

            print("\nStarting batch crawl...")
            results = await crawler.arun_many(
                urls=urls, config=crawler_config, dispatcher=dispatcher
            )
            print(f"\nCompleted {len(results)} URLs successfully")
    except Exception as e:
        print(f"\nError in memory dispatcher demo: {str(e)}")


async def main():
    """Run all feature demonstrations."""
    print("\nRunning Crawl4ai v0.4.3 Feature Demos\n")

    # Efficiency & Speed Demos
    print("This shows that there are 5 active tasks at the same time")
    await demo_memory_dispatcher(use_raw=False)

    print("This is not working: it shows only 1 active task at a time")
    await demo_memory_dispatcher(use_raw=True)


if __name__ == "__main__":
    asyncio.run(main())
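As a rough cross-check that does not depend on watching the live monitor, a sketch like the one below (my own addition, with arbitrary URL counts and permit values) times the same raw:// batch with max_session_permit=1 versus max_session_permit=5; if the dispatcher really parallelizes raw:// tasks, the second run should finish noticeably faster:

# timing_check.py (sketch; assumes the same crawl4ai version as above)
import asyncio
import time

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig, MemoryAdaptiveDispatcher


async def timed_run(permits: int) -> float:
    # Same raw:// document repeated; only the concurrency limit changes between runs
    urls = ["raw://<html><body><p>hello</p></body></html>"] * 50
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    dispatcher = MemoryAdaptiveDispatcher(max_session_permit=permits)
    async with AsyncWebCrawler(config=BrowserConfig(headless=True)) as crawler:
        start = time.perf_counter()
        await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher)
        return time.perf_counter() - start


async def main():
    serial = await timed_run(1)
    parallel = await timed_run(5)
    print(f"max_session_permit=1: {serial:.1f}s, max_session_permit=5: {parallel:.1f}s")


if __name__ == "__main__":
    asyncio.run(main())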
Code snippets
Basically run the main.py script above:
pixi install
pixi run postinstall
pixi run python main.py
OS
macOS
Python version
3.11.0
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response
@aravindkarnam This is an odd case; I have to check this myself.
I am noticing the same behavior for crawls when using file://. Has a fix been implemented for this?
@prachipatil-ds Not yet, but we picked it up in the current sprint. Hopefully a fix will land in the next couple of weeks!
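Until the fix lands, one possible workaround (a sketch of my own, not the library's dispatcher path) is to bypass arun_many and fan out individual crawler.arun calls with asyncio.gather plus a semaphore; this applies to raw:// and file:// inputs alike:

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig


async def crawl_raw_in_parallel(urls, max_concurrency: int = 5):
    # Manual fan-out: a semaphore caps concurrency instead of MemoryAdaptiveDispatcher
    semaphore = asyncio.Semaphore(max_concurrency)
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    async with AsyncWebCrawler() as crawler:
        async def crawl_one(url):
            async with semaphore:
                return await crawler.arun(url=url, config=config)

        return await asyncio.gather(*(crawl_one(u) for u in urls))


# Example usage with raw:// inputs:
# results = asyncio.run(crawl_raw_in_parallel(["raw://<html><body>hi</body></html>"] * 20))

The trade-off is that this loses the memory-adaptive throttling and monitoring that MemoryAdaptiveDispatcher provides.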
This issue has been resolved in the develop branch. I would appreciate it if you all could help test and check it out.
Just tested it on version 0.7.4 and it seems to parallelize properly. Thanks!
Already merged into the main branch and the latest release (0.7.4).