Fix: request `/crawl` with `stream: true` issue #1066

Open zinodynn opened this issue 7 months ago • 0 comments

Summary

Fixes #1066

This PR fixes the error that happens when you send a request to /crawl with stream: true in the crawler_config.

What’s the fix?

If a request has stream: true, it is now automatically redirected (307) to the /crawl/stream endpoint.

No more async_generator errors

List of files changed and why

deploy/docker/server.py

Added a check for stream: true in the /crawl request handler.
If detected, redirects to /crawl/stream instead of trying to process it in the wrong place.
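
The check described above can be sketched as a small, framework-free decision function. This is only an illustration under stated assumptions, not the code in server.py: the names wants_stream and redirect_for are hypothetical, and the payload shape is taken from the curl example in this PR. A 307 is a natural choice here because clients must repeat the same method and body at the new location, so the POST payload reaches /crawl/stream unchanged.

```python
from typing import Any, Dict, Optional, Tuple

STREAM_ENDPOINT = "/crawl/stream"   # redirect target named in this PR
TEMPORARY_REDIRECT = 307            # preserves the POST method and body


def wants_stream(crawler_config: Dict[str, Any]) -> bool:
    """Return True only when crawler_config.params.stream is literally true."""
    params = crawler_config.get("params")
    if not isinstance(params, dict):
        return False
    return params.get("stream") is True


def redirect_for(payload: Dict[str, Any]) -> Optional[Tuple[int, str]]:
    """Hypothetical helper: decide whether a /crawl request should be redirected.

    Returns (status_code, location) for streaming requests, or None when the
    request should be handled by /crawl itself.
    """
    crawler_config = payload.get("crawler_config") or {}
    if wants_stream(crawler_config):
        return TEMPORARY_REDIRECT, STREAM_ENDPOINT
    return None
```

In a real handler the returned pair would be turned into the web framework's redirect response; how server.py does that is not shown in this description.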

deploy/docker/static/playground/index.html

Updated the stream configuration to follow the endpoint selection (stream is false for /crawl, true for /crawl/stream).

tests/docker/test_server_requests.py

Added a test for the /crawl endpoint's stream redirect.
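
As a rough idea of what a redirect test can assert, the sketch below runs against a tiny stand-in server rather than the real one. Everything in it is an assumption made for illustration: StubCrawlHandler and post_crawl are hypothetical names, and the stub only mimics the behaviour described in this PR. It does not reproduce tests/docker/test_server_requests.py. The client is http.client, which never follows redirects, so the 307 and its Location header can be inspected directly.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


class StubCrawlHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the /crawl handler (not the real server)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        params = body.get("crawler_config", {}).get("params", {})
        if self.path == "/crawl" and params.get("stream") is True:
            # Streaming requests are sent on to the streaming endpoint.
            self.send_response(307)
            self.send_header("Location", "/crawl/stream")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet


def post_crawl(stream: bool) -> Tuple[int, Optional[str]]:
    """POST to the stub's /crawl and return (status, Location header)."""
    server = HTTPServer(("127.0.0.1", 0), StubCrawlHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        payload = json.dumps({
            "urls": ["https://example.com/page1"],
            "crawler_config": {
                "type": "CrawlerRunConfig",
                "params": {"stream": stream},
            },
        })
        conn.request("POST", "/crawl", body=payload,
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        response.read()
        result = (response.status, response.getheader("Location"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

A test built this way would check that post_crawl(True) yields a 307 pointing at /crawl/stream and that post_crawl(False) is answered by /crawl directly.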

How Has This Been Tested?

Tested with curl (as in the issue example), e.g.:

curl --location 'http://localhost:11235/crawl' \
--header 'Content-Type: application/json' \
--data '{
    "urls": [
        "https://example.com/page1",
        "https://example.com/page2"
    ],
    "crawler_config": {
        "type": "CrawlerRunConfig",
        "params": {
            "scraping_strategy": {
                "type": "WebScrapingStrategy"
            },
            "stream": true
        }
    }
}'

Got a 307 redirect to /crawl/stream, and the streaming worked!

Checklist:

[x] My code follows the style guidelines of this project
[ ] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[x] I have added/updated unit tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes

May 05 '25 10:05 zinodynn