crawl4ai
crawl4ai copied to clipboard
Fix: request `/crawl` with `stream: true` issue #1066
Summary
Fixes #1066
This PR fixes the error that happens when you send a request to /crawl with stream: true in the crawler_config.
Whatβs the fix?
If a request has stream: true, it now automatically redirects (307) to the /crawl/stream endpoint.
No more async_generator errors
List of files changed and why
deploy/docker/server.py
- Added a check for stream: true in the /crawl request handler.
- If detected, redirects to /crawl/stream instead of trying to process it in the wrong place.
deploy/docker/static/playground/index.html
- update stream configuration based on endpoint selection (
/crawl:stream is False,/crawl/stream: stream is True )
tests/docker/test_server_requests.py
- add test for crawl endpoint with stream redirects
How Has This Been Tested?
Tested with curl(like the issue example) eg.
curl --location 'http://localhost:11235/crawl' \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
],
"crawler_config": {
"type": "CrawlerRunConfig",
"params": {
"scraping_strategy": {
"type": "WebScrapingStrategy",
},
"stream": true
}
}
}'
Got a 307 redirect to /crawl/stream, and the streaming worked!
Checklist:
- [x] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] I have added/updated unit tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes